arXiv:2604.10800v1 Announce Type: cross Abstract: Learned classifiers deployed in agentic pipelines face a fundamental reliability problem: predictions are probabilistic inferences, not verified conclusions, and acting on them without grounding in observable evidence leads to compounding failures across downstream stages. Software vulnerability analysis makes this cost concrete and measurable. We address this through a unified cross-language […]
CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation
arXiv:2604.10918v1 Announce Type: new Abstract: Tables contain rich structured information, yet when stored as images their contents remain “locked” within pixels. Converting table images into LaTeX code enables faithful digitization and reuse, but current multimodal large language models (MLLMs) often fail to preserve structural, style, or content fidelity. Conventional post-training with reinforcement learning (RL) typically […]
Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction
arXiv:2604.10786v1 Announce Type: cross Abstract: Narrative understanding requires multidimensional semantic structures. This study investigates whether BERT embeddings encode dimensions of fictional narrative semantics — time, space, causality, and character. Using an LLM to accelerate annotation, we construct a token-level dataset labeled with these four narrative categories plus “others.” A linear probe on BERT embeddings (94% […]
MAFIG: Multi-agent Driven Formal Instruction Generation Framework
arXiv:2604.10989v1 Announce Type: new Abstract: Emergency situations in scheduling systems often trigger local functional failures that undermine system stability and even cause system collapse. Existing methods primarily rely on robust scheduling or reactive scheduling, handling emergencies through predefined rules or rescheduling strategies. However, the diversity and unpredictability of real-world emergencies make them difficult to anticipate, […]
Data Selection for Multi-turn Dialogue Instruction Tuning
arXiv:2604.07892v2 Announce Type: replace-cross Abstract: Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole […]
A Proposed Biomedical Data Policy Framework to Reduce Fragmentation, Improve Quality, and Incentivize Sharing in Indian Healthcare in the era of Artificial Intelligence and Digital Health
arXiv:2604.11125v1 Announce Type: new Abstract: India generates vast biomedical data through postgraduate research, government hospital services and audits, government schemes, private hospitals and their electronic medical record (EMR) systems, insurance programs and standalone clinics. Unfortunately, these resources remain fragmented across institutional silos and vendor-locked EMR systems. The fundamental bottleneck is not technological but economic and […]
Lung Cancer Detection Using Deep Learning
arXiv:2604.10765v1 Announce Type: cross Abstract: Lung cancer, the second leading cause of cancer-related deaths, is primarily linked to long-term tobacco smoking (85% of cases). Surprisingly, 10-15% of cases occur in non-smokers. In 2020, approximately 2 million people were affected globally, resulting in 1.5 million deaths. The survival rate, at around 20%, lags behind other cancers, […]
Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories
arXiv:2604.11365v1 Announce Type: new Abstract: Monte Carlo Tree Search (MCTS) has been widely used for automated reasoning data exploration, but current supervision extraction methods remain inefficient. Standard approaches retain only the single highest-reward trajectory, discarding the comparative signals present in the many explored paths. Here we introduce Contrastive Reasoning Path Synthesis (CRPS), a framework that […]
Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation
arXiv:2604.07486v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have emerged as a powerful tool for synthetic data generation. A particularly important use case is producing synthetic replicas of private text, which requires carefully balancing privacy and utility. We propose Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), which uses private seeds and integrates privacy-preserving strategies, […]
SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering
arXiv:2604.11548v1 Announce Type: new Abstract: The rise of OpenClaw in early 2026 marks the moment when millions of users began deploying personal AI agents into their daily lives, delegating tasks ranging from travel planning to multi-step research. This scale of adoption signals that two parallel arcs of development have reached an inflection point. First is […]
Prosociality by Coupling, Not Mere Observation: Homeostatic Sharing in an Inspectable Recurrent Artificial Life Agent
arXiv:2604.10760v1 Announce Type: cross Abstract: Artificial agents can be made to “help” for many reasons, including explicit social reward, hard-coded prosocial bonuses, or direct access to another agent’s internal state. Those possibilities make minimal prosocial behavior hard to interpret. Building on ReCoN-Ipsundrum, an inspectable recurrent controller with affect-coupled regulation, I add an explicit homeostat and […]
From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning
arXiv:2604.10517v1 Announce Type: new Abstract: Modern vision-language models achieve strong performance in static perception, but remain limited in the complex spatiotemporal reasoning required for embodied, egocentric tasks. A major source of failure is their reliance on temporal priors learned from passive video data, which often leads to spatiotemporal hallucinations and poor generalization in dynamic environments. […]