arXiv:2604.11641v3 Announce Type: replace-cross Abstract: Code agents are advancing rapidly, but debugging them is becoming increasingly difficult. As frameworks orchestrate parallel tool calls and multi-stage workflows over complex tasks, the agent’s state transitions and error propagation become hard to observe. In these runs, an early misstep can trap the agent in unproductive loops or even […]
HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark
arXiv:2604.13954v1 Announce Type: cross Abstract: Existing agent-safety evaluation has focused mainly on externally induced risks. Yet agents may still enter unsafe trajectories under benign conditions. We study this complementary but underexplored setting through the lens of intrinsic risk, where intrinsic failures remain latent, propagate across long-horizon execution, and eventually lead to high-consequence outcomes. To evaluate […]
Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation
arXiv:2604.13956v1 Announce Type: cross Abstract: Text-to-image (T2I) systems enable rapid generation of high-fidelity imagery but are misaligned with how visual ideas develop. T2I systems generate outputs that make implicit visual decisions on behalf of the user, often introducing fine-grained details that can anchor users prematurely and limit their ability to keep options open early on, […]
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
arXiv:2604.13016v2 Announce Type: replace-cross Abstract: On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the student and teacher […]
Baseline glycemia exhibits non-random, history-dependent variation across repeated meals
arXiv:2604.13141v1 Announce Type: new Abstract: Glycemic regulation is often described as maintaining glucose levels near a stable baseline. However, continuous glucose monitoring after meals displays intra-individual variability even under controlled conditions, suggesting intrinsic system dynamics beyond sensor noise, measurement error or short-term variability around a fixed set point. Therefore, we estimated pre-meal glucose baselines, tracking […]
Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation
arXiv:2604.13088v1 Announce Type: cross Abstract: In sparse termination rewards, intra-group comparisons have become the dominant paradigm for fine-tuning reasoning models via reinforcement learning. However, long-term training often leads to issues like ineffective update accumulation (learning tax), solution probability drift, and entropy collapse. This paper presents a necessary condition for algorithm design from a token-level credit […]
ASTER: Latent Pseudo-Anomaly Generation for Unsupervised Time-Series Anomaly Detection
arXiv:2604.13924v1 Announce Type: cross Abstract: Time-series anomaly detection (TSAD) is critical in domains such as industrial monitoring, healthcare, and cybersecurity, but it remains challenging due to rare and heterogeneous anomalies and the scarcity of labelled data. This scarcity makes unsupervised approaches predominant, yet existing methods often rely on reconstruction or forecasting, which struggle with complex […]
CCCE: A Continuous Code Calibration Engine for Autonomous Enterprise Codebase Maintenance via Knowledge Graph Traversal and Adaptive Decision Gating
arXiv:2604.13102v1 Announce Type: cross Abstract: Enterprise software organizations face an escalating challenge in maintaining the integrity, security, and freshness of codebases that span hundreds of repositories, multiple programming languages, and thousands of interdependent packages. Existing approaches to codebase maintenance — including static analysis, software composition analysis (SCA), and dependency management tools — operate in isolation, […]
Optimal Stability of KL Divergence under Gaussian Perturbations
arXiv:2604.11026v2 Announce Type: replace-cross Abstract: We study the problem of characterizing the stability of Kullback-Leibler (KL) divergence under Gaussian perturbations beyond Gaussian families. Existing relaxed triangle inequalities for KL divergence critically rely on the assumption that all involved distributions are Gaussian, which limits their applicability in modern applications such as out-of-distribution (OOD) detection with flow-based […]
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
arXiv:2604.14116v1 Announce Type: new Abstract: While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows, such as LLM training, remains a significant challenge. In this paper, we introduce TREX, a multi-agent system that automates the entire LLM training life-cycle. By orchestrating collaboration between two core modules, the […]
Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection
arXiv:2604.13899v1 Announce Type: cross Abstract: Instruction-tuned LLMs can annotate thousands of instances from a short prompt at negligible cost. This raises two questions for active learning (AL): can LLM labels replace human labels within the AL loop, and does AL remain necessary when entire corpora can be labelled at once? We investigate both questions on […]
A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project
arXiv:2604.13042v1 Announce Type: cross Abstract: Semantic data harmonisation is a central requirement in the ILIAD project, where heterogeneous environmental data must be harmonised according to the Ocean Information Model (OIM), a modular family of ontologies for enabling the implementation of interoperable Digital Twins of the Ocean. Existing approaches to Semantic Data Harmonisation, such as RML […]