arXiv:2604.25765v1 Announce Type: cross Abstract: The quality of training data is critical to the performance of machine learning models. In this paper, the Error Sensitivity Profile (ESP) is proposed. It quantifies the sensitivity of model performance to errors in a single feature or in multiple features. By leveraging ESP, data-cleaning efforts can be prioritized based […]
On Halting vs Converging in Recurrent Graph Neural Networks
arXiv:2604.25551v1 Announce Type: cross Abstract: Recurrent Graph Neural Networks (RGNNs) extend standard GNNs by iterating message-passing until some stopping condition is met. Various RGNN models have been proposed in the literature. In this paper, we study three such models: converging RGNNs, where all vertex representations must stabilise; output-converging RGNNs, where only the output classifications must […]
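The two stopping conditions named in this abstract can be illustrated with a toy recurrent message-passing loop. This is a minimal sketch, not the paper's formal model. The update rule (tanh of the input plus a self term and a summed-neighbour term), the weight matrices, the tolerance `eps` and the `patience` window are all hypothetical. Reading output-convergence as "the classifications stop changing" is also an assumption, because the abstract is cut off at that point.

```python
import numpy as np


def rgnn_step(adj, h, x, w_self, w_neigh):
    """One round of message passing (hypothetical update rule): each vertex
    mixes its input features, its own state and the sum of its neighbours' states."""
    return np.tanh(x + h @ w_self + adj @ h @ w_neigh)


def run_converging(adj, x, w_self, w_neigh, w_out, eps=1e-6, max_iter=1000):
    """Stop once every vertex representation has stabilised (max change < eps)."""
    h = np.zeros_like(x)
    for t in range(1, max_iter + 1):
        h_next = rgnn_step(adj, h, x, w_self, w_neigh)
        if np.max(np.abs(h_next - h)) < eps:
            return (h_next @ w_out).argmax(axis=1), t
        h = h_next
    raise RuntimeError("representations did not stabilise within max_iter")


def run_output_converging(adj, x, w_self, w_neigh, w_out, patience=3, max_iter=1000):
    """Stop once the predicted labels have been unchanged for `patience`
    consecutive rounds, even if the representations are still moving."""
    h = np.zeros_like(x)
    prev_labels, stable_rounds = None, 0
    for t in range(1, max_iter + 1):
        h = rgnn_step(adj, h, x, w_self, w_neigh)
        labels = (h @ w_out).argmax(axis=1)
        if prev_labels is not None and np.array_equal(labels, prev_labels):
            stable_rounds += 1
            if stable_rounds >= patience:
                return labels, t
        else:
            stable_rounds = 0
        prev_labels = labels
    raise RuntimeError("outputs did not stabilise within max_iter")


# Usage on a 6-vertex ring graph with small random weights.
rng = np.random.default_rng(0)
n, d, k = 6, 4, 3
adj = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
x = rng.normal(size=(n, d))
w_self = 0.05 * rng.normal(size=(d, d))
w_neigh = 0.05 * rng.normal(size=(d, d))
w_out = rng.normal(size=(d, k))
print(run_converging(adj, x, w_self, w_neigh, w_out))
print(run_output_converging(adj, x, w_self, w_neigh, w_out))
```

A finite run cannot certify that the outputs will never change again, so `patience` is only a practical proxy for output-convergence. The small 0.05 weight scale keeps the update contractive in this example, which is why the fully converging run terminates here.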
Evaluating Plan Compliance in Autonomous Programming Agents
arXiv:2604.12147v2 Announce Type: replace-cross Abstract: Agents aspire to eliminate the need for task-specific prompt crafting through autonomous reason-act-observe loops. Still, they are commonly instructed to follow a task-specific plan for guidance, e.g., to resolve software issues following phases for navigation, reproduction, patch, and validation. Unfortunately, it is unknown to what extent agents actually follow such […]
LinuxArena: A Control Setting for AI Agents in Live Production Software Environments
arXiv:2604.15384v2 Announce Type: replace-cross Abstract: We introduce LinuxArena, a control setting in which agents operate directly on live, multi-service production environments. LinuxArena contains 20 environments, 1,671 main tasks representing legitimate software engineering work, and 184 side tasks representing safety failures such as data exfiltration and backdooring, making it the largest and most diverse control setting […]
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
arXiv:2604.25891v1 Announce Type: cross Abstract: Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We confirm that these interventions reduce […]
Medoid Prototype Alignment for Cross-Plant Unknown Attack Detection in Industrial Control Systems
arXiv:2604.25544v1 Announce Type: cross Abstract: Deploying an intrusion detector trained in one industrial plant to another remains difficult because Industrial Control System (ICS) traffic is highly site-dependent, labels are scarce, and unseen attacks often appear after deployment. To address this challenge, this paper introduces a medoid prototype alignment framework for cross-plant unknown attack detection. Instead […]
AInstein: Can LLMs Solve Research Problems From Parametric Memory Alone?
arXiv:2510.05432v2 Announce Type: replace Abstract: Can large language models solve AI research problems using only their parametric knowledge, without fine-tuning, retrieval, or other external aids? We introduce AInstein, a framework for testing whether LLM agents can generate and refine solutions to research problems through iterative critique loops. A blind study with 20 domain experts on […]
BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking
arXiv:2604.14389v2 Announce Type: replace-cross Abstract: Automated fact-checking in dialogue involves multi-turn conversations where colloquial language is frequent yet understudied. To address this gap, we propose a conservative rewrite candidate for each response claim via staged de-colloquialisation, combining lightweight surface normalisation with scoped in-claim coreference resolution. We then introduce BiCon-Gate, a semantics-aware consistency gate that selects […]
SciDER: Scientific Data-centric End-to-end Researcher
arXiv:2603.01421v2 Announce Type: replace Abstract: Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collected from scientific experiments. We introduce SciDER, a data-centric end-to-end system that automates the research lifecycle. Unlike traditional frameworks, our specialized agents collaboratively parse and […]
The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation
arXiv:2604.25530v1 Announce Type: cross Abstract: Recent knowledge distillation (KD) methods for semantic segmentation introduce increasingly complex hand-crafted objectives, yet are typically evaluated under fixed iteration schedules. These objectives substantially increase per-iteration cost, meaning equal iteration counts do not correspond to equal training budgets. It is therefore unclear whether reported gains reflect stronger distillation signals or […]
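As a point of reference, the sketch below assumes that "canonical" KD means the original Hinton-style objective (KL divergence between temperature-softened teacher and student class distributions) applied independently at every pixel; the abstract itself does not define it. The default temperature of 4.0 and the iteration counts in the budget helper are hypothetical. The helper makes the abstract's budget argument concrete: if each step of a method costs `cost_ratio` times a baseline step, a fixed compute budget buys proportionally fewer iterations.

```python
import numpy as np


def log_softmax(z, axis):
    """Numerically stable log-softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def pixelwise_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Hinton-style KD applied per pixel.

    Both inputs have shape (N, C, H, W). Returns the mean over pixels of
    KL(teacher || student) on temperature-softened class distributions,
    scaled by T**2 so gradient magnitudes stay comparable across T.
    """
    log_p_s = log_softmax(student_logits / temperature, axis=1)
    log_p_t = log_softmax(teacher_logits / temperature, axis=1)
    p_t = np.exp(log_p_t)
    kl_per_pixel = (p_t * (log_p_t - log_p_s)).sum(axis=1)  # shape (N, H, W)
    return temperature ** 2 * kl_per_pixel.mean()


def iterations_at_equal_budget(baseline_iters, cost_ratio):
    """Iterations a method can afford if each step costs `cost_ratio` baseline steps."""
    return int(baseline_iters / cost_ratio)


rng = np.random.default_rng(0)
student = rng.normal(size=(2, 5, 3, 3))  # (N, C, H, W), hypothetical shapes
teacher = rng.normal(size=(2, 5, 3, 3))
print(pixelwise_kd_loss(student, teacher))
print(iterations_at_equal_budget(40000, 1.25))  # 32000
```

Under this accounting, a method whose objective makes each step 25% more expensive should be compared against a baseline given 25% more iterations, not the same count.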
GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
arXiv:2604.14258v2 Announce Type: replace Abstract: Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a special case of policy gradient optimization with an extremely […]
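One standard way to read SFT as a special case of policy gradient is the identity below. It is a textbook derivation, not necessarily the analysis in the paper, whose wording is truncated here. For a single demonstration $y^\star$ with $\pi_\theta(y^\star \mid x) > 0$, the maximum-likelihood gradient equals a policy-gradient expectation under an importance-weighted indicator reward:

```latex
\nabla_\theta \log \pi_\theta(y^\star \mid x)
  = \sum_{y} \pi_\theta(y \mid x)\,
    \frac{\mathbb{1}[y = y^\star]}{\pi_\theta(y \mid x)}\,
    \nabla_\theta \log \pi_\theta(y \mid x)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
    \!\left[ r(y)\, \nabla_\theta \log \pi_\theta(y \mid x) \right],
\qquad
r(y) = \frac{\mathbb{1}[y = y^\star]}{\pi_\theta(y \mid x)} .
```

The implied reward is nonzero only on the single demonstrated sequence, and it grows as the policy assigns that sequence less probability.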
Actively induced supercoiling can slow down plasmid solutions by trapping the threading entanglements
arXiv:2506.22842v2 Announce Type: replace-cross Abstract: Harnessing the topology of ring polymers as a design motif in functional nanomaterials is becoming a promising direction in the field of soft matter. For example, the ring topology of DNA plasmids prevents the relaxation of excess twist introduced to the polymer, instead resulting in helical supercoiled structures. In equilibrium […]