arXiv:2604.13759v1 Announce Type: new Abstract: Large language model (LLM) agents on multi-step tasks suffer reasoning degradation, looping, drift, stuck states, at rates up to 30% on hard tasks. Current solutions include hard step limits (abrupt) or LLM-as-judge monitoring (10-15% overhead per step). This paper introduces the Cognitive Companion, a parallel monitoring architecture with two implementations: […]
MaMe & MaRe: Matrix-Based Token Merging and Restoration for Efficient Visual Perception and Synthesis
arXiv:2604.13432v1 Announce Type: cross Abstract: Token compression is crucial for mitigating the quadratic complexity of self-attention mechanisms in Vision Transformers (ViTs), which often involve numerous input tokens. Existing methods, such as ToMe, rely on GPU-inefficient operations (e.g., sorting, scattered writes), introducing overheads that limit their effectiveness. We introduce MaMe, a training-free, differentiable token merging method […]
IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
arXiv:2604.07709v3 Announce Type: replace Abstract: Ask a frontier model how to taper six milligrams of alprazolam (psychiatrist retired, ten days of pills left, abrupt cessation causes seizures) and it tells her to call the psychiatrist she just explained does not exist. Change one word (“I’m a psychiatrist; a patient presents with…”) and the same model, […]
Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment
arXiv:2604.13462v1 Announce Type: cross Abstract: Effective IT change management is important for businesses that depend on software and services, particularly in highly regulated sectors such as finance, where operational reliability, auditability, and explainability are essential. A significant portion of IT incidents are caused by changes, making it important to identify high-risk changes before deployment. This […]
Asymmetric-Loss-Guided Hybrid CNN-BiLSTM-Attention Model for Industrial RUL Prediction with Interpretable Failure Heatmaps
arXiv:2604.13459v1 Announce Type: cross Abstract: Turbofan engine degradation under sustained operational stress necessitates robust prognostic systems capable of accurately estimating the Remaining Useful Life (RUL) of critical components. Existing deep learning approaches frequently fail to simultaneously capture multi-sensor spatial correlations and long-range temporal dependencies, while standard symmetric loss functions inadequately penalize the safety-critical error of […]
AlphaCNOT: Learning CNOT Minimization with Model-Based Planning
arXiv:2604.13812v1 Announce Type: new Abstract: Quantum circuit optimization is a central task in Quantum Computing, as current Noisy Intermediate Scale Quantum devices suffer from error propagation that often scales with the number of operations. Among quantum operations, the CNOT gate is of fundamental importance, being the only 2-qubit gate in the universal Clifford+T set. The […]
Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus
arXiv:2604.13472v1 Announce Type: cross Abstract: Cooperative multi-agent reinforcement learning (MARL) is widely used to address large joint observation and action spaces by decomposing a centralized control problem into multiple interacting agents. However, such decomposition often introduces additional challenges, including non-stationarity, unstable training, weak coordination, and limited theoretical guarantees. In this paper, we propose the Consensus […]
Visual Sparse Steering (VS2): Unsupervised Adaptation for Image Classification using Sparsity-Guided Steering Vectors
arXiv:2506.01247v2 Announce Type: replace-cross Abstract: Steering vision foundation models at test time, without updating foundation-model weights or using labeled target data, is a desirable yet challenging goal. We present Visual Sparse Steering (VS2), a lightweight, label-free adaptation method that constructs a steering vector from sparse features extracted by a Sparse Autoencoder (SAE) trained on unlabeled […]
From Alignment to Prediction: A Study of Self-Supervised Learning and Predictive Representation Learning
arXiv:2604.13518v1 Announce Type: cross Abstract: Self-supervised learning has emerged as a major technique for the task of learning from unlabeled data, where the current methods mostly revolve around alignment of representations and input recon struction. Although such approaches have demonstrated excellent performance in practice, their scope remains mostly confined to learning from observed data and […]
SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization
arXiv:2604.13515v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO) is a common post-training recipe. We conduct a controlled ablation over SFT-GRPO data overlap, evaluating Qwen3-8B (thinking disabled) post-trained for Lean 4 autoformalization under six conditions that differ solely in training recipe: a base model, SFT-only, GRPO-only, and three SFT+GRPO […]
GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis
arXiv:2604.13888v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift toward autonomous spatial analysis. However, evaluating these LLM-based agents remains challenging due to the complex, multi-step nature of geospatial workflows. Existing benchmarks primarily rely on static text or code matching, neglecting dynamic runtime feedback […]
Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
arXiv:2604.13540v1 Announce Type: cross Abstract: Unified Multimodal Models (UMMs) aim to integrate visual understanding and generation within a single structure. However, these models exhibit a notable capability mismatch, where their understanding capability significantly outperforms their generation. This mismatch indicates that the model’s rich internal knowledge, while effective for understanding tasks, remains underactivated during generation. To […]