arXiv:2605.17480v2 Announce Type: replace Abstract: Multi-agent systems extend large language models (LLMs) by decomposing tasks among specialized agents, but their distributed decision process creates new attack surfaces. We identify semantic hijacking, an attack in which harmful requests are concealed within domain-specific narratives and propagated to a Manager through Worker reports, without any syntactic injection primitives. […]
Agent Security is a Systems Problem
arXiv:2605.18991v1 Announce Type: cross Abstract: We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted component, and security invariants must be enforced at the system level. Through this lens, efforts to increase model robustness (the dominant viewpoint in the […]
Can Large Language Models Revolutionize Survey Research? Experiments with Disaster Preparedness Responses
arXiv:2605.19229v1 Announce Type: new Abstract: Survey research faces mounting structural challenges: declining response rates, sample bias, block-wise missingness among at-risk respondents, and AI-assisted fraudulent completions in online panels. Large language models (LLMs) have been proposed as a remedy, yet rigorous evaluations across the full survey workflow remain scarce, particularly in disaster contexts where data quality […]
KVBuffer: IO-aware Serving for Linear Attention
arXiv:2605.19049v1 Announce Type: cross Abstract: Linear attention has recently drawn significant interest for long-context inference because its decoding cost is constant with respect to context length. However, existing serving systems typically serve linear attention by recurrently computing and updating a large linear attention state in every decoding step. Since the state is much larger than […]
CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
arXiv:2507.15698v2 Announce Type: replace-cross Abstract: Process Reward Models (PRMs) play a central role in evaluating and guiding multi-step reasoning in large language models (LLMs), especially for mathematical problem solving. However, we identify a pervasive length bias in existing PRMs: they tend to assign higher scores to longer reasoning steps, even when the semantic content and […]
Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German
arXiv:2605.19069v1 Announce Type: cross Abstract: Code-switching — the natural alternation between two languages within a single utterance — represents one of the most challenging and under-studied conditions for automatic speech recognition (ASR). Existing commercial ASR benchmarks predominantly evaluate clean, monolingual audio and report a single Word Error Rate (WER) figure that tells practitioners little about […]
Causal Evidence for Attention Head Imbalance in Modality Conflict Hallucination
arXiv:2605.19250v1 Announce Type: new Abstract: Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching […]
ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking
arXiv:2605.19077v1 Announce Type: cross Abstract: Task-oriented dialogue systems — handling transactions, reservations, and service requests — require predictable behavior, yet the moderately-sized LLMs needed for practical latency are prone to hallucination and format errors that cascade into incorrect actions (e.g., a hotel booked for the wrong date). We propose ReacTOD, a bounded neuro-symbolic architecture that […]
Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models
arXiv:2511.10292v2 Announce Type: replace-cross Abstract: Large Vision-Language Models (LVLMs) typically process visual inputs as a prefix to the language decoder. As the model autoregressively generates text, this initial visual information inevitably undergoes “dilution,” leading the model to over-rely on language priors and hallucinate objects. Existing interventions attempt to correct this by contrasting logits or iteratively […]
Neural Operators for Design-Space Surrogate Modeling of Tendon-Actuated Continuum Robots
arXiv:2605.19104v1 Announce Type: cross Abstract: Continuum robots enable dexterous manipulation in constrained environments, but require accurate and efficient models for real-time manipulation and control. Traditional physics-based models can be computationally expensive and may suffer from inaccuracies due to unmodeled effects, while current learning-based methods often generalize poorly beyond the specific robot on which they are […]
KappaPlace: Learning Hyperspherical Uncertainty for Visual Place Recognition via Prototype-Anchored Supervision
arXiv:2605.19435v1 Announce Type: cross Abstract: Visual Place Recognition (VPR) is critical for autonomous navigation, yet state-of-the-art methods lack well-calibrated uncertainty estimation. Standard pipelines cannot reliably signal when a query is ambiguous or a match is likely incorrect, posing risks in safety-critical robotics. We propose KappaPlace, a principled framework for learning uncertainty-aware VPR representations. Our core […]
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
arXiv:2605.19619v1 Announce Type: cross Abstract: Matrix-structured parameters appear frequently in artificial intelligence models such as large language models. Recently, the efficient Muon optimizer was designed for the matrix parameters of large-scale models, and it shows markedly faster convergence than vector-wise algorithms. Although some works have begun to study convergence properties (i.e., optimization error) of […]