arXiv:2605.27697v1 Announce Type: cross Abstract: Decentralized multi-robot motion planning requires each robot to generate collision-free trajectories from local observations, without global sensing or reliable communication. However, most existing planners, whether classical or learning-based, generate trajectories from a static snapshot of the local observation, which limits their ability to anticipate the future behavior of neighboring robots. […]
ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions
arXiv:2605.27819v1 Announce Type: cross Abstract: Sparse autoencoders are usually trained one layer at a time, even though transformer residual stream activations are strongly coupled across depth. This creates a practical problem for multi-layer interventions: different layerwise dictionaries can spend capacity representing the same carried-forward information, and replacing several layers at once can produce interactions that […]
Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
arXiv:2605.28602v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for tasks that implicitly reduce to Boolean satisfiability (SAT), yet their reasoning ability on SAT remains unclear. We present a systematic study of LLMs on 2-SAT and 3-SAT, together with two canonical reductions, Vertex Cover and discrete 3D packing, to probe representation-invariant reasoning. […]
LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?
arXiv:2605.28721v1 Announce Type: new Abstract: Are LLM-based search agents genuinely searching, or using the web to verify what they already know? We study this question on BrowseComp with three diagnostics. Our analysis reveals Intrinsic Knowledge Dependence (IKD): even with tool access, agents often rely on intrinsic knowledge — information encoded in the model before retrieval […]
Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models
arXiv:2605.27383v1 Announce Type: cross Abstract: Spoken Language Models (SLMs) have emerged as a promising paradigm for speech synthesis by bypassing explicit grapheme-to-phoneme pipelines. However, their effectiveness in low-resource languages remains fundamentally limited by the scarcity of transcribed speech. In practice, synthetic data has become the primary strategy for scaling SLMs in such settings, providing reliable […]
LLM-assisted sentiment analysis for integrated computational and qualitative mixed methods education research: A case study of students’ written reflection assignments
arXiv:2605.27403v1 Announce Type: cross Abstract: Written reflection assignments give students valuable opportunities for critical self-assessment, meaning making, and learning processing. Additionally, such reflections provide rich data for qualitative education research. However, qualitative data can be time-consuming to analyze. It is even more time-intensive to qualitatively compare findings between different groups of participants, usually limiting comparison […]
A Systematic Evaluation of Retrieval-Augmented Generation and Language Models for Space Operations
arXiv:2605.27444v1 Announce Type: cross Abstract: The rapid expansion of space activities has led to an unprecedented accumulation of technical documentation, operational guidelines, and scientific literature, creating challenges for timely decision-making in space operations. Effective management in space operations requires tools capable of efficiently processing vast and heterogeneous information sources. This paper systematically evaluates the performance […]
Debate Helps Weak Judges Reward Stronger Models
arXiv:2605.27483v1 Announce Type: cross Abstract: Despite theoretical promise, debate as a scalable oversight protocol has produced mixed empirical results: gains in some settings, and null effects in others, especially when the judge does not have information hidden from it. We study proposer-critic debate in a stronger-debater/weaker-judge setting on programmatically verifiable code and logic tasks. Debate […]
Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
arXiv:2605.27646v1 Announce Type: cross Abstract: We propose Hurwitz Quaternion Multiplicative Quantization (HQMQ), a calibration-free method for KV cache compression of large language models. HQMQ treats each 4-element chunk of K or V as a quaternion and quantizes its unit direction to the product $q_p \cdot q_s$, where $q_p$ ranges over the 24-element Hurwitz group $2T$ […]
Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought
arXiv:2605.27764v1 Announce Type: cross Abstract: Recent segmentation models couple large language models (LLMs) with mask decoders to ground complex language expressions into masks, yet their instructions remain target-referential: they describe, constrain, or imply the region to be segmented. However, in real-world embodied interaction, human instructions are often at the intent-level, which includes the desired outcome […]
From Detection to Mechanism: Cross-Attention Graph Neural Networks Enable Drug-Drug Interaction Type Prediction. An Ablation Study with Acetylsalicylic Acid Validation
arXiv:2605.27861v1 Announce Type: cross Abstract: Predicting whether two drugs interact (binary detection) is a substantially different task from predicting the mechanism type of that interaction (multi-class classification). This study presents a systematic ablation study of three Graph Neural Network (GNN) architectures for drug-drug interaction (DDI) prediction on a publicly available benchmark dataset comprising 38,337 […]
Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns
arXiv:2605.28566v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, yet their standard generation process — auto-regressive token prediction — is inherently myopic and prone to cascading errors. To address this, the Tree-of-Thoughts (ToT) framework creates a search space over intermediate reasoning steps, allowing search models to explore, look ahead, and […]