arXiv:2605.21082v1 Announce Type: new Abstract: Large Language Model (LLM)-based agents have demonstrated proficiency in multi-step interactions with graphical user interfaces (GUIs). While most research focuses on improving single-task performance, practical scenarios often involve repetitive GUI tasks, for which repeatedly invoking LLM reasoning, i.e., the ReAct paradigm, is inefficient. Prior to LLMs, traditional Robotic Process […]
DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
arXiv:2605.20936v1 Announce Type: cross Abstract: Hybrid attention architectures are becoming an increasingly important paradigm for improving LLM inference efficiency while preserving model quality, making hybrid architecture design a central problem. Existing designs often rely on manual empirical rules or proxy-based selector signals for layer-wise operator allocation. Recent NAS-style systems such as Jet-Nemotron demonstrate the promise […]
PBT-Bench: Benchmarking AI Agents on Property-Based Testing
arXiv:2605.15229v2 Announce Type: replace-cross Abstract: Existing code benchmarks measure whether an agent can produce any test that reproduces a known bug, or whether it can produce a patch that fixes a described issue. Neither isolates the distinct skill of property-based testing: deriving a semantic invariant from documentation, and then constructing an input-generation strategy precise enough […]
Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory
arXiv:2605.20982v1 Announce Type: cross Abstract: AlltoAll dispatch is the dominant bottleneck of MoE expert parallelism, and the interconnect community has responded with four families of mitigations: predictive sample placement, adaptive expert relayout, hierarchical collectives, and EP-aware topology. All four rest on two assumptions about the workload. The first is that routing imbalance is correctable by […]
ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving
arXiv:2605.21168v1 Announce Type: new Abstract: Safety-critical scenarios are central to evaluating autonomous driving systems, yet their rarity in naturalistic logs makes simulation-based stress testing indispensable. Most scenario generation methods treat surrounding agents as adversaries, but they either (i) induce failures without explicitly modeling vehicle-road physical limits, yielding visually extreme yet physically unsolvable crashes, or (ii) […]
Divide et Calibra: Multiclass Local Calibration via Vector Quantization
arXiv:2605.21060v1 Announce Type: cross Abstract: Accurate and well-calibrated Machine Learning (ML) models are mandatory in high-stakes settings, yet effective multiclass calibration remains challenging: global approaches assume calibration errors are homogeneous across the latent space, while local methods often rely on latent-space dimensionality reduction, which leads to information loss. To address these issues, we propose a […]
Stimulus symmetries can confound representational similarity analyses
arXiv:2605.21324v1 Announce Type: new Abstract: What can representational similarity matrices (RSMs) tell us about a neural code? As the popularity of these summary statistics grows, so too does the need for a more complete characterization of their properties. Here, we show that symmetries in network inputs can confound RSM-based analyses. Stimulus symmetries render many representations […]
ACL-Verbatim: hallucination-free question answering for research
arXiv:2605.21102v1 Announce Type: cross Abstract: Academic researchers need efficient and reliable methods for collecting high-quality information from trusted sources, but modern tools for AI-assisted research still suffer from the tendency of Large Language Models (LLMs) to produce factually inaccurate or nonsensical output, commonly referred to as hallucinations. We apply the extractive question answering system VerbatimRAG […]
Artificial Intelligence Reshapes Microwave Photonics
arXiv:2605.21224v1 Announce Type: cross Abstract: As a rapidly emerging interdisciplinary field that intrinsically integrates microwave engineering and photonics, microwave photonics (MWP) provides disruptive solutions to overcome the fundamental bandwidth limits of conventional electronic systems. By exploiting the inherently ultra-wide bandwidth and low-loss characteristics of photonic technologies, MWP enables the generation, transmission, processing, and detection of microwave, millimeter-wave, […]
Multimodal Optimal Transport for Training-free Temporal Segmentation in Surgical Robotics
arXiv:2602.24138v2 Announce Type: replace-cross Abstract: Automated recognition of surgical phases and steps is a fundamental capability for intraoperative decision support, workflow automation, and skill assessment in robotic-assisted surgery. Existing approaches either depend on large-scale annotated surgical datasets or require expensive domain-specific pretraining on thousands of labeled videos, limiting their practical deployability across diverse robotic platforms […]
APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation
arXiv:2605.09492v2 Announce Type: replace-cross Abstract: Large language models (LLMs) often suffer from hallucinations due to error accumulation in autoregressive decoding, where suboptimal early token choices misguide subsequent generation. Although multi-path decoding can improve robustness by exploring alternative trajectories, existing methods lack principled strategies for determining when to branch and how to regulate inter-path interactions. We […]
Mechanisms of Misgeneralization in Physical Sequence Modeling
arXiv:2605.20299v1 Announce Type: cross Abstract: Generative sequence models are often trained to plan motion in physical domains, from robotics to mechanical simulations. When constructing a dataset to train such a model, engineers may curate demonstrations to specify how trajectories should be distributed over a physical quantity like travel distance or mechanical energy. For example, a […]