Ablation Study of Multimodal Perception, Language Grounding, and Control for Human-Robot Interaction in an Object Detection and Grasping Task

Sparse Representation Learning for Vessels

arXiv:2605.01382v1 Announce Type: cross Abstract: Analyzing human vasculature and vessel-like, tubular structures, such as airways, is crucial for disease diagnosis and treatment. Current methods often

LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference

arXiv:2605.01058v1 Announce Type: cross Abstract: Layer-aligned distillation and convergence-based early exit represent two predominant computational efficiency paradigms for transformer inference; yet we establish that they

A Target-Free Harmonization Method for MRI

arXiv:2605.01282v1 Announce Type: cross Abstract: In MRI, variations in scan parameters, sequence, or hardware can lead to discrepancies in image appearance, even for the same

The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate

arXiv:2605.00914v1 Announce Type: cross Abstract: Multi-agent debate, where teams of LLMs iteratively exchange rationales and vote on answers, is widely deployed under the assumption that

GEASS: Training-Free Caption Steering for Hallucination Mitigation in Vision-Language Models

arXiv:2605.01733v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) excel at grounded reasoning but remain prone to object hallucination. Recent work treats self-generated captions as a