arXiv:2604.05007v1 Announce Type: cross
Abstract: In Audio-Visual Navigation (AVN), agents must locate sound sources in unseen 3D environments using visual and auditory cues. However, existing methods often struggle to generalize to unseen scenarios, as they tend to overfit to semantic sound features and specific training environments. To address these challenges, we propose the Binaural Difference Attention with Action Transition Prediction (BDATP) framework, which jointly optimizes perception and policy. Specifically, the Binaural Difference Attention (BDA) module explicitly models interaural differences to enhance spatial orientation, reducing reliance on semantic categories. Simultaneously, the Action Transition Prediction (ATP) task introduces an auxiliary action prediction objective as a regularization term, mitigating environment-specific overfitting. Extensive experiments on the Replica and Matterport3D datasets demonstrate that BDATP can be seamlessly integrated into various mainstream baselines, yielding consistent and significant performance gains. Notably, our framework achieves state-of-the-art Success Rates across most settings, with a remarkable absolute improvement of up to 21.6 percentage points on the Replica dataset for unheard sounds. These results underscore BDATP's superior generalization capability and its robustness across diverse navigation architectures.
This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA
arXiv:2604.05051v1 Announce Type: cross Abstract: Patients are increasingly turning to large language models (LLMs) with medical questions that are complex and difficult to articulate clearly.