arXiv:2605.15736v1 Announce Type: cross
Abstract: Biomedical Vision–Language Models (VLMs) have shown remarkable promise in few-shot medical diagnosis but face a critical bottleneck: textitfragility to prompt variations.Existing adaptation frameworks typically optimize visual and textual prompts as independent streams, relying on ideal “Golden Prompts”. In clinical reality, where descriptions are often noisy and heterogeneous, this modality isolation leads to unstable cross-modal alignment.
To address this, we propose BiomedAP, a vision-informed dual-anchor framework with gated cross-modal fusion.BiomedAP enforces synergistic alignment through two mechanisms: (1) Gated Cross-Modal Fusion, which enables layer-wise interaction between modalities, acting as a dynamic noise regulator to suppress irrelevant textual cues; and (2) a Dual-Anchor Constraint that regularizes learnable prompts toward stable semantic centroids derived from both expert templates (High Anchors) and few-shot visual prototypes (Low Anchors).
Extensive experiments across 11 benchmarks demonstrate that BiomedAP consistently surpasses baselines, achieving competitive few-shot accuracy and markedly enhanced robustness under prompt perturbations.
Our code is available at: https://github.com/tongdiedie/BiomedAP.
Keywords: Vision-Language Models; Prompt Learning; Parameter-Efficient Fine-Tuning; Few-shot Learning
ExECG: An Explainable AI Framework for ECG models
arXiv:2605.19258v1 Announce Type: cross Abstract: Deep learning has enabled ECG diagnostic models with strong performance in tasks such as arrhythmia classification and abnormality detection. However,


