• Home
  • AI/ML & Advanced Analytics
  • BiomedAP: A Vision-Informed Dual-Anchor Framework with Gated Cross-Modal Fusion for Robust Medical Vision-Language Adaptation

arXiv:2605.15736v1 Announce Type: cross
Abstract: Biomedical Vision–Language Models (VLMs) have shown remarkable promise in few-shot medical diagnosis but face a critical bottleneck: textitfragility to prompt variations.Existing adaptation frameworks typically optimize visual and textual prompts as independent streams, relying on ideal “Golden Prompts”. In clinical reality, where descriptions are often noisy and heterogeneous, this modality isolation leads to unstable cross-modal alignment.
To address this, we propose BiomedAP, a vision-informed dual-anchor framework with gated cross-modal fusion.BiomedAP enforces synergistic alignment through two mechanisms: (1) Gated Cross-Modal Fusion, which enables layer-wise interaction between modalities, acting as a dynamic noise regulator to suppress irrelevant textual cues; and (2) a Dual-Anchor Constraint that regularizes learnable prompts toward stable semantic centroids derived from both expert templates (High Anchors) and few-shot visual prototypes (Low Anchors).
Extensive experiments across 11 benchmarks demonstrate that BiomedAP consistently surpasses baselines, achieving competitive few-shot accuracy and markedly enhanced robustness under prompt perturbations.
Our code is available at: https://github.com/tongdiedie/BiomedAP.
Keywords: Vision-Language Models; Prompt Learning; Parameter-Efficient Fine-Tuning; Few-shot Learning

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844