arXiv:2604.08600v1 Announce Type: new
Abstract: Existing deep learning methods for radiology report generation improve diagnostic efficiency but often overlook physician-informed medical priors, leading to suboptimal alignment between structured explanations and disease manifestations. Eye gaze data provides critical insight into a radiologist's visual attention, enhancing the relevance and interpretability of extracted features while aligning with human decision-making processes. Despite this promise, however, integrating eye gaze information into AI-driven medical imaging workflows is impeded by the complexity of multimodal data fusion and the high cost of gaze acquisition; in particular, gaze data is typically unavailable at inference time, which limits practical applicability in real-world clinical settings. To address these issues, we introduce Gaze2Report, a framework that leverages a scanpath prediction module and a Graph Neural Network (GNN) to generate joint visual-gaze tokens. Combined with instruction and report tokens, these form a multimodal prompt used to fine-tune the LoRA layers of large language models (LLMs) for autoregressive report generation. Gaze2Report enhances report quality through eye-gaze-guided visual learning and incorporates on-the-fly scanpath prediction, enabling the model to operate without gaze input during inference.
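As a rough illustration (not the authors' implementation), the sketch below shows one way joint visual-gaze tokens could be formed: fixations from a predicted scanpath are embedded as nodes alongside image patch features, linked by edges to the patches they attend, and fused with a single round of graph message passing. All names, dimensions, and the fixation-to-patch mapping here are illustrative assumptions.

    # Minimal sketch, assuming PyTorch; a hand-rolled mean-aggregation GNN
    # stands in for whatever graph architecture the paper actually uses.
    import torch
    import torch.nn as nn

    class GazeGraphFusion(nn.Module):
        """One round of mean-aggregation message passing over a fixation graph."""
        def __init__(self, dim):
            super().__init__()
            self.msg = nn.Linear(dim, dim)       # transform neighbour features
            self.upd = nn.Linear(2 * dim, dim)   # combine node + aggregated message

        def forward(self, x, adj):
            # x: (N, dim) node features; adj: (N, N) row-normalised adjacency
            m = adj @ self.msg(x)                # aggregate neighbour messages
            return torch.relu(self.upd(torch.cat([x, m], dim=-1)))

    # Toy dimensions (assumed)
    dim, n_patches, n_fix = 64, 16, 5

    patch_feats = torch.randn(n_patches, dim)  # visual tokens from an image encoder
    fix_feats = torch.randn(n_fix, dim)        # embedded fixations from a predicted scanpath

    # Nodes = patches + fixations; edges link each fixation to its attended patch
    x = torch.cat([patch_feats, fix_feats], dim=0)
    adj = torch.zeros(n_patches + n_fix, n_patches + n_fix)
    attended = torch.randint(0, n_patches, (n_fix,))  # stand-in for fixation->patch mapping
    for i, p in enumerate(attended):
        adj[n_patches + i, p] = adj[p, n_patches + i] = 1.0
    adj = adj / adj.sum(-1, keepdim=True).clamp(min=1)  # row-normalise; isolated nodes keep zeros

    fusion = GazeGraphFusion(dim)
    visual_gaze_tokens = fusion(x, adj)  # (n_patches + n_fix, dim) joint tokens
    print(visual_gaze_tokens.shape)

In the full pipeline described in the abstract, such tokens would presumably be concatenated with instruction and report tokens to form the multimodal prompt used for LoRA fine-tuning of the LLM; because the scanpath is predicted on the fly, no recorded gaze is needed at inference time.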
Adaptation to free-living drives loss of beneficial endosymbiosis through metabolic trade-offs
Symbioses are widespread (1) and underpin the function of diverse ecosystems (2-6), but their evolutionary stability is challenging to explain (7,8). Fitness trade-offs between contrasting