arXiv:2605.19250v1 Announce Type: new
Abstract: Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads. We find a consistent asymmetry: driving effects are more broadly distributed and carry greater aggregate weight, whereas resisting effects concentrate in a small number of high-importance heads. Ablation experiments further confirm that these groups exert opposing effects during generation: distributed driving influence and localized resistance together form an imbalanced routing structure that biases generation toward the erroneous premise. Motivated by this finding, we propose MACI (Modality-conflict-Aware Causal Intervention), a conditional intervention that suppresses causally identified hallucination-driving heads only when conflict is detected. Across five MLLMs, MACI achieves the largest hallucination reduction among compared inference-time baselines on the MMMC benchmark with a favorable hallucination-accuracy trade-off, and transfers zero-shot to the SCI-SemanticConflict test.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844