arXiv:2603.27982v1 Announce Type: cross
Abstract: Vision-language models (VLMs) achieve strong performance on many benchmarks, yet a basic reliability question remains underexplored: when visual evidence conflicts with commonsense, do models follow what is shown or what commonsense suggests? A characteristic failure in this setting is that the model overrides visual evidence and outputs the commonsense alternative. We term this phenomenon commonsense-driven hallucination (CDH). To evaluate it, we introduce CDH-Bench, a benchmark designed to create explicit visual evidence–commonsense conflicts. CDH-Bench covers three dimensions: counting anomalies, relational anomalies, and attribute anomalies. We evaluate frontier VLMs under binary Question Answering (QA) and multiple-choice QA, and report metrics including Counterfactual Accuracy (CF-Acc), Commonsense Accuracy (CS-Acc), Counterfactual Accuracy Drop (CFAD), Commonsense Collapse Rate (CCR), and Relative Prior Dependency (RPD). Results show that even strong models remain vulnerable to prior-driven normalization under visual evidence–commonsense conflict, and CDH-Bench provides a controlled diagnostic of visual fidelity under such conflicts.
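The abstract names its metrics but does not define them, so the following is a minimal sketch of one plausible reading: CF-Acc as accuracy on conflict images judged against the visual evidence, CS-Acc as accuracy on matched non-conflict controls, CFAD as the gap between the two, and CCR as the share of conflict-image errors that collapse to the commonsense alternative. All formulas and field names here are assumptions for illustration, not the paper's definitions; RPD is omitted because its formula cannot be inferred from the abstract.

```python
# Hypothetical sketch of CDH-Bench-style metrics. The formulas below are
# assumed readings of CF-Acc, CS-Acc, CFAD, and CCR, not the paper's own.

def cdh_metrics(records):
    """records: list of dicts with (assumed) keys
    'pred'        -- model's answer on the anomalous (conflict) image
    'visual'      -- answer supported by the visual evidence
    'commonsense' -- the commonsense-consistent alternative
    'pred_normal' -- model's answer on a matched non-conflict image
    'gold_normal' -- correct answer for that non-conflict image
    """
    n = len(records)
    # CF-Acc (assumed): accuracy on conflict images, judged against the image.
    cf_acc = sum(r["pred"] == r["visual"] for r in records) / n
    # CS-Acc (assumed): accuracy on the matched non-conflict controls.
    cs_acc = sum(r["pred_normal"] == r["gold_normal"] for r in records) / n
    # CFAD (assumed): accuracy drop when commonsense conflicts with the image.
    cfad = cs_acc - cf_acc
    # CCR (assumed): among conflict-image errors, the fraction that "collapse"
    # to the commonsense alternative rather than some other wrong answer.
    errors = [r for r in records if r["pred"] != r["visual"]]
    ccr = (sum(r["pred"] == r["commonsense"] for r in errors) / len(errors)
           if errors else 0.0)
    return {"CF-Acc": cf_acc, "CS-Acc": cs_acc, "CFAD": cfad, "CCR": ccr}
```

Under this reading, a high CS-Acc paired with a large CFAD and high CCR would indicate the prior-driven normalization the abstract describes: the model answers correctly on ordinary images but reverts to the commonsense answer when the image contradicts it.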
Assessing nurses’ attitudes toward artificial intelligence in Kazakhstan: psychometric validation of a nine-item scale
Background: Artificial intelligence (AI) is increasingly integrated into healthcare, yet the attitudes and knowledge of nurses, who are the key mediators of AI implementation, remain underexplored.


