CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models

arXiv:2603.27982v1 (announce type: cross-list)
Abstract: Vision-language models (VLMs) achieve strong performance on many benchmarks, yet a basic reliability question remains underexplored: when visual evidence conflicts with commonsense, do models follow what is shown or what commonsense suggests? A characteristic failure in this setting is that the model overrides visual evidence and outputs the commonsense alternative. We term this phenomenon commonsense-driven hallucination (CDH). To evaluate it, we introduce CDH-Bench, a benchmark designed to create explicit visual evidence–commonsense conflicts. CDH-Bench covers three dimensions: counting anomalies, relational anomalies, and attribute anomalies. We evaluate frontier VLMs under binary Question Answering (QA) and multiple-choice QA, and report metrics including Counterfactual Accuracy (CF-Acc), Commonsense Accuracy (CS-Acc), Counterfactual Accuracy Drop (CFAD), Commonsense Collapse Rate (CCR), and Relative Prior Dependency (RPD). Results show that even strong models remain vulnerable to prior-driven normalization under visual evidence–commonsense conflict. CDH-Bench provides a controlled diagnostic of visual fidelity under visual evidence–commonsense conflict.
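
The abstract names its metrics but does not define them. As a rough illustration only, the sketch below shows one plausible reading of the two accuracy figures: accuracy computed separately on items whose image contradicts commonsense and on items whose image agrees with it, with the gap between the two reported as a drop. The `Item` record, the condition labels, and the `drop` formula are assumptions made for this example and are not taken from the paper; CCR and RPD are left out because the abstract gives no basis for reconstructing them.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One evaluated question (hypothetical record layout, not from the paper)."""
    condition: str  # "counterfactual" or "commonsense" (assumed labels)
    gold: str       # answer supported by what the image actually shows
    predicted: str  # answer returned by the model


def accuracy(items, condition):
    """Fraction of items in the given condition that the model answered correctly."""
    subset = [it for it in items if it.condition == condition]
    if not subset:
        raise ValueError(f"no items with condition {condition!r}")
    return sum(it.predicted == it.gold for it in subset) / len(subset)


def summarize(items):
    """Per-condition accuracies plus their difference (an assumed reading of a 'drop')."""
    cf_acc = accuracy(items, "counterfactual")
    cs_acc = accuracy(items, "commonsense")
    return {"cf_acc": cf_acc, "cs_acc": cs_acc, "drop": cs_acc - cf_acc}


# Toy data: the model answers with the commonsense count on two of the four
# counterfactual images, overriding what is shown.
toy_items = [
    Item("counterfactual", gold="6", predicted="6"),
    Item("counterfactual", gold="6", predicted="5"),
    Item("counterfactual", gold="3", predicted="4"),
    Item("counterfactual", gold="3", predicted="3"),
    Item("commonsense", gold="5", predicted="5"),
    Item("commonsense", gold="4", predicted="4"),
]
toy_summary = summarize(toy_items)
```

On this toy data the model is right on every commonsense-consistent item and on half of the counterfactual ones, so a large gap between the two accuracies is the kind of signal the benchmark is described as measuring.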
