MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models

arXiv:2412.20718v2 Announce Type: replace-cross
Abstract: The rapid integration of Large Vision-Language Models (LVLMs) into critical domains necessitates comprehensive moral evaluation to ensure their alignment with human values. While extensive research has addressed moral evaluation in LLMs, text-centric assessments cannot adequately capture the complex contextual nuances and ambiguities introduced by visual modalities. To bridge this gap, we introduce MM-MoralBench, a multimodal moral evaluation benchmark grounded in Moral Foundations Theory. We construct unique multimodal scenarios by combining synthesized visual contexts with character dialogues to simulate real-world dilemmas in which visual and linguistic information interact dynamically. Our benchmark assesses models across six moral foundations through moral judgment, classification, and response tasks. Extensive evaluations of over 20 LVLMs reveal that models exhibit pronounced moral alignment bias, diverging significantly from human consensus. Furthermore, our analysis indicates that general scaling or structural improvements yield diminishing returns in moral alignment, and that the thinking paradigm may trigger overthinking-induced failures in moral contexts, highlighting the necessity for targeted moral alignment strategies. Our benchmark is publicly available.

