Advancing Creative Physical Intelligence in Large Multimodal Models

May 27, 2026

arXiv:2605.26396v1
Abstract: Large multimodal models (LMMs) have rapidly advanced in perception and reasoning; however, it remains unclear whether these capabilities generalize to discovering visually grounded solutions in open-ended environments, beyond pattern recognition. In such settings, intelligence requires more than answering well-posed questions: it involves identifying how elements in a scene can be repurposed in non-obvious yet physically feasible ways. This form of creative problem-solving is central to human intelligence, but remains largely untested in current benchmarks. To evaluate this ability, we introduce MM-CreativityBench, a benchmark for affordance-grounded creative tool use in visually rich, physically constrained environments. Each instance presents a scenario image with structured views of candidate entities and their parts, enabling fine-grained, interactive evaluation of how models iteratively inspect the scene, identify relevant affordances, and compose visually and physically grounded solutions. Our experiments show that current LMMs often fall short, not due to lack of generative capability, but because they do not sustain grounded exploration. Models often overlook relevant entities, under-examine critical parts, or hallucinate attributes not grounded in the image. Motivated by this failure mode, we propose affordance-grounded alignment, which casts creative tool use as a preference learning problem. Using Direct Preference Optimization, we encourage models to prefer attribute-affordance reasoning grounded in visual evidence over hallucinated alternatives. In addition, we incorporate supervision derived from an affordance knowledge base to guide broader entity exploration and multi-turn planning. Our results show consistent gains in selecting the correct entities and parts, while substantially reducing hallucination and grounding-related errors.
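
As a rough illustration of the preference-learning step mentioned in the abstract, the sketch below computes the standard Direct Preference Optimization loss for a single preference pair. It is not the authors' implementation. It assumes the "chosen" response is visually grounded attribute-affordance reasoning and the "rejected" response is a hallucinated alternative. The function name `dpo_loss`, its argument names, and the default `beta` are hypothetical choices for this example; the abstract does not specify them.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    All names and the default beta are illustrative assumptions.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)

    # Loss is -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical log-probabilities: the policy favors the grounded response
# more strongly than the reference model does.
loss = dpo_loss(policy_chosen_logp=-1.0, policy_rejected_logp=-3.0,
                ref_chosen_logp=-2.0, ref_rejected_logp=-2.0)
```

The loss equals log 2 when the policy and the reference agree on both responses, and falls below that as the policy shifts probability toward the grounded response relative to the reference.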

