Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection

arXiv:2605.27155v1 Announce Type: cross Abstract: Testing object detectors in safety-critical domains requires semantically meaningful probes beyond pixel-level corruptions. We present SemProbe, a tool for semantic

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

arXiv:2605.27016v1 Announce Type: cross Abstract: Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment.

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic

arXiv:2601.21972v5 Announce Type: replace Abstract: Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning approaches rely on predefined

A Physics-Informed Hierarchical Neural Network for Microwave Scattering Analysis of 3D PEC Targets

arXiv:2508.03774v5 Announce Type: replace-cross Abstract: Accurate modeling of scattering from three-dimensional (3D) perfectly electrically conducting (PEC) targets at microwave frequencies constitutes a fundamental objective in

Experiments in Agentic AI for Science

arXiv:2605.26305v1 Announce Type: new Abstract: This paper details two novel frameworks for developing autonomous, agentic AI in scientific workflows. Both systems leverage a hybrid Local

Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning

May 22, 2026

arXiv:2605.21988v1 Announce Type: cross
Abstract: Video large language models (Video LLMs) achieve strong benchmark accuracy, yet often answer video questions through shortcuts such as single-frame cues and language priors rather than by tracking spatiotemporal dynamics. This issue is exacerbated in RL post-training, where correctness-only rewards can further reinforce shortcut policies that obtain high reward without tracking video dynamics. We address this by asking a controlled counterfactual question: if the visual world changed while the question remained fixed, should the answer change or stay the same? Based on this view, we propose textbfCounterfactual Relational Policy Optimization (CRPO), a dual-branch RL framework for improving emphspatiotemporal sensitivity. CRPO constructs counterfactual videos through horizontal flips and temporal reversals, trains on both original and counterfactual branches, and introduces a textbfCounterfactual Relation Reward (CRR) between their answers. CRR encourages answers to change for dynamic questions and remain unchanged for static questions. This cross-branch constraint makes it difficult for shortcut policies to be consistently rewarded across both branches. To evaluate this property, we introduce textbfDyBench, a paired counterfactual video benchmark with 3,014 videos covering reversible dynamics, moving direction, and event sequence, together with a strict pair-accuracy metric that prevents fixed-answer shortcuts from inflating scores. Experiments show that CRPO outperforms prior RL methods on spatiotemporal-sensitive evaluations while maintaining competitive general video performance. On Qwen3-VL-8B, CRPO improves DyBench P-Acc by +7.7 and TimeBlind I-Acc by +8.2 over the base model, indicating improved spatiotemporal sensitivity rather than stronger reliance on static shortcuts. The project website can be found at https://ddz16.github.io/crpo.github.io/ .

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844