AI Wrapped: The 14 AI terms you couldn’t avoid in 2025

AI Wrapped: The 14 AI terms you couldn’t avoid in 2025

If the past 12 months have taught us anything, it’s that the AI hype train is showing no signs of slowing. It’s hard to believe

Uncovering Competency Gaps in Large Language Models and Their Benchmarks

arXiv:2512.20638v1 Announce Type: cross Abstract: The evaluation of large language models (LLMs) relies heavily on standardized benchmarks. These benchmarks provide useful aggregated metrics for a

AI-Driven Green Cognitive Radio Networks for Sustainable 6G Communication

arXiv:2512.20739v1 Announce Type: cross Abstract: The 6G wireless aims at the Tb/s peak data rates are expected, a sub-millisecond latency, massive Internet of Things/vehicle connectivity,

Stabilizing Multimodal Autoencoders: A Theoretical and Empirical Analysis of Fusion Strategies

arXiv:2512.20749v1 Announce Type: cross Abstract: In recent years, the development of multimodal autoencoders has gained significant attention due to their potential to handle multimodal complex

Towards Optimal Performance and Action Consistency Guarantees in Dec-POMDPs with Inconsistent Beliefs and Limited Communication

arXiv:2512.20778v1 Announce Type: cross Abstract: Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. In many real-world scenarios, each agent maintains its

Generalization of RLVR Using Causal Reasoning as a Testbed

December 25, 2025

arXiv:2512.20760v1 Announce Type: cross
Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising paradigm for post-training large language models (LLMs) on complex reasoning tasks. Yet, the conditions under which RLVR yields robust generalization remain poorly understood. This paper provides an empirical study of RLVR generalization in the setting of probabilistic inference over causal graphical models. This setting offers two natural axes along which to examine generalization: (i) the level of the probabilistic query — associational, interventional, or counterfactual — and (ii) the structural complexity of the query, measured by the size of its relevant subgraph. We construct datasets of causal graphs and queries spanning these difficulty axes and fine-tune Qwen-2.5-Instruct models using RLVR or supervised fine-tuning (SFT). We vary both the model scale (3B-32B) and the query level included in training. We find that RLVR yields stronger within-level and across-level generalization than SFT, but only for specific combinations of model size and training query level. Further analysis shows that RLVR’s effectiveness depends on the model’s initial reasoning competence. With sufficient initial competence, RLVR improves an LLM’s marginalization strategy and reduces errors in intermediate probability calculations, producing substantial accuracy gains, particularly on more complex queries. These findings show that RLVR can improve specific causal reasoning subskills, with its benefits emerging only when the model has sufficient initial competence.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844