TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

arXiv:2605.10983v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has shown extraordinary potential in aligning diffusion models to downstream tasks, yet most of them still suffer from significant reward hacking, which degrades generative diversity and quality by inducing visual mode collapse and amplifying unreliable rewards. We identify the root cause as the mode-seeking nature of these […]

Continual Learning with Multilingual Foundation Model

arXiv:2605.13415v1 Announce Type: cross Abstract: This paper presents a multi-stage framework for detecting reclaimed slurs in multilingual social media discourse. It addresses the challenge of identifying reclamatory versus non-reclamatory usage of LGBTQ+-related slurs across English, Spanish, and Italian tweets. The framework handles three intertwined methodological challenges like data scarcity, class imbalance, and cross-linguistic variation in […]

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

arXiv:2605.13435v1 Announce Type: cross Abstract: There is growing interest in utilizing flow-based models as decision-making policies in reinforcement learning due to their high expressive capacity. However, effectively leveraging this expressivity for value maximization remains challenging, as naive gradient-based optimization requires backpropagating through numerical solvers and often leads to instability. Existing approaches typically address this issue […]

Gradient-Free Noise Optimization for Reward Alignment in Generative Models

arXiv:2605.11347v2 Announce Type: replace-cross Abstract: Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but existing approaches require backpropagation through the generator and reward pipeline, limiting applicability to differentiable settings. To address this, here we present […]

CUBic: Coordinated Unified Bimanual Perception and Control Framework

arXiv:2605.13452v1 Announce Type: cross Abstract: Recent advances in visuomotor policy learning have enabled robots to perform control directly from visual inputs. Yet, extending such end-to-end learning from single-arm to bimanual manipulation remains challenging due to the need for both independent perception and coordinated interaction between arms. Existing methods typically favor one side — either decoupling […]

Entropy Aware Reward Guidance for Diffusion Language Model Alignment

arXiv:2602.05000v2 Announce Type: replace-cross Abstract: Reward guidance, also known as posterior sampling, is a popular method for test-time adaptation and post-training in continuous diffusion models. In this paper, we study reward guidance for discrete diffusion language models; now, one cannot differentiate through the natural outputs of the model because they are discrete tokens. We introduce […]

Shields to Guarantee Probabilistic Safety in MDPs

arXiv:2605.10888v2 Announce Type: replace-cross Abstract: Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes with strong guarantees about safety and maximal permissiveness. However, shielding systems for probabilistic safety, where something bad is allowed to happen with an acceptable probability, has […]

Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation

arXiv:2605.01750v2 Announce Type: replace-cross Abstract: Grounding is the collaborative process of establishing mutual belief sufficient for a communicative goal. While static grounding maps language to a shared context, dynamic grounding requires agents to negotiate meaning across turns. Current multi-agent Large Language Model (LLM) benchmarks largely emphasize static, one-shot tasks, overlooking whether agents can repair grounding […]

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

arXiv:2605.13412v1 Announce Type: cross Abstract: Off-the-shelf large language models (LLMs) are increasingly used to automate text annotation, yet their effectiveness remains underexplored for underrepresented languages and specialized domains where the class definition requires subtle expert understanding. We investigate LLM-based annotation for a novel legal NLP task: identifying the presence and sentiment of credibility assessments in […]

NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating

arXiv:2605.13651v1 Announce Type: cross Abstract: Audio provides critical situational cues, yet current Audio Language Models (ALMs) face an attention bottleneck in long-form recordings where dominant background patterns can dilute rare, salient events. We introduce NAACA, a training-free NeuroAuditory Attentive Cognitive Architecture that reframes attention allocation as an auditory salience filtering problem. At its core is […]

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

arXiv:2605.10426v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for end-to-end autonomous driving. However, existing reasoning mechanisms still struggle to provide planning-oriented intermediate representations: textual Chain-of-Thought (CoT) fails to preserve continuous spatiotemporal structure, while latent world reasoning remains difficult to use as a direct condition for action generation. In this […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844