May 14, 2026 – Page 11 – dijee Pharma Intelligence

TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

arXiv:2605.10983v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has shown extraordinary potential in aligning diffusion models to downstream tasks, yet most RL-based alignment methods still suffer from significant reward hacking, which degrades generative diversity and quality by inducing visual mode collapse and amplifying unreliable rewards. We identify the root cause as the mode-seeking nature of these […]

May 14, 2026

Continual Learning with Multilingual Foundation Model

arXiv:2605.13415v1 Announce Type: cross Abstract: This paper presents a multi-stage framework for detecting reclaimed slurs in multilingual social media discourse. It addresses the challenge of distinguishing reclamatory from non-reclamatory usage of LGBTQ+-related slurs across English, Spanish, and Italian tweets. The framework handles three intertwined methodological challenges: data scarcity, class imbalance, and cross-linguistic variation in […]

May 14, 2026

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

arXiv:2605.13435v1 Announce Type: cross Abstract: There is growing interest in utilizing flow-based models as decision-making policies in reinforcement learning due to their high expressive capacity. However, effectively leveraging this expressivity for value maximization remains challenging, as naive gradient-based optimization requires backpropagating through numerical solvers and often leads to instability. Existing approaches typically address this issue […]

May 14, 2026

Gradient-Free Noise Optimization for Reward Alignment in Generative Models

arXiv:2605.11347v2 Announce Type: replace-cross Abstract: Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but existing approaches require backpropagation through the generator and reward pipeline, limiting applicability to differentiable settings. To address this, here we present […]

May 14, 2026

CUBic: Coordinated Unified Bimanual Perception and Control Framework

arXiv:2605.13452v1 Announce Type: cross Abstract: Recent advances in visuomotor policy learning have enabled robots to perform control directly from visual inputs. Yet, extending such end-to-end learning from single-arm to bimanual manipulation remains challenging due to the need for both independent perception and coordinated interaction between arms. Existing methods typically favor one side — either decoupling […]

May 14, 2026

Entropy Aware Reward Guidance for Diffusion Language Model Alignment

arXiv:2602.05000v2 Announce Type: replace-cross Abstract: Reward guidance, also known as posterior sampling, is a popular method for test-time adaptation and post-training in continuous diffusion models. In this paper, we study reward guidance for discrete diffusion language models; now, one cannot differentiate through the natural outputs of the model because they are discrete tokens. We introduce […]

May 14, 2026

Shields to Guarantee Probabilistic Safety in MDPs

arXiv:2605.10888v2 Announce Type: replace-cross Abstract: Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes with strong guarantees about safety and maximal permissiveness. However, shielding systems for probabilistic safety, where something bad is allowed to happen with an acceptable probability, has […]

May 14, 2026

Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation

arXiv:2605.01750v2 Announce Type: replace-cross Abstract: Grounding is the collaborative process of establishing mutual belief sufficient for a communicative goal. While static grounding maps language to a shared context, dynamic grounding requires agents to negotiate meaning across turns. Current multi-agent Large Language Model (LLM) benchmarks largely emphasize static, one-shot tasks, overlooking whether agents can repair grounding […]

May 14, 2026

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

arXiv:2605.13412v1 Announce Type: cross Abstract: Off-the-shelf large language models (LLMs) are increasingly used to automate text annotation, yet their effectiveness remains underexplored for underrepresented languages and specialized domains where the class definition requires subtle expert understanding. We investigate LLM-based annotation for a novel legal NLP task: identifying the presence and sentiment of credibility assessments in […]

May 14, 2026

NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating

arXiv:2605.13651v1 Announce Type: cross Abstract: Audio provides critical situational cues, yet current Audio Language Models (ALMs) face an attention bottleneck in long-form recordings where dominant background patterns can dilute rare, salient events. We introduce NAACA, a training-free NeuroAuditory Attentive Cognitive Architecture that reframes attention allocation as an auditory salience filtering problem. At its core is […]

May 14, 2026

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

arXiv:2605.10426v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for end-to-end autonomous driving. However, existing reasoning mechanisms still struggle to provide planning-oriented intermediate representations: textual Chain-of-Thought (CoT) fails to preserve continuous spatiotemporal structure, while latent world reasoning remains difficult to use as a direct condition for action generation. In this […]

May 14, 2026

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

arXiv:2408.12935v4 Announce Type: replace Abstract: AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. With the rapid proliferation of AI and especially with the recent advancement of Generative AI (or GAI), the technology ecosystem behind the design, development, adoption, and deployment of AI systems has drastically […]

May 14, 2026
