May 25, 2026 – Page 17 – dijee Pharma Intelligence

Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

arXiv:2605.23522v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching generators. A critical step for applying online RL to flow matching is turning the deterministic sampling trajectory into a stochastic policy, typically by replacing the reverse-time Ordinary Differential Equation (ODE) with […]

May 25, 2026

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

arXiv:2605.21851v2 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO assigns a single trajectory-level advantage to every token, diluting the signal at pivotal reasoning steps and injecting noise at uninformative ones. Critic-free alternatives derived from on-policy distillation supply per-token signals through […]

May 25, 2026

BarrierSteer: LLM Safety via Learning Barrier Steering

arXiv:2602.20102v2 Announce Type: replace-cross Abstract: Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a significant obstacle to deployment, particularly in high-stakes settings. Addressing this challenge requires safety mechanisms that are both practically effective and theoretically grounded. In this paper, we introduce […]

May 25, 2026

Ceci n’est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems

arXiv:2604.26145v2 Announce Type: replace-cross Abstract: AI-powered language learning tools increasingly provide instant, personalised feedback to millions of learners worldwide. However, this feedback can fail in ways that are difficult for learners – and even teachers – to detect, potentially reinforcing misconceptions and eroding learning outcomes over extended use. We present a portion of L2-Bench, a benchmark for evaluating AI […]

May 25, 2026

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

arXiv:2605.22883v1 Announce Type: new Abstract: Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems – where a single user goal may trigger multi-step orchestration, tool calls, retries, and failure-recovery cycles – the invocation count is an […]

May 25, 2026

A mathematical theory of balancing relational generalization and memorization

arXiv:2605.22972v1 Announce Type: cross Abstract: Humans, animals, and modern machine learning models exhibit impressive abilities to learn complex behaviors and generalize these behaviors to unseen situations. This ability requires us to learn rules and regularities that allow for such generalizations. At the same time, in most complex environments, any rule will have its exceptions. How […]

May 25, 2026

Computable Fairness: Boltzmann-Softmax Control for AI Resource Allocation

arXiv:2605.22827v1 Announce Type: cross Abstract: In large-scale AI systems, allocating scarce resources such as GPU compute time and bandwidth among multiple agents is a critical challenge. Conventional policies focus on efficiency metrics, potentially leading to dominance concentration that undermines system diversity and stability. We propose Computable Fair Division (CFD), a framework that reinterprets the Boltzmann-Softmax […]

May 25, 2026

Agentic Proving for Program Verification

arXiv:2605.23772v1 Announce Type: new Abstract: Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. Our results show that Claude generates […]

May 25, 2026

EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation

arXiv:2605.23493v1 Announce Type: new Abstract: On-Policy Distillation (OPD) has gained wide traction as an LLM post-training paradigm due to its effectiveness in improving capabilities without introducing model distribution drift, and consequently, regression in general tasks. On-Policy Self-Distillation (OPSD) is an efficient use case of OPD, which is appealing as it requires only a single model as […]

May 25, 2026

Solving the Aircraft Disassembly Scheduling Problem

arXiv:2605.23592v1 Announce Type: new Abstract: Dismantling aircraft reaching their end of life is a complex endeavour that is necessary in terms of sustainability but yields small income margins for air transport companies. An efficient scheduling of the disassembly procedure is thus crucial to ensure the profitability of the process and incentivize the practice. This is a […]

May 25, 2026

Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

arXiv:2605.22859v1 Announce Type: cross Abstract: Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: […]

May 25, 2026

Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test

arXiv:2605.22841v1 Announce Type: cross Abstract: What happens when the strongest alliance member pressures a weaker member over territory and strategic control? We examine the Greenland sovereignty crisis as a stress test for LLM geopolitics, centered on the 2019-2026 U.S. push to acquire Greenland from the Kingdom of Denmark. The crisis nests two collective-action problems: Arctic […]

May 25, 2026
