Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

arXiv:2605.23522v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching generators. A critical step for applying online RL to flow matching is turning the deterministic sampling trajectory into a stochastic policy, typically by replacing the reverse-time Ordinary Differential Equation (ODE) with […]

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

arXiv:2605.21851v2 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO assigns a single trajectory-level advantage to every token, diluting the signal at pivotal reasoning steps and injecting noise at uninformative ones. Critic-free alternatives derived from on-policy distillation supply per-token signals through […]

BarrierSteer: LLM Safety via Learning Barrier Steering

arXiv:2602.20102v2 Announce Type: replace-cross Abstract: Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a significant obstacle to deployment, particularly in high-stakes settings. Addressing this challenge requires safety mechanisms that are both practically effective and theoretically grounded. In this paper, we introduce […]

Ceci n’est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems

arXiv:2604.26145v2 Announce Type: replace-cross Abstract: AI-powered language learning tools increasingly provide instant, personalised feedback to millions of learners worldwide. However, this feedback can fail in ways that are difficult for learners–and even teachers–to detect, potentially reinforcing misconceptions and eroding learning outcomes over extended use. We present a portion of L2-Bench, a benchmark for evaluating AI […]

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

arXiv:2605.22883v1 Announce Type: new Abstract: Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems – where a single user goal may trigger multi-step orchestration, tool calls, retries, and failure-recovery cycles – the invocation count is an […]

A mathematical theory of balancing relational generalization and memorization

arXiv:2605.22972v1 Announce Type: cross Abstract: Humans, animals, and modern machine learning models exhibit impressive abilities to learn complex behaviors and generalize these behaviors to unseen situations. This ability requires us to learn rules and regularities that allow for such generalizations. At the same time, in most complex environments, any rule will have its exceptions. How […]

Computable Fairness: Boltzmann-Softmax Control for AI Resource Allocation

arXiv:2605.22827v1 Announce Type: cross Abstract: In large-scale AI systems, allocating scarce resources such as GPU compute time and bandwidth among multiple agents is a critical challenge. Conventional policies focus on efficiency metrics, potentially leading to dominance concentration that undermines system diversity and stability. We propose Computable Fair Division (CFD), a framework that reinterprets the Boltzmann-Softmax […]

Agentic Proving for Program Verification

arXiv:2605.23772v1 Announce Type: new Abstract: Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. Our results show that Claude generates […]

EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation

arXiv:2605.23493v1 Announce Type: new Abstract: On-Policy Distillation (OPD) has gained wide attraction as an LLM post-training paradigm due to its effectiveness in improving capabilities without introducing model distribution drift, and consequently, regression in general tasks. On-Policy Self-Distillation (OPSD) is an efficient use-case of OPD, which is appealing as it requires only a single model as […]

Solving the Aircraft Disassembly Scheduling Problem

arXiv:2605.23592v1 Announce Type: new Abstract: Dismantling aircrafts reaching their end of life is a complex endeavour that is necessary in terms of sustainability but yields small income margins for air transport companies. An efficient scheduling of the disassembly procedure is thus crucial to ensure the profitability of the process and incentivize practice. This is a […]

Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

arXiv:2605.22859v1 Announce Type: cross Abstract: Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844