May 18, 2026 – Page 12 – dijee Pharma Intelligence

ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization

arXiv:2605.12667v2 Announce Type: replace-cross Abstract: The alignment of Large Language Models (LLMs) utilizes Reinforcement Learning from AI Feedback (RLAIF) for non-verifiable domains such as long-form question answering and open-ended instruction following. These domains often rely on LLM-based auto-raters to provide granular, multi-tier discrete rewards (e.g., 1-10 rubrics) that are inherently stochastic due to prompt […]

May 18, 2026

Attention Dispersion in Dynamic Graph Transformers: Diagnosis and a Transferable Fix

arXiv:2605.16112v1 Announce Type: cross Abstract: Transformer-based architectures have become the dominant paradigm for Continuous-Time Dynamic Graph (CTDG) learning, yet their performance remains limited on temporally shifted datasets. In this work, we identify attention dispersion as a shared failure mode of dynamic graph Transformers under temporal distribution shift. Through controlled ablation contrasting structurally and temporally distinguished […]

May 18, 2026

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

arXiv:2605.15224v1 Announce Type: new Abstract: Large language model-based agents make mistakes, yet critique can often guide the same model toward correct behavior. However, when critique is removed, the model may fail again on the same query, indicating that it has not internalized the critique’s guidance into its underlying capability. Meanwhile, a frozen critic cannot improve […]

May 18, 2026

From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

arXiv:2605.15412v1 Announce Type: cross Abstract: Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into tradable signals. Recent LLM-based methods have shown promise in automating factor generation, but most of them still rely on prompt-level generation–evaluation–feedback […]

May 18, 2026

Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation

arXiv:2605.16241v1 Announce Type: cross Abstract: Billion-parameter Vision-Language-Action (VLA) policies have recently shown impressive performance in robotic manipulation, yet their size and inference cost remain major obstacles for real-time closed-loop control. We introduce VLA-AD, a distillation framework that uses a Vision-Language Model as an offline semantic supervisor to transfer large VLA teachers into lightweight student policies. […]

May 18, 2026

STAR: A Stage-attributed Triage and Repair framework for RCA Agents in Microservices

arXiv:2605.15581v1 Announce Type: new Abstract: LLM-based root cause analysis (RCA) agents have recently emerged as a promising paradigm for incident diagnosis in microservice AIOps. However, their reliability remains fragile: an error in early evidence collection, hypothesis formulation, or causal analysis can propagate through the reasoning trace and eventually corrupt the final diagnosis. In this paper, […]

May 18, 2026

FormulaCode: Evaluating Agentic Optimization on Large Codebases

arXiv:2603.16011v2 Announce Type: replace-cross Abstract: Large language model (LLM) coding agents increasingly operate at the repository level, motivating benchmarks that evaluate their ability to optimize entire codebases under realistic constraints. Existing code benchmarks largely rely on synthetic tasks, binary correctness signals, or single-objective evaluation, limiting their ability to assess holistic optimization behavior. We introduce FormulaCode, […]

May 18, 2026

$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data

arXiv:2605.15417v1 Announce Type: cross Abstract: In GFlowNets and variational inference, it has been shown that the mean square error between target and model log probabilities is an effective, low-variance surrogate loss for training generative models. This loss has the property that when evaluated on-policy its gradients correspond to those of the KL divergence, while […]

May 18, 2026

Centralized vs Decentralized Federated Learning: A trade-off performance analysis

arXiv:2605.16089v1 Announce Type: cross Abstract: Federated Learning (FL) has emerged as a promising paradigm for collaborative model training across distributed edge devices while preserving data privacy, especially given the rapid growth in data volume driven by the adoption of technologies that contribute to the growing number of IoT devices. Storing this amount of data centrally […]

May 18, 2026

Runtime-Structured Task Decomposition for Agentic Coding Systems

arXiv:2605.15425v1 Announce Type: cross Abstract: Agentic coding systems increasingly use large language models (LLMs) for software engineering tasks such as debugging, root cause analysis, and code review. However, many existing systems encode task logic, execution flow, and output generation inside monolithic prompts. This design creates brittle behavior, limited debuggability, and high retry costs because failures […]

May 18, 2026

See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation

arXiv:2605.15585v1 Announce Type: new Abstract: Large language models can generate executable code for educational animations, but the resulting renders often exhibit visual defects, including element overlap, misalignment, and broken animation continuity. These defects cannot be reliably detected from the code alone and become apparent only after execution. We formalize this problem as render-feedback-aware constrained code […]

May 18, 2026

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

arXiv:2605.06223v3 Announce Type: replace Abstract: Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user’s burden by actively asking only for the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall […]

May 18, 2026
