May 8, 2026 – Page 6 – dijee Pharma Intelligence

Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs

arXiv:2605.06111v1 Announce Type: cross Abstract: Reinforcement learning (RL) with verifiable rewards has proven effective at post-training LLMs for coding, yet deploying separate task-specific specialists incurs costs that scale with the number of tasks, motivating a unified multi-task RL (MTRL) approach. However, existing MTRL methods treat all coding tasks uniformly, relying on fixed data curricula under […]

May 8, 2026

TACT: Mitigating Overthinking and Overacting in Coding Agents via Activation Steering

arXiv:2605.05980v1 Announce Type: new Abstract: When language model agents tackle complex software engineering tasks, they often degrade over long trajectories, which we define as *agent drift*. We focus on two recurring failure modes, *overthinking* and *overacting*, i.e., where the agent repeatedly reasons over information it already has, and where it issues tool calls without integrating […]

May 8, 2026

Are we Doomed to an AI Race? Why Self-Interest Could Drive Countries Towards a Moratorium on Superintelligence

arXiv:2605.01297v2 Announce Type: replace-cross Abstract: This paper uses game theory to argue that, contrary to the prevailing view, a moratorium on Artificial Superintelligence (ASI) can be in a state’s self-interest. By formalizing strategic interactions between geopolitical superpowers, we model the trade-off between the benefits of technological supremacy and the catastrophic risks of uncontrolled ASI. The […]

May 8, 2026

Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

arXiv:2605.06105v1 Announce Type: new Abstract: Long-context inference in decoder-only language models is costly because long prompts are processed during Prefill, cached at every layer, and repeatedly attended to during autoregressive Decode. We introduce Shallow Prefill, dEEp Decode (SPEED), a phase-asymmetric KV-visibility policy that materializes non-anchor prompt-token KV states only in lower layers while keeping Decode-phase […]

May 8, 2026

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

arXiv:2605.06104v1 Announce Type: cross Abstract: Decision Transformer (DT) formulates offline reinforcement learning as autoregressive sequence modeling, achieving promising results by predicting actions from a sequence of Return-to-Go (RTG), state, and action tokens. However, RTG is a scalar that summarizes future rewards, containing far less information than typical state or action vectors, yet it consumes the […]

May 8, 2026

Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges

arXiv:2605.06161v1 Announce Type: new Abstract: LLM-as-a-Judge pipelines have become the de facto evaluator for agent safety, yet existing benchmarks treat their verdicts as ground-truth proxies without checking whether the verdicts depend on the agent’s behavior or merely on how the evaluation policy happens to be worded. We argue that any trustworthy safety judge must satisfy […]

May 8, 2026

H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models

arXiv:2605.00847v2 Announce Type: replace-cross Abstract: Representing and navigating hierarchy is a fundamental primitive of reasoning. Large language models have demonstrated proficiency in a wide variety of tasks requiring hierarchical reasoning, but there exists limited analysis on how the models geometrically represent the necessary latent constructions for such thinking. To this end, we develop H-probes, a […]

May 8, 2026

Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric

arXiv:2605.06201v1 Announce Type: new Abstract: Dominant accuracy evaluation might reward unwarranted guessing by Large Language Models, and it might not be applicable to model validation on novel tasks without ground-truth (gt) annotation. Based on basic logical principles, we propose a novel framework to evaluate the vision-language logical consistency of MLLMs on both sufficient and necessary […]

May 8, 2026

CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision

arXiv:2605.06100v1 Announce Type: cross Abstract: Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods already learn measurement weighting through the solver, but they still use position-only objectives. As a result, the […]

May 8, 2026

Higher-order interactions in ecology can be hidden in plain sight

arXiv:2605.06301v1 Announce Type: new Abstract: Higher-order interactions are increasingly recognized as a key component of ecological dynamics. However, we show that higher-order Lotka-Volterra dynamics can, in some scenarios, be accurately reproduced by effective pairwise models fitted to the same abundance time series. Consequently, higher-order interactions cannot, in general, be inferred from time-series data alone. We […]

May 8, 2026

Caracal: Causal Architecture via Spectral Mixing

arXiv:2605.00292v2 Announce Type: replace-cross Abstract: The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log(L)) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We […]

May 8, 2026

Spectral Edge Dynamics: An Analytical-Empirical Study of Phase Transitions in Neural Network Training

arXiv:2603.28964v3 Announce Type: replace-cross Abstract: We develop the spectral edge analysis: phase transitions in neural network training (grokking, capability gains, loss plateaus) are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme aspect ratio regime (parameters $P \sim 10^8$, window $W \sim 10$), the classical BBP […]

May 8, 2026
