Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
arXiv:2605.05812v1 Announce Type: new Abstract: Off-policy, value-based reinforcement learning methods such as Q-learning are appealing because they can learn from arbitrary experience, including data collected by older policies or other agents. In practice, however, bootstrapping makes long-horizon learning brittle: estimation errors at later states propagate backward through temporal-difference (TD) updates and can compound over time. […]
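The abstract is truncated, so the paper's inequality-based method is not shown here; as background, a minimal sketch of the standard n-step TD target that such methods build on, where bootstrapping from a value estimate n steps ahead is what lets later-state errors propagate backward:

```python
# Illustrative background sketch (not the paper's method): the n-step TD
# target sum_k gamma^k * r_k + gamma^n * V(s_n), computed by folding the
# rewards backward onto the bootstrapped value estimate.
def n_step_target(rewards, bootstrap_value, gamma):
    """rewards: r_0..r_{n-1} along one trajectory slice;
    bootstrap_value: V(s_n), the value estimate n steps ahead."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```

Any estimation error in `bootstrap_value` enters the target scaled by `gamma**n`, which is the error-propagation channel the abstract refers to.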
An Empirical Study of Proactive Coding Assistants in Real-World Software Development
arXiv:2605.05700v1 Announce Type: cross Abstract: Large language model (LLM)-based coding assistants have made substantial progress, yet most systems remain reactive, requiring developers to explicitly formulate their needs. Proactive coding assistants aim to infer latent developer intent from integrated development environment (IDE) interactions and repository context, thereby reducing interaction overhead and supporting more seamless assistance. However, […]
LeakDojo: Decoding the Leakage Threats of RAG Systems
arXiv:2605.05818v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to leverage external knowledge, but also exposes valuable RAG databases to leakage attacks. As RAG systems grow more complex and LLMs exhibit stronger instruction-following capabilities, existing studies fall short of systematically assessing RAG leakage risks. We present LeakDojo, a configurable framework for […]
Two Steps Are All You Need: Efficient 3D Point Cloud Anomaly Detection with Consistency Models
arXiv:2605.05372v1 Announce Type: cross Abstract: Diffusion models are rapidly redefining 3D anomaly detection in point cloud data. As 3D sensing becomes integral to modern manufacturing, reliable anomaly detection is essential for high-throughput quality assurance and process control. Yet practical deployment on resource-constrained, latency-critical systems remains limited. Existing methods are often computationally prohibitive or unreliable in […]
Accelerating LMO-Based Optimization via Implicit Gradient Transport
arXiv:2605.05577v1 Announce Type: cross Abstract: Recent optimizers such as Lion and Muon have demonstrated strong empirical performance by normalizing gradient momentum via linear minimization oracles (LMOs). While variance reduction has been explored to accelerate LMO-based methods, it typically incurs substantial computational overhead due to additional gradient evaluations. At the same time, the theoretical understanding of […]
Adaptive Computation Depth via Learned Token Routing in Transformers
arXiv:2605.05222v1 Announce Type: cross Abstract: Standard transformer architectures apply the same number of layers to every token regardless of contextual difficulty. We present Token-Selective Attention (TSA), a learned per-token gate on residual updates between consecutive transformer blocks. Each gate is a lightweight two-layer multi-layer perceptron (MLP) that produces a continuous halting probability, making the mechanism […]
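A hypothetical NumPy sketch of the gate as the abstract describes it: a small two-layer MLP maps each token's hidden state to a probability in (0, 1), which scales that token's residual update between blocks. All weight names and the tanh/sigmoid choices are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical sketch of a per-token residual gate: a two-layer MLP
# (tanh hidden layer, sigmoid output) produces a halting probability g
# per token, and the residual update is scaled by g.
def gated_residual(h, block_out, W1, b1, W2, b2):
    """h, block_out: (seq_len, d). Returns h + g * block_out with g in (0, 1)."""
    z = np.tanh(h @ W1 + b1)                    # (seq_len, hidden)
    g = 1.0 / (1.0 + np.exp(-(z @ W2 + b2)))    # sigmoid -> (seq_len, 1)
    return h + g * block_out                    # g broadcasts over d
```

Because the gate is continuous, the whole mechanism stays differentiable and trainable end to end, consistent with the abstract's emphasis on a learned gate.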
Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
arXiv:2605.05262v1 Announce Type: cross Abstract: We formalize Rollout Informativeness under a Fixed Budget (RIFB) as the expected non-vanishing policy-gradient mass that a tool-use rollout set injects into Group Relative Policy Optimization (GRPO). We prove that any budget-agnostic independent sampler suffers a collapse rate bounded away from zero for hard prompts regardless of the budget. Motivated […]
Mathematical Modeling of Early Embryonic Cell Cycles of Drosophila melanogaster
arXiv:2605.06598v1 Announce Type: new Abstract: In the early stages of development, Drosophila melanogaster embryos possess very fast and well-coordinated cell cycles. In the cell cycle, CDK activity is essentially regulated by binding CDK and CycB to form an active complex and by dephosphorylating CDK via CDC25 and phosphorylating it via Wee1. We develop a mathematical […]
AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
arXiv:2605.06149v1 Announce Type: cross Abstract: The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. While state-dependent discounting is conceptually appealing, naive deep actor–critic implementations can become unstable and degenerate toward TD-error collapse. We propose […]
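The truncated abstract does not show AdaGamma's stabilization scheme; as a minimal illustration of the underlying idea, here is a TD(0) target in which the discount is a function of the next state rather than a single global constant. The `gamma_fn` interface is an assumption for illustration.

```python
# Illustrative sketch (not AdaGamma itself): a one-step TD target with a
# state-dependent discount. gamma_fn maps a state to a discount in [0, 1),
# replacing the usual fixed scalar gamma.
def td_target(reward, next_value, gamma_fn, next_state, done):
    gamma = 0.0 if done else gamma_fn(next_state)
    return reward + gamma * next_value
```

States mapped to a small discount effectively shorten the planning horizon and weaken bootstrapping locally, which is the lever the abstract describes.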
Self-Consistency Is Losing Its Edge: Diminishing Returns and Rising Costs in Modern LLMs
arXiv:2511.00751v2 Announce Type: replace Abstract: Self-consistency — sampling multiple reasoning paths and selecting the most frequent answer — was designed for an era when language models made frequent, unpredictable errors. This study argues that the technique has become increasingly wasteful as models grow stronger, and may degrade performance on problems that modern models already solve […]
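The self-consistency procedure the abstract describes is simple enough to state directly: sample several reasoning paths, extract each final answer, and take a majority vote. A minimal sketch of the voting step, assuming answers have already been extracted as strings:

```python
from collections import Counter

# Self-consistency's aggregation step: return the most frequent answer
# among the sampled reasoning paths (ties broken by first occurrence,
# since Counter preserves insertion order for equal counts).
def self_consistent_answer(answers):
    return Counter(answers).most_common(1)[0][0]
```

The cost argument in the abstract follows directly from this shape: every extra vote is a full additional model sample, so the marginal accuracy gain must justify a linear increase in inference cost.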
Towards Metric-Faithful Neural Graph Matching
arXiv:2605.06588v1 Announce Type: cross Abstract: Graph Edit Distance (GED) is a fundamental, albeit NP-hard, metric for structural graph similarity. Recent neural graph matching architectures approximate GED by first encoding graphs with a Graph Neural Network (GNN) and then applying either a graph-level regression head or a matching-based alignment module. Despite substantial architectural progress, the role […]
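To make the NP-hardness claim concrete, here is a toy brute-force computation of a restricted edit distance (edge insertions and deletions only, equal-sized unlabeled graphs), minimized over all node relabelings; this simplified variant is an illustration, not the general GED the paper approximates.

```python
from itertools import permutations

# Toy restricted edit distance: minimum number of edge insertions plus
# deletions needed to turn graph 2 into graph 1, over all n! relabelings
# of graph 2's nodes. The factorial search is why exact GED is intractable.
def toy_ged(n, edges1, edges2):
    e1 = {frozenset(e) for e in edges1}
    best = None
    for perm in permutations(range(n)):
        e2 = {frozenset((perm[u], perm[v])) for u, v in edges2}
        cost = len(e1 ^ e2)  # symmetric difference = edges to add + remove
        best = cost if best is None or cost < best else best
    return best
```

Neural approximators trade this exact factorial search for a learned alignment, which is precisely where the abstract's question about metric faithfulness arises.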