May 15, 2026 – Page 6 – dijee Pharma Intelligence

Self-Distilled Agentic Reinforcement Learning

arXiv:2605.15155v1 Announce Type: cross Abstract: Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher branch augmented with privileged context. However, transferring OPSD to multi-turn agents proves […]

May 15, 2026

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

arXiv:2605.13941v1 Announce Type: cross Abstract: Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge […]

May 15, 2026

Enhanced and Efficient Reasoning in Large Learning Models

arXiv:2605.14036v1 Announce Type: new Abstract: In current Large Language Models we can trust the production of smoothly flowing prose on the basis of the principles of machine learning. However, there is no comparably principled basis to justify trust in the content of the text produced. It appears to be conventional wisdom that addressing this issue […]

May 15, 2026

Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines

arXiv:2605.13981v1 Announce Type: cross Abstract: The rise in deployment of large language models has driven a surge in GPU demand and datacenter scaling, raising concerns about electricity use, grid stress, and the impacts of modern AI workloads. Distillation is often promoted as one of the most effective paths to obtain cheaper, more efficient models, yet […]

May 15, 2026

AgenticEval: Toward Agentic and Self-Evolving Safety Evaluation of Large Language Models

arXiv:2509.26100v2 Announce Type: replace Abstract: The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliable safety and compliance evaluation. However, existing static benchmarks are ill-equipped to address the dynamic nature of AI risks and evolving regulations, creating a critical safety gap. This paper introduces a new paradigm of agentic safety evaluation, reframing […]

May 15, 2026

Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact

arXiv:2605.14021v1 Announce Type: cross Abstract: Google AI Overviews (AIOs) are arguably the most widely encountered deployment of generative AI, reaching over 2 billion users who may not realize the answers they see are AI-generated. Where search engines have traditionally surfaced ranked sources and left users to evaluate them, AIOs synthesize and deliver a single answer […]

May 15, 2026

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

arXiv:2605.14038v1 Announce Type: new Abstract: Large language models (LLMs) increasingly act as autonomous agents that must decide when to answer directly vs. when to invoke external tools. Prior work studying adaptive tool use has largely treated tool necessity as a model-agnostic property, annotated by human or LLM judge, and mostly cover cases where the answer […]

May 15, 2026

A Benchmark for Early-stage Parkinson’s Disease Detection from Speech

arXiv:2605.14066v1 Announce Type: cross Abstract: Early-stage Parkinson’s disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and EarlyPD definitions. To address this issue, we propose the first benchmark for speech-based EarlyPD detection, with a speaker-independent split designed for […]

May 15, 2026

Graph of States: Solving Abductive Tasks with Large Language Models

arXiv:2603.21250v2 Announce Type: replace Abstract: Logical reasoning encompasses deduction, induction, and abduction. However, while Large Language Models (LLMs) have effectively mastered the former two, abductive reasoning remains significantly underexplored. Existing frameworks, predominantly designed for static deductive tasks, fail to generalize to abductive reasoning due to unstructured state representation and lack of explicit state control. Consequently, […]

May 15, 2026

Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning

arXiv:2605.14048v1 Announce Type: new Abstract: Masked autoencoders (MAEs) have recently shown promise for self-supervised representation learning of resting-state brain functional connectivity (FC). However, a fundamental question remains unresolved: how should FC matrices be tokenized to align with the intrinsic modular organization of large-scale brain networks? Existing approaches typically adopt region-centric or graph-based schemes that treat […]

May 15, 2026

ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents

arXiv:2605.14153v1 Announce Type: cross Abstract: Exploitation is not a binary event. It is a ladder of acquiring progressive capabilities, from executing a single buggy line of code to taking full control of the target. However, existing LLM security benchmarks treat a crash as exploitation success. That single binary outcome collapses the hard parts of exploitation: […]

May 15, 2026

CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG

arXiv:2605.11611v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for training agentic retrieval-augmented generation (RAG) systems from outcome-only supervision. Most existing methods optimize policies from uniformly sampled rollouts, implicitly treating all trajectories as equally informative. However, trajectories differ substantially in search depth and are therefore not equally […]

May 15, 2026
