May 28, 2026 – Page 19 – dijee Pharma Intelligence

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

arXiv:2605.28751v1 Announce Type: cross Abstract: Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extrapolative weight averaging can extend such frontiers to new checkpoints useful at inference time, without additional RL training, remains unclear. We study this question in RL for competitive programming, where hidden unit […]

May 28, 2026

A Policy-Driven Runtime Layer for Agentic LLM Serving

arXiv:2605.27744v1 Announce Type: new Abstract: Multi-agent LLM systems have become the dominant production workload, but the serving stack was not built for them. The agent framework above knows agent identities, roles, schemas, and dispatch structure but never sees an engine-level event; the serving engine below sees every event but knows nothing about agents. A surprising […]

May 28, 2026

Domain size asymptotics for Markov logic networks

arXiv:2509.04192v2 Announce Type: replace Abstract: A Markov logic network (MLN) $\mathbb{M}$ determines a probability distribution $\mathbb{P}_n^{\mathbb{M}}$ on the set $\mathbf{W}_n$ of structures, or “possible worlds”, with domain $\{1, \ldots, n\}$. We study the properties of such distributions as $n$ tends to infinity. We show that with mild assumptions on an MLN $\mathbb{M}$ with one soft […]

May 28, 2026

Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

arXiv:2605.27752v1 Announce Type: new Abstract: LLM confidence calibration is often evaluated by comparing two signals: token-probability scores and verbalized confidence. These signals are sometimes treated as direct readouts of model uncertainty, but their comparison depends on measurement choices that are rarely made explicit. In the main analysis, we hold the verbalized-confidence elicitation fixed: a single […]

May 28, 2026

DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

arXiv:2603.00309v2 Announce Type: replace Abstract: The increasingly popular agentic AI paradigm promises to harness the power of multiple, general-purpose large language model (LLM) agents to collaboratively complete complex tasks. While many agentic AI systems reduce complexity through predefined workflows or fixed agent roles, the ideal is to support truly autonomous agents capable of emergent collaboration […]

May 28, 2026

Let Relations Speak: An End-to-End LLM-GNN Soft Prompt Framework for Fraud Detection

arXiv:2605.28524v1 Announce Type: new Abstract: In recent years, Large Language Models (LLMs) have shown great capability in processing graph tasks such as fraud detection. However, most existing methods rely heavily on rich text attributes, which poses difficulties for this domain due to the lack of textual data. Although some pioneering methods attempt to overcome it, […]

May 28, 2026

Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning

arXiv:2603.09882v2 Announce Type: replace-cross Abstract: Extrinsic dexterity leverages environmental contact to overcome the limitations of prehensile manipulation. However, achieving such dexterity in cluttered scenes remains challenging and underexplored, as it requires selectively exploiting contact among multiple interacting objects with inherently coupled dynamics. Existing approaches lack explicit modeling of such complex dynamics and therefore fall short […]

May 28, 2026

Entropy-aware Masking for Masked Language Modeling

arXiv:2605.28526v1 Announce Type: new Abstract: Masked language modeling has become a standard pretraining objective for training encoder-based language models. In this approach, certain tokens in the input are masked, and the model learns to predict them using the surrounding context. This process enables the model to capture both syntactic and semantic properties of language. Conventionally, […]

May 28, 2026

ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations

arXiv:2605.27908v1 Announce Type: cross Abstract: Existing emotional support conversation (ESC) systems mainly rely on end-to-end response generation or coarse strategy supervision, offering limited interpretability and little support for systematic skill improvement. We propose ESC-Skills, a skill-centric framework that discovers and self-evolves executable emotional support skills. We first model localized support interactions as Intervention Units (IUs), […]

May 28, 2026

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

arXiv:2605.27901v1 Announce Type: cross Abstract: Chain-of-thought (CoT) monitoring has been proposed as a promising safety mechanism for detecting misaligned behavior in large language models. However, its reliability remains largely unexplored beyond English and across diverse model families. We present the first large-scale evaluation of CoT monitorability across 13 diverse languages and seven frontier model families, […]

May 28, 2026

Structured Belief State and the First Precision-Aware Benchmark for LLM Memory Retrieval

arXiv:2605.11325v2 Announce Type: replace-cross Abstract: Every major benchmark for LLM memory systems, LoCoMo foremost, measures whether a model answered correctly, not whether the memory system retrieved correctly. A system returning its entire belief store achieves recall of 1.0 and passes answer-quality evaluation. This is the difference between a unit test and an integration test: retrieval […]

May 28, 2026

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

arXiv:2605.27567v1 Announce Type: new Abstract: Causal discovery is a cornerstone of scientific reasoning, yet whether large language models can perform it reliably remains an open question. Recent benchmarks show that even fine-tuned models plateau on simple causal graphs and degrade as complexity grows, but why they fail has not been established. We prove the failure […]

May 28, 2026
