May 28, 2026 – Page 25 – dijee Pharma Intelligence

Optimal LTLf Synthesis

arXiv:2605.11544v2 Announce Type: replace Abstract: Strategy synthesis typically follows an all-or-nothing paradigm, returning unrealisable whenever a specification cannot be guaranteed in an uncertain environment. In this paper, we introduce optimal LTLf synthesis, where the goal is to realise as many objectives as possible from a given specification consisting of multiple objectives, especially for the case […]

May 28, 2026

Delay-Aware Reinforcement Learning for Highway On-Ramp Merging under Stochastic Communication Latency

arXiv:2403.11852v5 Announce Type: replace-cross Abstract: Delayed and partially observable state information poses significant challenges for reinforcement learning (RL)-based control in real-world autonomous driving. In highway on-ramp merging, a roadside unit (RSU) can sense nearby traffic, perform edge perception, and transmit state estimates to the ego vehicle over vehicle-to-infrastructure (V2I) links. With recent advancements in intelligent […]

May 28, 2026

Structured Agent Distillation for Large Language Model

arXiv:2505.13820v5 Announce Type: replace-cross Abstract: Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models […]

May 28, 2026

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

arXiv:2510.15859v4 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has driven recent breakthroughs in large language models (LLMs), especially for tasks where rewards can be computed automatically, such as code generation. However, it is less effective in open-ended medical dialogue, where feedback is ambiguous, context-dependent, and difficult to summarize into a single scalar signal, often requiring heavily […]

May 28, 2026

Differential syntactic and semantic encoding in LLMs

arXiv:2601.04765v4 Announce Type: replace-cross Abstract: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the very large DeepSeek-V3. We find that, by averaging hidden-representation vectors of sentences sharing syntactic structure or meaning, we obtain vectors that capture a significant proportion of the syntactic and […]

May 28, 2026

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

arXiv:2602.06025v3 Announce Type: replace-cross Abstract: Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and […]

May 28, 2026

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

arXiv:2603.09117v3 Announce Type: replace-cross Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) significantly enhances the reasoning of large language models (LLMs) but suffers severely from calibration degeneration, where models become excessively over-confident in incorrect answers. Previous studies have tried to incorporate a calibration objective directly into the existing optimization target. However, our theoretical analysis demonstrates that there exists a fundamental gradient […]

May 28, 2026

Retention Consequence in Lifecycle Memory Control

arXiv:2604.16774v2 Announce Type: replace-cross Abstract: Persistent memory can fail after successful admission: a premise is written, then becomes a silent assumption, and later maintenance treats it as ordinary residue to be compressed, demoted, or evicted. We study this post-admission failure as a lifecycle-control problem. Existing memory systems already perform admission, update, compression, retrieval, and eviction. […]

May 28, 2026

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

arXiv:2605.15250v2 Announce Type: replace-cross Abstract: Multi-head Latent Attention (MLA), the attention used in DeepSeek-V2/V3, jointly compresses keys and values into a low-rank latent and matches the H100 roofline almost perfectly. Its trained weights, however, expose only one decoding path – an absorbed MQA form – which ties efficient inference to H100-class compute-bandwidth ratios, forfeits tensor […]

May 28, 2026

PromptEmbedder: Efficient and Transferable Text Embedding via Dual-LLM Soft Prompting

arXiv:2605.28066v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated remarkable efficacy in text embedding, yet current adaptation methods like LoRA face significant bottlenecks in computational efficiency and cross-architecture transferability. Whenever a new backbone emerges, existing approaches require costly retraining from scratch. To address this, we propose PromptEmbedder, a novel dual-LLM framework that decouples […]

May 28, 2026

Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

arXiv:2605.28187v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as scholar recommenders, shaping who is seen as an expert in academia. Existing audits remain English-centric, single discipline, and persona-agnostic, leaving the source of output variability poorly understood. To this end, we propose a benchmark that disentangles the effects of model choice and […]

May 28, 2026

ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation

arXiv:2605.28293v1 Announce Type: cross Abstract: Proactive Recommender Systems (PRSs) aim to guide user preference shift toward target items by generating paths of intermediate recommendations. Reinforcement learning (RL) provides a principled framework for optimizing such sequential decision tasks, as path rewards can naturally capture both short-term acceptance and long-term guidance effectiveness. However, naively applying policy gradients […]

May 28, 2026
