May 8, 2026 – Page 31 – dijee Pharma Intelligence

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

arXiv:2605.06611v1 Announce Type: cross Abstract: Despite the prevalence of the attention sink phenomenon in Large Language Models (LLMs), where initial tokens disproportionately monopolize attention scores, its structural origins remain elusive. This work provides a mechanistic explanation for this phenomenon. First, we trace its root to the value aggregation process inherent in self-attention, which induces a […]

May 8, 2026

Refining Gelfond Rationality Principle: Towards More Comprehensive Foundational Principles for Answer Set Semantics

arXiv:2507.01833v2 Announce Type: replace Abstract: Non-monotonic logic programming is the basis for a declarative problem solving paradigm known as answer set programming (ASP). Departing from the seminal definition by Gelfond and Lifschitz in 1988 for simple normal logic programs, various answer set semantics have been proposed for extensions. We consider two important questions: (1) Should […]

May 8, 2026

CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

arXiv:2512.18857v3 Announce Type: replace Abstract: Large language models (LLMs) often solve challenging math exercises yet fail to apply the concept right when the problem requires genuine understanding. Popular Reinforcement Learning with Verifiable Rewards (RLVR) pipelines reinforce final answers but provide little fine-grained conceptual signal, so models improve at pattern reuse rather than conceptual applications. We […]

May 8, 2026

Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren’t Worth Training

arXiv:2605.02241v3 Announce Type: replace Abstract: How reliably can a small language model estimate its own correctness? The answer determines whether local-to-cloud routing (escalating queries a cheap local model cannot handle) can work without supervised training data. As inference costs dominate large language model (LLM) deployment budgets, routing most queries to a cheap local model while reserving expensive […]

May 8, 2026

PathRWKV: Enhancing Whole Slide Image Inference with Asymmetric Recurrent Modeling

arXiv:2503.03199v4 Announce Type: replace-cross Abstract: Whole Slide Imaging (WSI) has become a gold standard in cancer diagnosis, inspecting multi-scale information from cellular to tissue levels. Processing an entire WSI directly is infeasible due to GPU memory constraints; thus, Multiple Instance Learning (MIL) has emerged as the standard solution by partitioning WSIs into tiles. While recent […]

May 8, 2026

Frictional Q-Learning

arXiv:2509.19771v4 Announce Type: replace-cross Abstract: Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From this perspective, the replay buffer is represented as a smooth, low-dimensional action manifold, where […]

May 8, 2026

Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability

arXiv:2601.00655v3 Announce Type: replace-cross Abstract: This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge via a bi-objective formulation. IGBO encodes feature importance hierarchies as a Directed Acyclic Graph (DAG) via Central Limit Theorem-based construction and uses Temporal Integrated Gradients (TIG) to measure feature importance. The framework […]

May 8, 2026

The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization

arXiv:2602.00175v2 Announce Type: replace-cross Abstract: Text-to-image diffusion models (DMs) are frequently abused to produce harmful or copyrighted content, violating public interests. Concept erasure (unlearning) is a promising paradigm to alleviate this issue. However, there exists a peculiar forgetting illusion phenomenon with unclear cause. Based on empirical analysis, we formally explain this cause: most unlearning partially […]

May 8, 2026

Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation

arXiv:2509.04154v5 Announce Type: replace-cross Abstract: We introduce Robust Filter Attention (RFA), a formulation of self-attention as a robust state estimator. Each token is treated as a noisy observation of a latent trajectory governed by a linear stochastic differential equation (SDE), and attention weights are determined by consistency under this model rather than static feature similarity. […]

May 8, 2026

Segment-Aligned Policy Optimization for Multi-Modal Reasoning

arXiv:2605.01327v2 Announce Type: replace Abstract: Existing reinforcement learning approaches for Large Language Models typically perform policy optimization at the granularity of individual tokens or entire response sequences. However, such formulations often misalign with the natural step-wise structure of reasoning processes, leading to suboptimal credit assignment and unstable training in multi-modal reasoning tasks. To bridge this […]

May 8, 2026

From Documents to Spans: Scalable Supervision for Evidence-Based ICD Coding with LLMs

arXiv:2603.15270v2 Announce Type: replace-cross Abstract: International Classification of Diseases (ICD) coding assigns diagnosis codes to clinical documents and is essential for healthcare billing and clinical analysis. Reliable coding requires that each predicted code be supported by explicit textual evidence. However, existing public datasets provide only code labels, without evidence annotations, limiting models’ ability to learn […]

May 8, 2026

H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models

arXiv:2605.00847v2 Announce Type: replace-cross Abstract: Representing and navigating hierarchy is a fundamental primitive of reasoning. Large language models have demonstrated proficiency in a wide variety of tasks requiring hierarchical reasoning, but there exists limited analysis on how the models geometrically represent the necessary latent constructions for such thinking. To this end, we develop H-probes, a […]

May 8, 2026
