Infectious disease burden and surveillance challenges in Jordan and Palestine: a systematic review and meta-analysis

BackgroundJordan and Palestine face public health challenges due to infectious diseases, with the added detrimental factors of long-term conflict, forced relocation, and lack of resources.

From pilot to policy: why AI health interventions fail to scale in developing countries

Post Content

One Token Is Enough: Improving Diffusion Language Models with a Sink Token

arXiv:2601.19657v2 Announce Type: replace-cross Abstract: Diffusion Language Models (DLMs) have emerged as a compelling alternative to autoregressive approaches, enabling parallel text generation with competitive performance.

Trustworthy Intelligent Education: A Systematic Perspective on Progress, Challenges, and Future Directions

arXiv:2601.21837v1 Announce Type: cross Abstract: In recent years, trustworthiness has garnered increasing attention and exploration in the field of intelligent education, due to the inherent

RobustExplain: Evaluating Robustness of LLM-Based Explanation Agents for Recommendation

arXiv:2601.19120v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly used to generate natural-language explanations in recommender systems, acting as explanation agents that reason

Demystifying the Slash Pattern in Attention: The Role of RoPE

January 29, 2026

arXiv:2601.08297v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) often exhibit slash attention patterns, where attention scores concentrate along the $Delta$-th sub-diagonal for some offset $Delta$. These patterns play a key role in passing information across tokens. But why do they emerge? In this paper, we demystify the emergence of these Slash-Dominant Heads (SDHs) from both empirical and theoretical perspectives. First, by analyzing open-source LLMs, we find that SDHs are intrinsic to models and generalize to out-of-distribution prompts. To explain the intrinsic emergence, we analyze the queries, keys, and Rotary Position Embedding (RoPE), which jointly determine attention scores. Our empirical analysis reveals two characteristic conditions of SDHs: (1) Queries and keys are almost rank-one, and (2) RoPE is dominated by medium- and high-frequency components. Under these conditions, queries and keys are nearly identical across tokens, and interactions between medium- and high-frequency components of RoPE give rise to SDHs. Beyond empirical evidence, we theoretically show that these conditions are sufficient to ensure the emergence of SDHs by formalizing them as our modeling assumptions. Particularly, we analyze the training dynamics of a shallow Transformer equipped with RoPE under these conditions, and prove that models trained via gradient descent exhibit SDHs. The SDHs generalize to out-of-distribution prompts.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844