arXiv:2604.07658v1 Announce Type: cross
Abstract: Linear recurrent models offer linear-time sequence processing but often suffer from suboptimal long-range memory. We trace this to the decay spectrum: for $N$ channels, random initialization collapses the minimum spectral gap to $O(N^{-2})$, yielding sub-exponential error $\exp(-\Omega(N/\log N))$; linear spacing avoids collapse but degrades to $\exp(-O(N/\sqrt{T}))$, practically algebraic over long contexts. We introduce Position-Adaptive Spectral Tapering (PoST), an architecture-agnostic framework combining two mechanisms: (1) Spectral Reparameterization, which structurally enforces geometrically spaced log-decay rates, proven minimax optimal at rate $O(\exp(-cN/\log T))$; and (2) Position-Adaptive Scaling, the provably unique mechanism that eliminates the scale mismatch of static spectra (where only $N\log t/\log T$ of $N$ channels are effective at position $t$) by stretching the spectrum to the actual dependency range, sharpening the rate to $O(\exp(-cN/\log t))$. This scaling natively induces fractional invariance: the impulse response becomes scale-free, with channels interpolating between relative and absolute temporal coordinates. PoST integrates into any diagonal linear recurrence without overhead. We instantiate it across Mamba-2, RWKV-7, Gated DeltaNet, Gated Linear Attention, and RetNet. Pre-training at 180M-440M scales shows consistent zero-shot language modeling improvements, significant long-context retrieval gains for Mamba-2 (MQAR and NIAH), and competitive or improved performance across other architectures. Code: https://github.com/SiLifen/PoST.
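A minimal NumPy sketch of the two mechanisms as the abstract describes them. The function names, the spectrum endpoints (fastest log-decay rate 1, slowest $1/T$), and the scalar recurrence form are illustrative assumptions, not the paper's implementation; see the linked repository for the actual parameterization.

    import numpy as np

    def geometric_log_decay_rates(N, T):
        """Spectral Reparameterization (sketch): geometrically spaced
        log-decay rates whose memory timescales 1/r span roughly [1, T].
        Endpoints r_min = 1/T and r_max = 1 are assumptions."""
        r_min, r_max = 1.0 / T, 1.0
        # geometric spacing: r_i = r_min * (r_max / r_min)**(i / (N - 1))
        return r_min * (r_max / r_min) ** (np.arange(N) / (N - 1))

    def position_adaptive_rates(N, t):
        """Position-Adaptive Scaling (sketch): at position t, stretch the
        spectrum so the geometric grid covers [1, t] rather than [1, T],
        making all N channels effective for dependencies that can
        actually exist at position t."""
        return geometric_log_decay_rates(N, max(t, 2))

    # Decay factors for a diagonal linear recurrence h_t = a * h_{t-1} + b * x_t:
    N, T, t = 8, 4096, 128
    a_static = np.exp(-geometric_log_decay_rates(N, T))  # fixed spectrum, tuned to T
    a_adapt  = np.exp(-position_adaptive_rates(N, t))    # spectrum stretched to t

With the static spectrum, channels slower than the current position $t$ are effectively idle, matching the abstract's $N\log t/\log T$ effective-channel count; the adaptive variant re-spaces the same $N$ rates over the history that actually exists.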
TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization
arXiv:2604.07553v1 Announce Type: cross
Abstract: This study presents a framework for automatically and reproducibly generating a gold-standard summary from multiple human summaries of


