arXiv:2603.18567v1 Announce Type: cross Abstract: Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, […]
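The draft-then-verify loop the abstract describes can be illustrated with a toy sketch. The two "models" below are stand-in deterministic functions, not SpecForge's API; the point is the control flow: the cheap draft proposes k tokens, the target checks them and accepts the longest agreeing prefix, emitting one corrected token on the first mismatch so each step always makes progress.

```python
# Toy sketch of greedy speculative decoding. draft_model and
# target_model are hypothetical stand-ins for illustration only.

def draft_model(prefix):          # cheap model: next token = last + 1 (mod 10)
    return (prefix[-1] + 1) % 10

def target_model(prefix):         # expensive model: agrees except when 7 is next
    t = (prefix[-1] + 1) % 10
    return t if t != 7 else 0     # diverges where the draft would emit 7

def speculative_step(prefix, k=4):
    # 1) Draft proposes k tokens autoregressively (cheap).
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    drafted = proposal[len(prefix):]
    # 2) Target verifies all k positions (batched in practice; simulated
    #    sequentially here), accepting the longest agreeing prefix and
    #    emitting its own token at the first mismatch.
    accepted, ctx = [], list(prefix)
    for tok in drafted:
        true_tok = target_model(ctx)
        if tok == true_tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(true_tok)   # target's correction; stop here
            ctx.append(true_tok)
            break
    return prefix + accepted

print(speculative_step([3]))
```

Starting from token 3, the draft proposes 4, 5, 6, 7; the target accepts 4, 5, 6 and overrides the 7 with 0, so one verification pass yields four tokens instead of one.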
Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
arXiv:2603.18007v1 Announce Type: cross Abstract: The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabilities — specifically, the ability to infer others’ beliefs, intentions, and emotions from text. Given that LLMs are trained on language data without social embodiment or access to other manifestations of mental representations, their apparent social-cognitive […]
Transformers Remember First, Forget Last: Dual-Process Interference in LLMs
arXiv:2603.00270v2 Announce Type: replace-cross Abstract: When large language models encounter conflicting information in context, which memories survive — early or recent? We adapt classical interference paradigms from cognitive psychology to answer this question, testing 39 LLMs across diverse architectures and scales. Every model shows the same pattern: proactive interference (PI) dominates retroactive interference (RI) universally […]
DynaRAG: Bridging Static and Dynamic Knowledge in Retrieval-Augmented Generation
arXiv:2603.18012v1 Announce Type: cross Abstract: We present DynaRAG, a retrieval-augmented generation (RAG) framework designed to handle both static and time-sensitive information needs through dynamic knowledge integration. Unlike traditional RAG pipelines that rely solely on static corpora, DynaRAG selectively invokes external APIs when retrieved documents are insufficient for answering a query. The system employs an LLM-based […]
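The static-first, dynamic-fallback control flow the abstract describes can be sketched as below. The corpus, the sufficiency check, and the external API are all hypothetical placeholders (DynaRAG uses an LLM-based judgment for sufficiency; here it is reduced to a trivial check for illustration).

```python
# Minimal sketch of retrieve-first, API-fallback RAG control flow.
# CORPUS, is_sufficient, and call_external_api are stand-ins, not
# DynaRAG's actual components.

CORPUS = {
    "capital of france": "Paris is the capital of France.",
}

def retrieve(query):
    # Stand-in for dense/sparse retrieval over a static corpus.
    return CORPUS.get(query.lower())

def is_sufficient(doc, query):
    # Placeholder for an LLM-based sufficiency judgment.
    return doc is not None

def call_external_api(query):
    # Placeholder for a live, time-sensitive source (weather, prices, ...).
    return f"[live answer for: {query}]"

def answer(query):
    doc = retrieve(query)
    if is_sufficient(doc, query):
        return doc                     # static knowledge suffices
    return call_external_api(query)    # dynamic fallback

print(answer("capital of France"))
print(answer("BTC price right now"))
```

A static-answerable query is served from the corpus; a time-sensitive one falls through to the API path.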
Transformers Learn Robust In-Context Regression under Distributional Uncertainty
arXiv:2603.18564v1 Announce Type: cross Abstract: Recent work has shown that Transformers can perform in-context learning for linear regression under restrictive assumptions, including i.i.d. data, Gaussian noise, and Gaussian regression coefficients. However, real-world data often violate these assumptions: the distributions of inputs, noise, and coefficients are typically unknown, non-Gaussian, and may exhibit dependency across the prompt. […]
Using Laplace Transform To Optimize the Hallucination of Generation Models
arXiv:2603.18022v1 Announce Type: cross Abstract: To explore the feasibility of avoiding the confident error (or hallucination) of generation models (GMs), we formalise the system of GMs as a class of stochastic dynamical systems through the lens of control theory. Hallucination in the learning process of GMs can be attributed to numerous factors, utilising knowledge […]
Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training
arXiv:2602.23696v3 Announce Type: replace-cross Abstract: We analyze cumulative parameter trajectories of transformer training under AdamW and identify a dominant low-dimensional drift direction (“backbone”) that captures 60–80% of long-horizon displacement from initialization. This direction is highly stable over rolling training windows yet reorients gradually across phases, particularly following objective reweighting. Per-batch gradients exhibit near-noise-floor alignment with […]
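One way to extract a dominant drift direction from cumulative parameter displacements is power iteration on the displacement Gram matrix; the toy below does this in pure Python on a made-up 2-parameter trajectory. This is an assumption about the general technique, not a reproduction of the paper's pipeline.

```python
# Sketch: estimate the dominant direction of a stack of parameter
# displacement vectors D via power iteration on D^T D. Toy data,
# not the paper's measurements.

def top_drift_direction(displacements, iters=50):
    d = len(displacements[0])
    v = [1.0 / d ** 0.5] * d              # unit initial guess
    for _ in range(iters):
        w = [0.0] * d                     # w = D^T D v
        for row in displacements:
            s = sum(r * x for r, x in zip(row, v))
            for j in range(d):
                w[j] += row[j] * s
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]         # renormalize each iteration
    return v

# Toy trajectory: displacement mostly along (1, 0.1), with small noise.
traj = [(1.0, 0.10), (2.0, 0.21), (3.0, 0.29), (4.0, 0.41)]
v = top_drift_direction(traj)
print([round(x, 3) for x in v])
```

The recovered unit vector is close to the normalized (1, 0.1) axis the toy data drifts along; projecting each displacement onto it would give the fraction of motion the "backbone" explains.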
KD-EKF: Knowledge-Distilled Adaptive Covariance EKF for Robust UWB/PDR Indoor Localization
arXiv:2603.18027v1 Announce Type: cross Abstract: Ultra-wideband (UWB) indoor localization provides centimeter-level accuracy and low latency, but its measurement reliability degrades severely under Non-Line-of-Sight (NLOS) conditions, leading to meter-scale ranging errors and inconsistent uncertainty characteristics. Inertial Measurement Unit (IMU)-based Pedestrian Dead Reckoning (PDR) complements UWB by providing infrastructure-free motion estimation; however, its error accumulates nonlinearly over […]
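The fusion loop the abstract describes can be reduced to a scalar toy: PDR step lengths drive the predict stage, UWB ranges drive the update stage, and the measurement variance is inflated when an NLOS flag fires. This is a one-dimensional analogue of covariance adaptation for illustration, not the paper's knowledge-distilled KD-EKF (the flag, noise values, and data are all made up).

```python
# Scalar EKF sketch of UWB/PDR fusion with an adaptive measurement
# variance: R is inflated when a (stand-in) NLOS flag is set, so an
# NLOS-corrupted range barely moves the estimate.

def ekf_1d(steps, uwb, nlos, q=0.01, r_los=0.05, r_nlos=5.0):
    x, p = 0.0, 1.0                       # position estimate and variance
    for step, z, bad in zip(steps, uwb, nlos):
        x, p = x + step, p + q            # predict: dead-reckoned motion
        r = r_nlos if bad else r_los      # adaptive measurement covariance
        k = p / (p + r)                   # Kalman gain
        x, p = x + k * (z - x), (1 - k) * p   # update with UWB range
    return x

est = ekf_1d(steps=[1.0, 1.0, 1.0],
             uwb=[1.02, 2.01, 9.0],       # last range is an NLOS outlier
             nlos=[False, False, True])
print(round(est, 3))
```

With the outlier's variance inflated, the final estimate stays near the dead-reckoned 3 m rather than being dragged toward the 9 m NLOS range; with `r_los` applied to all three measurements it would not.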
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering
arXiv:2603.18558v1 Announce Type: cross Abstract: Long-form video question answering requires reasoning over extended temporal contexts, making frame selection critical for large vision-language models (LVLMs) bound by finite context windows. Existing methods face a sharp trade-off: similarity-based selectors are fast but collapse compositional queries into a single dense vector, losing sub-event ordering and cross-modal bindings; agent-based […]
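The similarity-based selector the abstract critiques can be sketched in a few lines: every frame is scored against one dense query vector, so a compositional query ("A, then B") has no way to express sub-event order. The 3-d embeddings below are made up for illustration; they are not an LVLM's representations.

```python
# Toy illustration of similarity-based frame selection: rank frames by
# cosine similarity to a single query embedding, keep the top k.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_frames(frame_embs, query_emb, k=2):
    ranked = sorted(range(len(frame_embs)),
                    key=lambda i: cosine(frame_embs[i], query_emb),
                    reverse=True)
    return sorted(ranked[:k])          # restore temporal order

frames = [(1.0, 0.0, 0.0),             # frame 0: event A
          (0.0, 1.0, 0.0),             # frame 1: event B
          (0.1, 0.1, 1.0)]             # frame 2: background
query = (0.7, 0.7, 0.0)                # "A then B" collapsed to one vector
print(select_frames(frames, query, k=2))
```

The selector does pick the A and B frames, but note the collapse: frames 0 and 1 score identically against the pooled query, so nothing in the scores encodes that A must precede B, which is exactly the limitation the abstract points at.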
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
arXiv:2603.18031v1 Announce Type: cross Abstract: Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs) scale linearly but often struggle to capture high-rank and synchronous global interactions. We present a […]
Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
arXiv:2602.19373v2 Announce Type: replace-cross Abstract: Deep reinforcement learning systems often suffer from unstable training dynamics due to non-stationarity, where learning objectives and data distributions evolve over time. We show that under non-stationary targets, isotropic Gaussian embeddings are provably advantageous. In particular, they induce stable tracking of time-varying targets for linear readouts, achieve maximal entropy under […]
NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference
arXiv:2603.18046v1 Announce Type: cross Abstract: When users query proprietary LLM APIs, they receive outputs with no cryptographic assurance that the claimed model was actually used. Service providers could substitute cheaper models, apply aggressive quantization, or return cached responses – all undetectable by users paying premium prices for frontier capabilities. We present NANOZK, a zero-knowledge proof […]