arXiv:2512.15115v1 Announce Type: cross
Abstract: Sequence modeling has produced diverse architectures, from classical recurrent neural networks to modern Transformers and state space models (SSMs), yet a unified theoretical understanding of expressivity and trainability trade-offs remains limited. We introduce a unified framework that represents a broad class of sequence maps via an input-dependent effective interaction operator $W_{ij}(X)$, making explicit two recurring construction patterns: (i) the Unified Factorized Framework (Explicit; attention-style mixing), in which $W_{ij}(X)$ varies through scalar coefficients applied to shared value maps, and (ii) Structured Dynamics (Implicit; state-space recurrences), in which $W_{ij}$ is induced by a latent dynamical system. Using this framework, we derive three theoretical results. First, we establish the Interaction Rank Gap: models in the Unified Factorized Framework, such as single-head attention, are constrained to a low-dimensional operator span and cannot represent certain structured dynamical maps. Second, we prove an Equivalence (Head-Count) Theorem showing that, within our multi-head factorized class, representing a linear SSM whose lag operators span a $k$-dimensional subspace on length-$n$ sequences requires, and is achievable with, $H = k$ heads. Third, we prove a Gradient Highway Result, showing that attention layers admit inputs with distance-independent gradient paths, whereas stable linear dynamics exhibit distance-dependent gradient attenuation. Together, these results formalize a fundamental trade-off between algebraic expressivity (interaction/operator span) and long-range gradient propagation, providing theoretical grounding for modern sequence architecture design.
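To make the two construction patterns in the abstract concrete, the sketch below writes both a single-head attention layer and a linear SSM in the common form $y_i = \sum_j W_{ij}(X)\, x_j$. This is a minimal NumPy illustration, not the paper's construction: the dimensions, random weights, and names (Wq, Wk, V, A, B, C) are assumptions chosen only to expose the effective interaction operator in each case.

```python
# Minimal sketch (illustrative, not from the paper) of the two construction
# patterns: factorized attention-style mixing vs. a latent linear dynamical
# system, both expressed as y_i = sum_j W_ij(X) x_j.
import numpy as np

rng = np.random.default_rng(0)
n, d, d_state = 6, 4, 3          # sequence length, model width, SSM state size (toy values)
X = rng.standard_normal((n, d))  # input sequence x_1..x_n as rows

# --- Pattern (i): Unified Factorized Framework (attention-style mixing) ------
# W_ij(X) = alpha_ij(X) * V: an input-dependent scalar times one shared value
# map, so every W_ij produced by a single head lies in the span of {V}.
Wq, Wk, V = (rng.standard_normal((d, d)) for _ in range(3))
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
y_attn = np.stack([sum(alpha[i, j] * (V @ X[j]) for j in range(n)) for i in range(n)])

# --- Pattern (ii): Structured Dynamics (linear SSM) --------------------------
# h_i = A h_{i-1} + B x_i,  y_i = C h_i  =>  W_ij = C A^{i-j} B for j <= i.
# The lag operators {C A^m B} can span a k-dimensional subspace of matrices,
# which a single factorized head (one-dimensional operator span) cannot match.
A = 0.9 * np.eye(d_state)        # stable dynamics: powers of A decay with lag
B = rng.standard_normal((d_state, d))
C = rng.standard_normal((d, d_state))
y_ssm = np.zeros((n, d))
for i in range(n):
    for j in range(i + 1):
        y_ssm[i] += C @ np.linalg.matrix_power(A, i - j) @ B @ X[j]

print(y_attn.shape, y_ssm.shape)  # both (n, d): same interface, different W_ij
```

Under these assumptions, the attention coefficients alpha[i, j] depend on the input but multiply a single shared map V, while the SSM's lag operators C A^{i-j} B are input-independent yet distance-dependent, which is the contrast the Interaction Rank Gap and the Gradient Highway Result formalize.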
Surrogate Neural Architecture Codesign Package (SNAC-Pack)
arXiv:2512.15998v1 Announce Type: cross
Abstract: Neural Architecture Search is a powerful approach for automating model design, but existing methods struggle to accurately optimize for real

