arXiv:2603.28964v3 Announce Type: replace-cross
Abstract: We develop the spectral edge analysis: phase transitions in neural network training (grokking, capability gains, loss plateaus) are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme aspect ratio regime (parameters $P \sim 10^8$, window $W \sim 10$), the classical BBP detection threshold is vacuous; the operative structure is the intra-signal gap separating dominant from subdominant modes at position $k^* = \mathrm{argmax}_j\, \sigma_j/\sigma_{j+1}$.
From three assumptions we derive: (i) gap dynamics governed by a Dyson-type ODE with curvature asymmetry, damping, and gradient driving; (ii) a spectral loss decomposition linking each mode's learning contribution to its Davis–Kahan stability coefficient; (iii) the Gap Maximality Principle, showing that $k^*$ is the unique dynamically privileged position: its collapse is the only one that disrupts learning, and it sustains itself through an $\alpha$-feedback loop requiring no assumption on the optimizer. The adiabatic parameter $\mathcal{A} = \|\Delta G\|_F / (\eta\, g^2)$ controls circuit stability: $\mathcal{A} \ll 1$ (plateau), $\mathcal{A} \sim 1$ (phase transition), $\mathcal{A} \gg 1$ (forgetting).
Tested across six model families (150K–124M parameters): gap dynamics precede every grokking event (24/24 with weight decay, 1/24 without), the gap position is optimizer-dependent (Muon: $k^*=1$, AdamW: $k^*=2$ on the same model), and 19/20 quantitative predictions are confirmed. The framework is consistent with the edge of stability, Tensor Programs, Dyson Brownian motion, the Lottery Ticket Hypothesis, and neural scaling laws.
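As an illustration of the gap position $k^* = \mathrm{argmax}_j\, \sigma_j/\sigma_{j+1}$ defined above, the following is a minimal sketch and not the authors' implementation. It assumes the rolling window is stored as a $W \times P$ array of flattened parameter updates, and that the $\sigma_j$ are the singular values of that array, obtained as square roots of the eigenvalues of the $W \times W$ Gram matrix. The names `gap_position` and `updates`, the 1-indexed convention for $k^*$, and the small floor used to avoid division by zero are all assumptions made for this example.

```python
import numpy as np


def gap_position(updates, floor=1e-12):
    """Locate the largest consecutive singular-value ratio in a window.

    updates: array of shape (W, P); each row is one flattened parameter
             update from the rolling window (hypothetical layout).
    Returns (k_star, ratios, sigma) with k_star 1-indexed.
    """
    updates = np.asarray(updates, dtype=np.float64)

    # W x W Gram matrix of the updates; cheap to form when W << P.
    gram = updates @ updates.T

    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(gram)[::-1]

    # Singular values of `updates` are the square roots of the Gram
    # eigenvalues; clip tiny negatives caused by round-off.
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))

    # Consecutive ratios sigma_j / sigma_{j+1}, guarded against zero.
    ratios = sigma[:-1] / np.maximum(sigma[1:], floor)

    k_star = int(np.argmax(ratios)) + 1
    return k_star, ratios, sigma


if __name__ == "__main__":
    # Synthetic window: two strong directions plus weak isotropic noise.
    rng = np.random.default_rng(0)
    W, P = 10, 2000
    a = np.ones(W)                               # coefficients of mode 1
    b = np.where(np.arange(W) % 2 == 0, 1.0, -1.0)  # mode 2, orthogonal to a
    window = 0.01 * rng.standard_normal((W, P))
    window[:, 0] += 5.0 * a
    window[:, 1] += 4.0 * b
    k_star, ratios, sigma = gap_position(window)
    print("k* =", k_star)
    print("ratios:", np.round(ratios, 2))
```

Working with the $W \times W$ Gram matrix rather than an SVD of the full $W \times P$ array keeps the cost small in the extreme aspect ratio regime the abstract describes. In a rank-deficient window the trailing ratios can be dominated by numerical noise, so in practice one would likely restrict the argmax to the leading modes; that restriction is not shown here.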
Digital health tools and point solutions—pitfalls in population health program measurement
Digital health tools are generally poorly regulated and often lack strong research evidence, posing challenges for purchasers of point solutions such as employer groups and