arXiv:2604.04655v1 Announce Type: cross
Abstract: Neural network grokking, the abrupt transition from memorization to generalization, challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a dimensional phase transition: the effective dimensionality $D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects gradient field geometry, not network architecture: synthetic i.i.d. Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The $D(t)$ crossing, which is localized at grokking and robust across topologies, offers new insight into the trainability of overparameterized networks.
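As an illustration of the regime labels used in the abstract, the following minimal sketch classifies a value of $D$ as subcritical or supercritical and locates the first upward crossing of $D = 1$ in a time series $D(t)$. It does not reproduce the paper's estimator for $D$, which the abstract does not specify. The function names `regime` and `first_upward_crossing` and the sample values in `d_t` are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def regime(d: float) -> str:
    """Label a value of D using the thresholds stated in the abstract."""
    if d < 1.0:
        return "subcritical"    # sub-diffusive, D < 1
    if d > 1.0:
        return "supercritical"  # super-diffusive, D > 1
    return "critical"           # exactly at the boundary D = 1


def first_upward_crossing(
    d_series: Sequence[float], threshold: float = 1.0
) -> Optional[int]:
    """Return the first index where D moves from below the threshold
    to at or above it, or None if no such crossing occurs."""
    for i in range(1, len(d_series)):
        if d_series[i - 1] < threshold <= d_series[i]:
            return i
    return None


# Hypothetical D(t) values, not data from the paper.
d_t = [0.62, 0.71, 0.85, 0.97, 1.08, 1.21]
crossing = first_upward_crossing(d_t)
labels = [regime(d) for d in d_t]
```

In a real analysis the series passed to `first_upward_crossing` would come from whatever estimate of $D(t)$ is computed during training, and the crossing index would then be compared with the step at which generalization begins.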
What’s in a name? Moderna’s “vaccine” vs. “therapy” dilemma
Is it the Department of Defense or the Department of War? The Gulf of Mexico or the Gulf of America? A vaccine—or an “individualized neoantigen
