arXiv:2603.28921v2 Announce Type: replace-cross
Abstract: Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This schedule, which we call the beta-schedule, requires no free parameters beyond the existing learning-rate schedule. On ResNet-18/CIFAR-10,
beta-scheduling delivers 1.9x faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution
under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of whether the model was
trained with SGD or Adam (100% overlap). Surgical correction of only these layers fixes 62 misclassifications while retraining only 18% of parameters.
A hybrid schedule (physics-derived momentum for fast early convergence, then constant momentum for the final refinement) reaches 95% accuracy fastest among the five methods tested. The main contribution is not an accuracy improvement but a principled, parameter-free tool for localizing and correcting specific failure modes in trained networks.
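As a rough illustration only (the abstract gives no code), the sketch below shows how a schedule of this form could be wired into a standard PyTorch training loop, including the hybrid variant that reverts to constant momentum late in training. The model, cosine learning-rate schedule, dummy batch, and switch_step value are illustrative assumptions, not details from the paper.

```python
# Minimal sketch, assuming PyTorch SGD and a cosine LR schedule.
# Implements mu(t) = 1 - 2*sqrt(alpha(t)) from the abstract, with an
# optional switch back to constant momentum 0.9 (the "hybrid" variant).
import math
import torch

model = torch.nn.Linear(10, 2)  # stand-in for ResNet-18
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

def physics_momentum(lr):
    # Critically damped oscillator relation: mu = 1 - 2*sqrt(alpha),
    # clipped to [0, 0.999) so the coefficient stays a valid momentum.
    return max(0.0, min(0.999, 1.0 - 2.0 * math.sqrt(lr)))

switch_step = 800  # hypothetical step where the hybrid variant reverts to 0.9

for step in range(1000):
    # Update the momentum coefficient from the current learning rate
    # before each optimizer step; no extra hyperparameters are introduced.
    for group in optimizer.param_groups:
        lr = group["lr"]
        group["momentum"] = physics_momentum(lr) if step < switch_step else 0.9

    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))  # dummy batch
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```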