arXiv:2602.23696v3 Announce Type: replace-cross
Abstract: We analyze cumulative parameter trajectories of transformer training under AdamW and identify a dominant low-dimensional drift direction (“backbone”) that captures 60–80% of long-horizon displacement from initialization. This direction is highly stable over rolling training windows yet reorients gradually across phases, particularly following objective reweighting. Per-batch gradients exhibit near-noise-floor alignment with the backbone, whereas optimizer-integrated updates align strongly with it, indicating that the structure emerges from accumulated optimizer dynamics rather than instantaneous gradient geometry.
Replacing AdamW with SGD-family optimizers eliminates this structure, and reducing $beta_2$ smoothly degrades backbone dominance and reheating recoverability. Reheating experiments show that transverse probe modes can be transiently re-excited without substantially altering accumulated backbone drift.
These results provide a trajectory-level characterization of optimizer-induced geometric structure in transformer training and shift attention from instantaneous gradient properties to cumulative update dynamics.
Using an Adult-Designed Wearable for Pediatric Monitoring: Practical Tutorial and Application in School-Aged Children With Obesity
This tutorial presents a step-by-step guide on how to use an adult-oriented wearable (Fitbit) to collect and analyze activity and cardiovascular data in a pediatric



