arXiv:2605.05577v1 Announce Type: cross
Abstract: Recent optimizers such as Lion and Muon have demonstrated strong empirical performance by normalizing gradient momentum via linear minimization oracles (LMOs). While variance reduction has been explored to accelerate LMO-based methods, it typically incurs substantial computational overhead due to additional gradient evaluations. At the same time, the theoretical understanding of LMO-based methods remains fragmented across unconstrained and constrained formulations. Motivated by these limitations, we propose LMO-IGT, a new class of stochastic LMO-based methods leveraging implicit gradient transport (IGT). We further introduce a unified framework for stochastic LMO-based optimization together with a new stationarity measure, the regularized support function (RSF), which bridges gradient-norm and Frank–Wolfe-gap notions within a common framework. By evaluating stochastic gradients at transported points, LMO-IGT accelerates convergence while retaining the single-gradient-per-iteration structure of standard stochastic LMO. Our analysis establishes that stochastic LMO achieves an iteration complexity of $\mathcal{O}(\varepsilon^{-4})$, variance-reduced LMO achieves $\mathcal{O}(\varepsilon^{-3})$ at the cost of additional gradient evaluations, and LMO-IGT achieves $\mathcal{O}(\varepsilon^{-3.5})$ using only a single stochastic gradient per iteration. Empirically, LMO-IGT consistently improves over stochastic LMO counterparts with negligible overhead. Among its instantiations, Muon-IGT achieves the strongest overall performance across evaluated settings, demonstrating that IGT provides an effective and practical acceleration mechanism for modern LMO-based optimization.
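As a rough illustration of the mechanism the abstract describes, and not the paper's exact algorithm, the sketch below combines an IGT-style momentum update (stochastic gradient evaluated at a transported, extrapolated point, as in Arnold et al.'s implicit gradient transport) with an LMO step over an $\ell_\infty$ ball, which reduces to a sign-based update as in Lion. The averaging schedule $\gamma_t = t/(t+1)$, the step size, and the choice of oracle are all assumptions for the sketch:

```python
import numpy as np

def lmo_linf(g, radius=1.0):
    # LMO over an l-infinity ball of the given radius:
    # argmin_{||s||_inf <= radius} <g, s> = -radius * sign(g)
    return -radius * np.sign(g)

def lmo_igt_step(x, x_prev, m, t, grad_fn, lr=0.05, radius=1.0):
    # Implicit gradient transport: evaluate the stochastic gradient at a
    # transported (extrapolated) point z rather than at x itself.
    # With gamma_t = t / (t + 1), the transport coefficient gamma/(1-gamma) = t.
    gamma = t / (t + 1.0)                 # assumed IGT averaging schedule
    z = x + t * (x - x_prev)              # transported evaluation point
    g = grad_fn(z)                        # single stochastic gradient per iteration
    m = gamma * m + (1.0 - gamma) * g     # momentum of transported gradients
    s = lmo_linf(m, radius)               # normalize momentum via the LMO
    return x + lr * s, x, m               # new iterate, previous iterate, momentum
```

On a deterministic quadratic `f(x) = 0.5 * ||x||**2` (so `grad_fn = lambda z: z`), the IGT recursion makes `m` track the gradient at the current iterate exactly, and the iterates descend by `lr` per coordinate until they oscillate within `lr` of the origin, which matches the intended single-gradient-per-iteration behavior.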