arXiv:2507.01028v2 Announce Type: replace-cross
Abstract: The stop gradient and exponential moving average iterative procedures are commonly used in non-contrastive approaches to self-supervised learning to avoid representation collapse, with excellent performance in downstream applications in practice. This presentation investigates these procedures from the dual viewpoints of optimization and dynamical systems. We show that, in general, although they do not optimize the original objective, or any other smooth function, they do avoid collapse. Following Tian et al. (2021), but without any of the extra assumptions used in their proofs, we then show, using a dynamical systems perspective, that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average always leads to collapse. Conversely, we characterize explicitly the equilibria of the dynamical systems associated with these two procedures in this linear setting as algebraic varieties in their parameter space, and show that they are, in general, asymptotically stable. Our theoretical findings are illustrated by empirical experiments with real and synthetic data.
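To make the two procedures concrete, here is a minimal sketch (not from the paper) of a stop-gradient loss and an exponential-moving-average target update in a linear, SimSiam/BYOL-style setup; the encoder W, predictor P, momentum tau, and all shapes are illustrative assumptions, not the authors' code.

import jax
import jax.numpy as jnp

def loss_stop_grad(params, x1, x2):
    # Stop-gradient procedure: the second view's embedding serves as a
    # target through which no gradient flows (jax.lax.stop_gradient).
    W, P = params["W"], params["P"]
    z1, z2 = x1 @ W.T, x2 @ W.T                # encode two augmented views
    target = jax.lax.stop_gradient(z2)          # gradient blocked here
    pred = z1 @ P.T                             # linear predictor
    return jnp.mean(jnp.sum((pred - target) ** 2, axis=-1))

def ema_update(W_target, W_online, tau=0.99):
    # Exponential-moving-average procedure: the target encoder tracks a
    # slow average of the online encoder instead of receiving gradients.
    return tau * W_target + (1.0 - tau) * W_online

# Usage: one gradient step on the online parameters, then an EMA step.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
x1 = jax.random.normal(k1, (8, 16))
x2 = x1 + 0.1 * jax.random.normal(k2, (8, 16))  # a second "view" of x1
params = {"W": jnp.eye(4, 16), "P": jnp.eye(4)}
W_target = params["W"]

grads = jax.grad(loss_stop_grad)(params, x1, x2)
params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)
W_target = ema_update(W_target, params["W"])

Note the asymmetry in both updates: neither is plain gradient descent on the squared-error objective, which is exactly why, as the abstract states, these iterations do not minimize that objective (or any smooth function) yet still avoid the collapsed solution W = 0.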


