arXiv:2605.18820v1 Announce Type: cross
Abstract: Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial chain-of-thought tokens. While Zhu et al. (2025) hand-crafted an equal-weight breadth-first frontier in a single residual stream for graph reachability, it remained open whether gradient descent could ever find this target amidst permutation-symmetric saddles.
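As a minimal sketch of what an equal-weight breadth-first frontier looks like, the snippet below builds one unit-norm state vector per BFS step. It assumes one-hot node embeddings and reads the frontier as the set of nodes reached so far; the function name `frontier_superposition` and the adjacency-dict input are hypothetical and are not taken from Zhu et al. (2025).

```python
import math


def frontier_superposition(adj, source, depth):
    """Return one equal-weight frontier vector per BFS step.

    adj    : dict mapping each node 0..n-1 to a list of successor nodes
    source : start node
    depth  : number of expansion steps (one per "layer" in this sketch)
    """
    n = len(adj)
    reached = {source}
    states = []
    for _ in range(depth):
        # Expand every reached node in parallel, keeping what was already reached.
        reached = reached | {v for u in reached for v in adj[u]}
        # Equal-weight superposition: unit-norm indicator over the reached set.
        # One-hot node embeddings are an assumption made for illustration.
        weight = 1.0 / math.sqrt(len(reached))
        states.append([weight if i in reached else 0.0 for i in range(n)])
    return states


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [], 3: []}
    for step, state in enumerate(frontier_superposition(graph, 0, 2), start=1):
        print(step, state)
```

A target node is reachable within `depth` steps exactly when its coordinate in the final state is nonzero, which is the readout this sketch assumes.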
We close this gap on Reachability-by-Superposition over Erdős-Rényi graphs by isolating the architectural and supervisory contributions. Architecturally, we identify a Möbius attractor: under $S_n$-symmetry in the tree regime, layerwise dynamics reduce to a 1D Möbius map whose zero set is a codimension-one manifold of global optima containing the equal-weight superposition state.
On the supervision side, we identify Cascade Supervision: a loss class (e.g., $\mathcal{L}_{\mathrm{sup}}$ and $\mathcal{L}_{\mathrm{node}}$) whose backward pass simultaneously delivers (A) selectivity bootstrap, (B) gradient persistence across depth, and (C) per-step discrimination. End-to-end supervision fails condition (B) and is provably insufficient: internal gradients at layer $c$ decay as $(np)^{-(D-c-2)/2}$ in the graph fan-out $np$ and stall before the manifold is reached.
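To make the decay law concrete, the following sketch evaluates the factor $(np)^{-(D-c-2)/2}$ for given values. The function name `gradient_decay` and the example parameters are illustrative assumptions; any constants or prefactors in the full result are not reproduced here.

```python
def gradient_decay(n, p, depth, layer):
    """Evaluate (n*p) ** (-(depth - layer - 2) / 2).

    n, p  : Erdos-Renyi graph size and edge probability (fan-out is n*p)
    depth : total depth D
    layer : layer index c
    """
    fan_out = n * p
    return fan_out ** (-(depth - layer - 2) / 2)


if __name__ == "__main__":
    # Hypothetical setting: fan-out n*p = 4 and depth D = 6.
    for c in range(5):
        print(c, gradient_decay(8, 0.5, 6, c))
```

With a fan-out above 1, the factor shrinks geometrically as $D-c-2$ grows, so under end-to-end supervision the earliest layers receive the weakest signal.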
Our thesis: Möbius attractor + Cascade Supervision = emergence of superposition reasoning. The parameter-free decay law predicts a final-step cosine of 0.35 vs. 0.71 (end-to-end vs. cascade) at depth $D=3$; experiments confirm 0.37 vs. 0.69, matching within 0.02 at every step.
Explainable AI in kidney stone detection and segmentation: a mini review
Kidney stones are one of the most common renal disorders that can produce severe complications if not diagnosed and treated early. Recently, advances in AI