arXiv:2604.06613v1 Announce Type: cross
Abstract: Modern reasoning models continue generating long after the answer is already determined. Across five model configurations, two families, and three benchmarks, we find that textbf52–88% of chain-of-thought tokens are produced after the answer is recoverable from a partial prefix. This post-commitment generation reveals a structural phenomenon: the textbfdetection–extraction gap. Free continuations from early prefixes recover the correct answer even at 10% of the trace, while forced extraction fails on 42% of these cases. The answer is recoverable from the model state, yet prompt-conditioned decoding fails to extract it. We formalize this mismatch via a total-variation bound between free and forced continuation distributions, yielding quantitative estimates of suffix-induced shift. Exploiting this asymmetry, we propose Black-box Adaptive Early Exit (BAEE), which uses free continuations for both detection and extraction, truncating textbf70–78% of serial generation while textbfimproving accuracy by 1–5,pp across all models. For thinking-mode models, early exit prevents post-commitment overwriting, yielding gains of up to 5.8,pp; a cost-optimized variant achieves 68–73% reduction at a median of 9 API calls. Code is available at https://github.com/EdWangLoDaSc/know2say.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844