arXiv:2604.06613v1 Announce Type: cross
Abstract: Modern reasoning models continue generating long after the answer is already determined. Across five model configurations, two families, and three benchmarks, we find that textbf52–88% of chain-of-thought tokens are produced after the answer is recoverable from a partial prefix. This post-commitment generation reveals a structural phenomenon: the textbfdetection–extraction gap. Free continuations from early prefixes recover the correct answer even at 10% of the trace, while forced extraction fails on 42% of these cases. The answer is recoverable from the model state, yet prompt-conditioned decoding fails to extract it. We formalize this mismatch via a total-variation bound between free and forced continuation distributions, yielding quantitative estimates of suffix-induced shift. Exploiting this asymmetry, we propose Black-box Adaptive Early Exit (BAEE), which uses free continuations for both detection and extraction, truncating textbf70–78% of serial generation while textbfimproving accuracy by 1–5,pp across all models. For thinking-mode models, early exit prevents post-commitment overwriting, yielding gains of up to 5.8,pp; a cost-optimized variant achieves 68–73% reduction at a median of 9 API calls. Code is available at https://github.com/EdWangLoDaSc/know2say.
When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don’t
arXiv:2604.06422v1 Announce Type: cross Abstract: Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and if models adhere


