arXiv:2604.05134v1 Announce Type: cross
Abstract: How can you get a language model to reason on a task it natively struggles with? We study how reasoning evolves in a language model, from supervised fine-tuning (SFT) to reinforcement learning (RL), by analyzing how a set of theoretically inspired datasets affects language model performance in chess. We find that fine-tuning a model to directly predict the best move leads to effective RL and the strongest downstream performance; however, the RL step elicits unfaithful reasoning (reasoning inconsistent with the chosen move). Alternatively, training on multi-move trajectories yields comparable downstream performance with faithful reasoning and more stable RL. We show that RL induces a substantial positive shift in the distribution of move quality and, as a side effect, reduces hallucination rates. Finally, we find that several SFT-checkpoint metrics, spanning evaluation performance, hallucination rates, and reasoning quality, are predictive of post-RL model performance. We release checkpoints and final models, as well as the training data, evaluations, and code that allowed us to surpass leading open-source reasoning models in chess with a 7B-parameter model.
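
The abstract contrasts two SFT data formats: targets containing only the single best move versus targets containing a multi-move trajectory. As a minimal sketch of that distinction, the Python below builds one training example in each format; the prompt templates, FEN string, and moves are illustrative assumptions, not the paper's actual data.

# Hypothetical sketch of the two SFT data formats the abstract contrasts.
# Prompt wording, the FEN, and the moves are illustrative, not from the paper.

def best_move_example(fen: str, best_move: str) -> dict:
    """Format (a): the completion is only the single best move."""
    return {
        "prompt": f"Position (FEN): {fen}\nWhat is the best move?",
        "completion": best_move,
    }

def trajectory_example(fen: str, line: list[str]) -> dict:
    """Format (b): the completion is a multi-move trajectory, the variant
    the abstract reports gives faithful reasoning and more stable RL."""
    return {
        "prompt": f"Position (FEN): {fen}\nGive the best line of play.",
        "completion": " ".join(line),
    }

if __name__ == "__main__":
    # Position after 1.e4 e5 2.Nf3 Nc6 (moves given in SAN).
    fen = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"
    print(best_move_example(fen, "Bb5"))
    print(trajectory_example(fen, ["Bb5", "a6", "Ba4", "Nf6"]))

The two helpers differ only in the supervision target, which is the design axis the paper varies; everything else about the example (position encoding, prompt shape) is held fixed in this sketch.
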
When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don’t
arXiv:2604.06422v1 Announce Type: cross
Abstract: Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and if models adhere

