Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning

arXiv:2512.15662v3
Abstract: Human beings solve complex problems through critical thinking, where reasoning and evaluation are intertwined to converge toward correct solutions. However, most existing large language models (LLMs) treat reasoning and verification as separate processes: they either generate reasoning without explicit self-checking or rely on external verifiers to detect errors post hoc. The former lacks immediate feedback, while the latter increases system complexity and hinders synchronized learning. Motivated by human critical thinking, we propose Stepwise Think-Critique (STC), a unified, end-to-end trainable framework that interleaves reasoning and self-critique at every intermediate step within a single model. STC is trained with a hybrid reinforcement learning objective that integrates reasoning rewards and critique-consistency rewards, thereby jointly optimizing solution correctness and the reliability of self-evaluation. Experiments on mathematical reasoning benchmarks show that STC demonstrates strong critical-thinking capabilities and produces more interpretable reasoning traces, representing a step toward LLMs with built-in critical thinking.
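
The abstract does not give the form of the hybrid objective, so the following is only a minimal sketch of how a reasoning reward and a critique-consistency reward might be combined into one scalar. Everything in it is an assumption made for illustration: the `Step` type, the exact-match answer check, the rule that a trace counts as self-judged correct only if every step critique is positive, and the `critique_weight` parameter. None of these are taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One interleaved unit: a reasoning step plus the model's own verdict on it."""
    text: str
    critique_ok: bool  # hypothetical: True if the self-critique judged the step correct


def reasoning_reward(final_answer: str, reference: str) -> float:
    # Assumption: outcome reward via exact string match on the final answer.
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def critique_consistency_reward(steps: List[Step], answer_correct: bool) -> float:
    # Assumption: the critiques are "consistent" when the trace-level verdict
    # (every step judged OK) agrees with whether the final answer is correct.
    if not steps:
        return 0.0
    predicted_correct = all(s.critique_ok for s in steps)
    return 1.0 if predicted_correct == answer_correct else 0.0


def hybrid_reward(steps: List[Step], final_answer: str, reference: str,
                  critique_weight: float = 0.5) -> float:
    # Weighted sum of the two signals; the weight is an illustrative choice.
    r = reasoning_reward(final_answer, reference)
    c = critique_consistency_reward(steps, answer_correct=(r == 1.0))
    return r + critique_weight * c


if __name__ == "__main__":
    trace = [Step("2 + 3 = 5", True), Step("5 * 4 = 20", True)]
    print(hybrid_reward(trace, "20", "20"))
```

Under these assumptions, a trace that reaches a wrong answer but flags one of its own steps as faulty still earns the consistency term, which is one way a single scalar can reward reliable self-evaluation separately from solution correctness.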

