arXiv:2603.25412v2 Announce Type: replace
Abstract: Large language models increasingly rely on explicit chain-of-thought reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work focuses predominantly on content safety (i.e., detecting harmful, biased, or factually incorrect outputs), while treating the underlying reasoning chain as an opaque intermediate artifact. We argue that reasoning safety constitutes a fundamental security dimension orthogonal to content safety: the requirement that a model’s reasoning trajectory be logically consistent, computationally efficient, and resistant to adversarial manipulation. In this paper, we formalize reasoning safety and introduce a systematic taxonomy of nine unsafe reasoning behaviors. We then conduct a large-scale prevalence study, annotating over 4,000 reasoning chains across benign benchmarks and four state-of-the-art reasoning attacks, empirically demonstrating that all nine unsafe behaviors occur in practice with mechanistically interpretable signatures. To mitigate these threats, we propose the Reasoning Safety Monitor: an external, zero-shot verification framework that runs in parallel with the target LLM. It inspects each reasoning step in real time via a taxonomy-embedded prompt and dispatches an interrupt signal upon detecting unsafe behavior. Extensive evaluations show our monitor achieves up to 87.11% step-level localization accuracy, outperforming hallucination detectors and the best process reward model baselines by a substantial margin. Crucially, the monitor maintains a low false positive rate on correct reasoning paths, operates with negligible latency overhead, and remains resilient to adaptive adversarial evasion. These findings establish reasoning safety monitoring as a feasible and essential component for the secure deployment of large reasoning models.
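The step-level monitoring loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Verdict`, `Interrupt`, `monitor`, and `repeat_checker` are hypothetical, the loop runs sequentially rather than in parallel with the model, and the toy checker (which flags a verbatim repeated step) only stands in for the taxonomy-embedded LLM prompt the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Verdict:
    """Result of checking one reasoning step (hypothetical structure)."""
    unsafe: bool
    label: Optional[str] = None  # placeholder for a taxonomy category


@dataclass
class Interrupt:
    """Signal raised at the first step judged unsafe."""
    step_index: int
    label: Optional[str]


def monitor(
    steps: Iterable[str],
    check_step: Callable[[List[str], str], Verdict],
) -> Optional[Interrupt]:
    """Inspect steps one at a time; stop at the first unsafe one.

    `check_step` receives the accepted history and the current step.
    In a real system it would call a verifier model; here it is any callable.
    """
    history: List[str] = []
    for index, step in enumerate(steps):
        verdict = check_step(history, step)
        if verdict.unsafe:
            return Interrupt(step_index=index, label=verdict.label)
        history.append(step)
    return None  # the whole chain passed the checker


def repeat_checker(history: List[str], step: str) -> Verdict:
    """Toy stand-in checker: flag a step that repeats an earlier one verbatim."""
    if step in history:
        return Verdict(unsafe=True, label="repeated_step")
    return Verdict(unsafe=False)


if __name__ == "__main__":
    chain = ["Let x = 2.", "Then 2x = 4.", "Let x = 2.", "So x + 1 = 3."]
    print(monitor(chain, repeat_checker))
```

Returning the index of the first flagged step mirrors the step-level localization the abstract reports. Passing the checker as a callable keeps the loop independent of whichever verifier is used.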
Digital health tools and point solutions—pitfalls in population health program measurement
Digital health tools are generally poorly regulated and often lack strong research evidence, posing challenges for purchasers of point solutions such as employer groups and


