Translating AI research into reality: summary of the 2025 voice AI Symposium and Hackathon

The 2025 Voice AI Symposium represented a transition from conceptual research to clinical implementation in vocal biomarker science. Hosted by the NIH-funded Bridge2AI-Voice consortium, the

Toward terminological clarity in digital biomarker research

Digital biomarker research has generated thousands of publications demonstrating associations between sensor-derived measures and clinical conditions, yet clinical adoption remains negligible. We identify a foundational

Trust and anxiety as primary drivers of digital health acceptance in multiple sclerosis: toward an extended disease-specific technology acceptance model

BackgroundDigital health applications and AI-supported wearables may benefit people with Multiple Sclerosis (MS), yet fluctuating cognitive and physical symptoms could shape adoption in ways not

Real-world federated learning for brain imaging scientists

BackgroundFederated learning (FL) has the potential to boost deep learning in neuroimaging but is rarely deployed in real-world scenarios, where its true potential lies. We

Through the looking glass: ethical considerations regarding LLM-induced hallucinations to medical questions

Post Content

Hilbert: Recursively Building Formal Proofs with Informal Reasoning

March 18, 2026

arXiv:2509.22819v2 Announce Type: replace
Abstract: Large Language Models (LLMs) demonstrate impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically checked. Formal theorem proving systems such as Lean 4 offer automated verification with complete accuracy, motivating recent efforts to build specialized prover LLMs that generate verifiable proofs in formal languages. However, a significant gap remains: current prover LLMs solve substantially fewer problems than general-purpose LLMs operating in natural language. We introduce Hilbert, an agentic framework that bridges this gap by combining the complementary strengths of informal reasoning and formal verification. Our system orchestrates four components: an informal LLM that excels at mathematical reasoning, a specialized prover LLM optimized for Lean 4 tactics, a formal verifier, and a semantic theorem retriever. Given a problem that the prover is unable to solve, Hilbert employs recursive decomposition to split the problem into subgoals that it solves with the prover or reasoner LLM. It leverages verifier feedback to refine incorrect proofs as necessary. Experimental results demonstrate that Hilbert substantially outperforms existing approaches on key benchmarks, achieving 99.2% on miniF2F, 6.6% points above the best publicly available method. Hilbert achieves the textbfstrongest known result from a publicly available model on PutnamBench. It solves 462/660 problems (70.0%), outperforming proprietary approaches like SeedProver (50.4%) and achieving a 422% improvement over the best publicly available baseline. Thus, Hilbert effectively narrows the gap between informal reasoning and formal proof generation. Code is available at https://github.com/Rose-STL-Lab/ml-hilbert.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844