Dissociable contributions of cortical thickness and surface area to cognitive ageing: evidence from multiple longitudinal cohorts.

Cortical volume, a widely-used marker of brain ageing, is the product of two genetically and developmentally dissociable morphometric features: thickness and area. However, it remains

Animal collocation revisited: intercohort comparison and a case study comparing call combinations between sexes in common marmosets

Many animals communicate using sequences of signals, but identifying recurrent, non-random signal combinations remains methodologically challenging. Collocation analyses are increasingly popular approaches for detecting which

Helicase: Vectorized parsing and bitpacking of genomic sequences

Modern sequencing pipelines routinely produce billions of reads, yet the dominant storage formats (FASTQ and FASTA) are text-based and sequential, making high-throughput parsing a persistent

Ineffectual Genomic Error Correction Under Environmental Perturbation Dynamically Regulates Mutational Supply and Robustness

Adaptive evolution depends on the supply of heritable variation, yet excessive mutation threatens viability by degrading essential molecular functions. Here, we show that this trade-off

aaKomp: Alignment-free amino acid k-mer matching for genome completeness assessment at scale

In de novo sequencing projects, genome assembly optimization requires evaluating a number of candidate assemblies to identify optimal tool parameters. Yet, current completeness assessment tools

FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels

March 10, 2026

arXiv:2511.02872v4
Abstract: Recent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of modern mathematical research. To bridge this gap, we introduce FATE (Formal Algebra Theorem Evaluation), a new benchmark series in formal algebra designed to chart a course toward advanced mathematical reasoning. We present two new components, FATE-H and FATE-X, each with 100 problems in abstract and commutative algebra. The FATE series spans a difficulty spectrum from undergraduate exercises to problems exceeding PhD qualifying exams. Notably, FATE-X is the first formal benchmark to surpass both PhD-level exam difficulty and the coverage of the Mathlib library. Our evaluations of state-of-the-art LLM provers on this new benchmark reveal a stark performance gap compared to contest math: the best model achieves only 3% (pass@64) accuracy on FATE-H and 0% on FATE-X. Our two-stage evaluation reveals that models’ natural-language reasoning is notably more accurate than their ability to formalize this reasoning. We systematically classify the common errors that arise during this formalization process. Furthermore, a comparative study shows that a specialized prover can exhibit less effective reflection than general-purpose models, reducing its accuracy at the natural-language stage. We believe FATE provides a robust and challenging benchmark that establishes essential checkpoints on the path toward research-level formal mathematical reasoning.
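
The abstract reports accuracy as pass@64 but does not say how the figure is computed. A minimal sketch of one common convention follows, under the assumption that a problem counts as solved when at least one of the first k sampled proofs is accepted by the proof checker. The function name `pass_at_k` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems with at least one accepted proof in the first k attempts.

    results[i][j] is True when attempt j on problem i was accepted by the
    proof checker (an assumed data layout, chosen only for illustration).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not results:
        raise ValueError("results must contain at least one problem")
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


# Hypothetical data: 100 problems, 64 attempts each, 3 problems solved at least once.
results = [[False] * 64 for _ in range(100)]
results[0][10] = True   # solved on the 11th attempt
results[1][63] = True   # solved only on the last attempt
results[2][0] = True    # solved on the first attempt

print(pass_at_k(results, 64))  # 0.03
print(pass_at_k(results, 1))   # 0.01
```

Under this convention, a reported 3% at pass@64 on a 100-problem set would correspond to three problems with at least one accepted proof among 64 attempts each. Other definitions exist, such as an unbiased estimator computed from a larger sample pool, so the paper itself remains the reference for the exact protocol.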

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.