The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

arXiv:2605.18840v2 Announce Type: replace-cross
Abstract: Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases, and at the frontier this interaction is the more informative signal. We decompose paired SWE-bench and GPQA Diamond scores into a population coupling trend and a per-release residual (the $h$-field) that diagnoses capability emphasis from two public benchmark scores. Across 34 models from 10 labs (2024–2026), capabilities cooperate ($r = +0.72$, $p < 10^{-6}$), but cooperation varies systematically: per-lab coupling slopes span $5\times$ (Google $1.15$ vs. DeepSeek $0.23$), and labs pivot. DeepSeek reversed from reasoning-rich to coding-first ($\Delta h = 15.9$ pp); Anthropic oscillates between coding excursions and recovery. The population regression serves as an isocline phase boundary: the same $\sqrt{a/b} \cdot B_1$ classifier that identifies the base-scale coupling transition [Amin, 2026] classifies frontier models and already detects mixed-phase behavior at the next transition (two models below the GPQA–IFEval isocline). The $h$-field is not just diagnostic: it tells you what to change. Pretraining establishes coupling at $0.871$ while RLHF adds $0.081$ [Amin, 2026]. Pretraining-level shifts are permanent (DeepSeek's four-release reversal persists), post-training shifts are reversible (Anthropic's three coding excursions each recover within one release), and inference compute alone shifts $h$ by $+7.8$ pp without retraining. Knowing which component dominates determines whether to retrain or wait. We provide a three-step diagnostic (locate, classify, predict), a per-lab measurement-priority table, and seven falsifiable predictions with timestamped criteria. Five post-cutoff releases fall within the 95% prediction interval. Code, data, and an interactive dashboard: https://zehenlabs.com/cape/.
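
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes that the population coupling trend is an ordinary least-squares line of GPQA Diamond score on SWE-bench score and that the per-release residual $h$ is the vertical distance from that line, in percentage points. The abstract does not state the regression direction or the sign convention, so both are assumptions here, as are the helper names `fit_line` and `h_field`. The scores are made-up placeholders, not values from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def h_field(swe, gpqa):
    """Residual of each release from the population trend (assumed definition).

    Under the assumed convention, a positive value means the release scores
    higher on GPQA than the trend predicts from its SWE-bench score.
    """
    slope, intercept = fit_line(swe, gpqa)
    return [y - (slope * x + intercept) for x, y in zip(swe, gpqa)]


# Placeholder scores in percent, invented for illustration only.
swe_scores = [30.0, 40.0, 50.0, 60.0, 70.0]
gpqa_scores = [50.0, 58.0, 61.0, 72.0, 74.0]

slope, intercept = fit_line(swe_scores, gpqa_scores)
h = h_field(swe_scores, gpqa_scores)
for s, g, r in zip(swe_scores, gpqa_scores, h):
    print(f"SWE={s:5.1f}  GPQA={g:5.1f}  h={r:+.2f} pp")
```

With an intercept in the fit, the residuals sum to zero across the population, so $h$ only expresses emphasis relative to the other releases. A per-lab coupling slope, in this sketch, would be the same `fit_line` call restricted to one lab's releases.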

