arXiv:2604.14419v1 Announce Type: new
Abstract: Sparse Mixture-of-Experts (MoE) architectures employ increasingly sophisticated routing mechanisms — learned routers, multi-hop trajectories, token-dependent gating. We ask: does routing topology actually determine language modeling quality? We build a geometric MoE (ST-MoE) using cosine-similarity routing against learned centroids in a low-dimensional space ($d_{\text{space}} = 64$), requiring 80% fewer routing parameters than standard linear routers. Through 62 controlled experiments on WikiText-103 at 76–84M parameters trained to convergence (50K steps, 1.64B tokens), we find that routing topology does not determine asymptotic perplexity (PPL): five cosine-routing variants are statistically equivalent within a 1-PPL margin (Two One-Sided Tests [TOST], $p < 0.05$ for all 10 pairwise comparisons; 15 runs across 3 seeds, observed range 33.93–34.72). The finding extends to hash, random-fixed, and top-1 routing (single seed; graceful 1.1–2.2 PPL degradation) and replicates on OpenWebText (0.03 PPL gap, 6 runs, 3 seeds each). A standard linear router with 5.3$\times$ more routing parameters reaches PPL 32.76, but iso-parameter cosine routing closes 67% of this gap — the true mechanism advantage is $\sim$1.2%. The mechanistic explanation is convergent redundancy: multi-hop updates are collinear ($\cos(\Delta h_0, \Delta h_1) = 0.805$), implementing magnitude amplification rather than compositional reasoning; a single learnable scalar replicates multi-hop performance. As a practical payoff, zero-shot relative-norm halting saves 25% of MoE FLOPs at +0.12% PPL. Expert-level specialization and causal controllability — which coexist with topology-level equifinality — are explored in a companion paper.
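The core mechanism — cosine-similarity routing against learned centroids in a low-dimensional space — can be sketched roughly as follows. This is a hypothetical reconstruction from the abstract alone, not the paper's implementation: the class name `CosineRouter`, the learned projection, and the shapes are all assumptions; how the paper achieves its routing-parameter savings relative to a standard linear router is not specified here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineRouter(nn.Module):
    """Sketch of cosine routing against learned centroids (assumed design).

    Tokens are projected into a small routing space (d_space = 64 in the
    abstract) and scored by cosine similarity against one learned centroid
    per expert; the top-scoring expert receives the token.
    """

    def __init__(self, d_model: int, n_experts: int, d_space: int = 64):
        super().__init__()
        # Assumed: a learned down-projection into the routing space.
        self.proj = nn.Linear(d_model, d_space, bias=False)
        # One learned centroid per expert, living in the routing space.
        self.centroids = nn.Parameter(torch.randn(n_experts, d_space))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) -> cosine scores (batch, seq, n_experts)
        z = F.normalize(self.proj(h), dim=-1)        # unit-norm token vectors
        c = F.normalize(self.centroids, dim=-1)      # unit-norm centroids
        return z @ c.t()                             # values in [-1, 1]

# Top-1 assignment: pick the expert with the highest cosine score.
router = CosineRouter(d_model=32, n_experts=4)
scores = router(torch.randn(2, 5, 32))
expert_ids = scores.argmax(dim=-1)                   # (batch, seq)
```

Because scores are bounded cosines rather than unnormalized dot products, routing depends only on direction in the routing space, not on token magnitude — consistent with the abstract's geometric framing.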