arXiv:2603.19335v1 Announce Type: cross
Abstract: Post-training alignment has produced dozens of competing algorithms — DPO, SimPO, KTO, GRPO, and others — yet practitioners lack controlled comparisons to guide algorithm selection. We present OXRL, a unified framework implementing 51 post-training algorithms on identical infrastructure, enabling the first large-scale apples-to-apples evaluation. Our study spans 8 algorithms across 4 model scales (0.5B–7B), 3 evaluation domains, and a 20-variant DPO taxonomy (100 runs at 1.5B, 5 seeds each), totaling ~240 training runs on H100 GPUs. Three headline findings emerge. (1) Algorithm rankings are unstable across scale: at 1.5B, online RL (SGRPO) tops all methods at 58.0% ± 0.57 on GSM8K; by 7B, the worst small-scale method (SimPO) becomes the best (85.8%), a complete ranking inversion driven by model scale rather than LoRA regularization (confirmed via a 2×2 factorial). (2) Loss-function modifications yield negligible gains: none of the 20 DPO variants significantly outperforms vanilla DPO after Bonferroni correction; the sole significant outlier, SimPO, is worse (−11.5 pp, p < 10⁻⁴). (3) Algorithm leverage is task-specific: the 19.3 pp GSM8K spread collapses to 0.54 pp on MATH (36×) and 0.47 pp on general-domain benchmarks (41×), confirming that algorithm choice matters primarily within the training distribution. These findings yield a hierarchy of leverage for practitioners: model scale (~50 pp) ≫ training paradigm (~10 pp) ≫ online vs. offline (~9 pp) ≫ loss function (~1 pp). We release all code, configs, and evaluation data as a living community benchmark.
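Finding (2) hinges on a Bonferroni correction over the 20 DPO-variant comparisons: with 20 hypothesis tests, the per-test significance threshold is divided by 20, so only very small p-values survive. A minimal sketch of that decision rule, using illustrative placeholder p-values rather than the paper's actual results:

```python
# Bonferroni correction for multiple comparisons (sketch).
# Placeholder p-values for illustration only; not the paper's data.
alpha = 0.05
num_comparisons = 20  # the 20 DPO variants compared against vanilla DPO
corrected_alpha = alpha / num_comparisons  # 0.05 / 20 = 0.0025

p_values = {
    "variant_a": 0.03,   # would pass at 0.05, fails after correction
    "variant_b": 0.20,
    "simpo":     1e-5,   # survives correction (in the paper: worse, not better)
}

significant = {name: p < corrected_alpha for name, p in p_values.items()}
# Only comparisons with p < 0.0025 are declared significant.
```

Under this rule, a nominally significant uncorrected result (p = 0.03) is rejected once the 20-way family of tests is accounted for, which is why only the SimPO outlier survives in the study.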
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
arXiv:2603.19312v1 Announce Type: cross Abstract: Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods




