The Viscosity of Logic: Phase Transitions and Hysteresis in DPO Alignment

arXiv:2601.17260v1 Announce Type: cross
Abstract: Direct Preference Optimization (DPO) is often tuned as if increasing alignment pressure (controlled by $\beta$) yields progressively “better” behavior. We instead treat $\beta$ as a control parameter and densely sweep it for three 7B open-weight families under a fixed DPO recipe. In Mistral, capability is sharply non-monotonic: aggregated logic-probe margins become positive only in a narrow band near $\beta \approx 10^{-2}$ and revert outside it, with boundary points that are seed-sensitive. Across architectures under the same sweep, we observe qualitatively different response modes: sharp reorganization in Mistral, selective changes in Llama, and smooth trade-offs in Qwen. Critically, the DPO preference margin can anticorrelate with reasoning capability (Pearson $r=-0.91$ for Llama logic), so margin-based selection can prefer capability-impaired models. Training path also matters: exposure to high $\beta$ induces capability losses that persist even after $\beta$ is reduced (hysteresis). These findings motivate capability-resolved evaluation across the $\beta$ landscape rather than reliance on margins or aggregate benchmarks.
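For readers less familiar with the quantities the abstract discusses, the following is a minimal sketch (not the paper's code) of the standard DPO objective, showing where $\beta$ enters and what the "preference margin" measures. The function and tensor names are illustrative assumptions; inputs are summed token log-probabilities under the trainable policy and the frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss_and_margin(policy_chosen_logps: torch.Tensor,
                        policy_rejected_logps: torch.Tensor,
                        ref_chosen_logps: torch.Tensor,
                        ref_rejected_logps: torch.Tensor,
                        beta: float = 1e-2):
    """Standard DPO loss and the per-pair preference margin.

    Each argument holds summed token log-probs of the chosen / rejected
    completions under the policy or the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)

    # The "preference margin": how far the policy pushes the chosen
    # completion above the rejected one. The abstract reports that this
    # quantity can anticorrelate with reasoning capability.
    margin = chosen_reward - rejected_reward

    # DPO minimizes -log sigma(margin); larger beta sharpens the pressure.
    loss = -F.logsigmoid(margin).mean()
    return loss, margin.mean()

# Toy usage with random log-probs (illustrative only).
lp = lambda: torch.randn(8)
loss, margin = dpo_loss_and_margin(lp(), lp(), lp(), lp(), beta=1e-2)
print(f"loss={loss:.4f}  mean margin={margin:.4f}")
```

Because the loss depends on the margin alone, selecting checkpoints by margin rewards whatever widens the log-ratio gap, which is exactly the failure mode the paper's $r=-0.91$ anticorrelation warns about.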
