arXiv:2604.07848v2 Announce Type: replace-cross
Abstract: Multi-task learning shows strikingly inconsistent results: sometimes joint training helps substantially, sometimes it actively harms performance. Yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We find that this sample-overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple datasets achieves strong correlations and recovers biological pathway organization. Standard benchmarks systematically violate this requirement (MoleculeNet operates at <5% overlap, TDC at 8-14%), falling far below the threshold where gradient analysis becomes meaningful. This provides the first principled explanation for seven years of inconsistent MTL results.
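As a rough illustration of the sample-overlap check described above, the sketch below computes pairwise overlap between the sets of instances labeled for each task and compares it with the 30% and 40% figures quoted in the abstract. The abstract does not say how overlap is defined; the Jaccard ratio used here, and the names `sample_overlap`, `overlap_regime`, and `pairwise_overlap`, are assumptions made for illustration only and are not the authors' method.

```python
from itertools import combinations


def sample_overlap(ids_a, ids_b):
    """Fraction of shared instances between two tasks.

    ASSUMPTION: overlap is measured as the Jaccard ratio
    |A & B| / |A | B|; the paper may define it differently.
    """
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_regime(overlap, low=0.30, high=0.40):
    """Place an overlap value relative to the thresholds quoted in the abstract."""
    if overlap < low:
        return "below threshold"   # gradient signal reported as noise-like
    if overlap > high:
        return "above threshold"   # gradient signal reported as reliable
    return "transition zone"       # between the two quoted figures


def pairwise_overlap(task_to_ids):
    """Overlap for every unordered pair of tasks, keyed by sorted task names."""
    return {
        (s, t): sample_overlap(task_to_ids[s], task_to_ids[t])
        for s, t in combinations(sorted(task_to_ids), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy data: instance IDs that carry a label for each task.
    tasks = {
        "task_a": range(0, 100),
        "task_b": range(50, 150),
        "task_c": range(95, 200),
    }
    for pair, value in pairwise_overlap(tasks).items():
        print(pair, round(value, 3), overlap_regime(value))
```

Running a check like this before any gradient-conflict analysis would flag task pairs whose shared-instance fraction is too small for the gradient signal to be interpreted under the abstract's claim.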
Wavelet analysis of human recombination rates demonstrates divergence on fine scales
Background: Recombination rates can be estimated across the genome, underpinning genetic analyses such as identification of regions under selection. Accurate recombination mapping requires observing a

