Accurately predicting gene expression from DNA sequence remains a central challenge in human genetics. Current sequence-based models overlook natural genetic variation across individuals, while population-based models are restricted to variants observed within specific cohorts. Here, we present VariantFormer, a 1.2-billion-parameter transformer that predicts gene-level RNA abundance directly from personalized diploid genomes. Trained on 21,004 genome–transcriptome pairs from 2,330 donors, VariantFormer achieves state-of-the-art performance across both sequence- and population-based prediction tasks, generalizes better to out-of-distribution contexts (including somatic mutation settings in cancer cell lines), and maintains robustness across ancestries. Beyond expression prediction, VariantFormer improves eQTL effect-size estimation over prior methods, with notable gains for lower-frequency and ancestry-specific variants. In applications to Alzheimer's disease, VariantFormer gene embeddings prioritize likely causal genes and relevant tissue contexts, and in silico mutagenesis of APOE alleles faithfully recovers their known risk-modifying effects. Together, these results establish VariantFormer as a scalable, diploid-aware framework for variant interpretation and personalized gene expression modeling across tissues and populations.
The Hidden Power of Normalization: Exponential Capacity Control in Deep Neural Networks
arXiv:2511.00958v1 Announce Type: cross Abstract: Normalization methods are fundamental components of modern deep neural networks (DNNs). Empirically, they are known to stabilize optimization dynamics and