Machine learning holds great promise for accelerating enzyme optimization, but its power is fundamentally constrained by the limited availability of sequence-fitness data. Here, we introduce MillionFull, a low-cost method that enables high-throughput full-length sequence-fitness mapping for enzymes of arbitrary length. Each run yields on the order of 10^5–10^7 data points, capturing sequence-function relationships at unprecedented scale. By overcoming the data bottleneck, MillionFull provides a foundation for dramatically advancing AI-driven enzyme engineering.
The Hidden Power of Normalization: Exponential Capacity Control in Deep Neural Networks
arXiv:2511.00958v1 Announce Type: cross Abstract: Normalization methods are fundamental components of modern deep neural networks (DNNs). Empirically, they are known to stabilize optimization dynamics and


