Modular Deep Learning for Direct RNA Sequence Design via Self-Contained RNA Units

RNA sequence design is a pivotal challenge in synthetic biology, yet state-of-the-art deep learning methods face a fundamental bottleneck: the scarcity of high-resolution 3D structures. To compensate for limited training data, existing approaches such as NA-MPNN and RiboDiffusion rely on computationally expensive autoregressive or iterative diffusion sampling, which substantially limits their throughput and scalability. In this work, we propose that this data limitation is largely a problem of accessibility and granularity. We introduce SCRU-DB, a comprehensive database that systematically decomposes complex RNAs into over 61,000 Self-Contained RNA Units (SCRUs). This scale far exceeds that of previous RNA motif libraries and captures over 8,200 unique structural clusters. Crucially, SCRUs are rigorously defined as structurally autonomous modules identified via tertiary contact clustering, ensuring that they act as self-stabilizing, foldable physical units. Leveraging this massive, modular prior, we present SCRU-Seq (a direct, O(1) prediction GNN) and SCRU-Diff (an iterative diffusion model). On our high-fidelity set112 benchmark, SCRU-Seq achieves a native sequence recovery (NSR) of 63.7%, while SCRU-Diff reaches a superior Best NSR of 79.2%. We demonstrate high structural fidelity via 3D backbone superposition using the C4′ RMSD (reaching 1.5 Å for complex targets) and validate the structural isomorphism of our modular fragments. This framework provides a scalable, physically grounded solution for generating diverse and structurally accurate RNA sequences.
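To make the reported metric concrete, the following is a minimal Python sketch of how native sequence recovery is commonly computed: the fraction of positions at which a designed sequence matches the native one. It is an illustration under stated assumptions, not the authors' evaluation code. In particular, the function names are hypothetical, the toy sequences are invented for the example, and "Best NSR" is assumed here to mean the highest NSR among several sampled designs for the same target, which the abstract does not define.

```python
from typing import Iterable


def native_sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one."""
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Compare position by position, ignoring letter case.
    matches = sum(d == n for d, n in zip(designed.upper(), native.upper()))
    return matches / len(native)


def best_nsr(samples: Iterable[str], native: str) -> float:
    """Assumed definition: the maximum NSR over a set of sampled designs."""
    return max(native_sequence_recovery(s, native) for s in samples)


# Toy data, invented for illustration only.
native = "GGCUAGCC"
samples = ["GGCAAGCC", "GACUAGCU"]

for s in samples:
    print(s, native_sequence_recovery(s, native))
print("best:", best_nsr(samples, native))
```

Under this assumed definition, a single-pass predictor yields one NSR value per target, whereas a sampling-based model can report the best of several draws, so the two figures are not directly comparable without knowing the number of samples.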


Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844