Integrating protein sequence embeddings with structure via graph-based deep learning for single-residue property prediction

arXiv:2502.17294v2
Abstract: Understanding the intertwined contributions of amino acid sequence and spatial structure is essential to explain protein behaviour. Here, we introduce INFUSSE (Integrated Network Framework Unifying Structure and Sequence Embeddings), a deep learning framework for the prediction of single-residue properties that combines fine-tuning of sequence embeddings derived from a Large Language Model with the inclusion of graph-based representations of protein structures via a diffusive Graph Convolutional Network. To illustrate the benefits of jointly leveraging sequence and structure, we apply INFUSSE to the prediction of B-factors in antibodies, a residue property that reflects the local flexibility shaped by biochemical and structural constraints in these highly variable and dynamic proteins. Using a dataset of 1510 antibody and antibody-antigen complexes from the database SAbDab, we show that INFUSSE improves performance over current machine learning (ML) methods based on sequence or structure alone, and allows for the systematic disentanglement of sequence and structure contributions to the performance. Our results show that adding structural information via geometric graphs enhances predictions especially for intrinsically disordered regions, protein-protein interaction sites, and highly variable amino acid positions — all key structural features for antibody function which are not well captured by purely sequence-based ML descriptions.
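The combination described in the abstract, per-residue sequence embeddings propagated over a geometric graph of the structure, can be illustrated with a minimal NumPy sketch. This is not the INFUSSE implementation: the distance cutoff, the random-walk form of the diffusion operator, the layer sizes, and every function name below are assumptions made for illustration, and random arrays stand in for real language-model embeddings and residue coordinates.

```python
import numpy as np

def geometric_graph(coords, cutoff=10.0):
    """Adjacency matrix linking residues closer than `cutoff` (assumed value, in angstroms)."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dists < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def diffusion_operator(adj):
    """Random-walk transition matrix D^{-1} A; isolated residues get an all-zero row."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

def diffusive_gcn_layer(x, transition, weights):
    """ReLU of the sum over diffusion steps k of P^k X W_k (one weight matrix per step)."""
    out = np.zeros((x.shape[0], weights[0].shape[1]))
    diffused = x
    for w in weights:
        out += diffused @ w
        diffused = transition @ diffused  # advance one diffusion step
    return np.maximum(out, 0.0)

# Toy inputs: random stand-ins for residue coordinates and sequence embeddings.
rng = np.random.default_rng(0)
n_res, emb_dim, hidden = 6, 8, 4
coords = rng.normal(scale=5.0, size=(n_res, 3))
embeddings = rng.normal(size=(n_res, emb_dim))
weights = [rng.normal(scale=0.1, size=(emb_dim, hidden)) for _ in range(3)]
readout = rng.normal(scale=0.1, size=(hidden,))

adj = geometric_graph(coords)
transition = diffusion_operator(adj)
hidden_states = diffusive_gcn_layer(embeddings, transition, weights)
predicted_b = hidden_states @ readout  # one scalar per residue, e.g. a B-factor estimate
```

In a trained model the weights and readout would be learned jointly with the fine-tuned embeddings; here they are random, so the output values carry no meaning beyond showing the data flow from sequence features through the structure graph to one prediction per residue.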
