Bacteriophage anti-CRISPR (Acr) proteins have the potential to reduce off-target effects of genome editing by inactivating the bacterial CRISPR-Cas defense system. The current challenge lies in their functional annotation: Acr proteins have high structural diversity and low sequence similarity, rendering common homology-based methods unfit. Recent solutions use deep learning models, such as graph convolutional networks, that take protein networks as input. To understand whether these new solutions are suited to niche, sparsely annotated proteins, we focus on three Acr proteins (AcrIF1, AcrIIA1, and AcrVIA1) as a case study. For each, we create protein contact networks (PCNs) and residue interaction graphs (RIGs) based on existing network theory and methodology. We characterize and analyze these protein networks by comparing how each network architecture affects small-worldness values. We reexamine a previous method that used node degree, closeness centrality, and residue solvent accessibility to predict functional residues within a protein via a jackknife technique. We discuss the implications of how these networks are constructed, given the way the structural information is acquired. We demonstrate that functional residues within small proteins cannot be reliably predicted with the jackknife technique, even when provided with a curated dataset containing representative standardized values for degree and closeness centrality. We show that functional residues within these small proteins have low degrees in both PCNs and RIGs, making them susceptible to the known bias of graph convolutional networks toward high-degree nodes. Finally, we discuss how a closer understanding of the data can further improve deep learning approaches for small proteins.
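
As a concrete illustration of the network construction described above, the sketch below builds a PCN from C-alpha coordinates and estimates small-worldness with the Humphries-Gurney sigma coefficient. This is a minimal re-creation under stated assumptions, not the paper's pipeline: the 8 angstrom contact cutoff, the chain selection, and the Erdos-Renyi random baseline are illustrative choices.

```python
# Minimal sketch: build a protein contact network (PCN) from C-alpha
# coordinates and estimate small-worldness. The 8 A cutoff and the
# random-graph baseline are assumptions, not values from the paper.
import numpy as np
import networkx as nx
from Bio.PDB import PDBParser

def build_pcn(pdb_path, chain_id="A", cutoff=8.0):
    """PCN: nodes are residues, edges join C-alpha pairs within `cutoff`."""
    structure = PDBParser(QUIET=True).get_structure("acr", pdb_path)
    ca = [(res.get_id()[1], res["CA"].coord)
          for res in structure[0][chain_id] if "CA" in res]
    g = nx.Graph()
    g.add_nodes_from(idx for idx, _ in ca)
    for i in range(len(ca)):
        for j in range(i + 1, len(ca)):
            if np.linalg.norm(ca[i][1] - ca[j][1]) <= cutoff:
                g.add_edge(ca[i][0], ca[j][0])
    return g

def small_worldness(g, n_random=20, seed=0):
    """Humphries-Gurney sigma = (C/C_rand) / (L/L_rand), averaged over
    Erdos-Renyi graphs matched in node and edge counts. Assumes g is
    connected (true for typical single-chain PCNs)."""
    rng = np.random.default_rng(seed)
    c = nx.average_clustering(g)
    l = nx.average_shortest_path_length(g)
    c_r, l_r = [], []
    for _ in range(n_random):
        r = nx.gnm_random_graph(g.number_of_nodes(), g.number_of_edges(),
                                seed=int(rng.integers(1 << 31)))
        if nx.is_connected(r):  # skip disconnected samples
            c_r.append(nx.average_clustering(r))
            l_r.append(nx.average_shortest_path_length(r))
    return (c / np.mean(c_r)) / (l / np.mean(l_r))
```

The jackknife-based centrality scoring can be sketched in the same spirit. The combination of standardized degree and closeness below, and the decision threshold, are assumptions for illustration rather than the exact procedure the paper reexamines; residue solvent accessibility is omitted for brevity.

```python
# Schematic jackknife-style scoring of residues by network centrality.
# The equal-weight z-score combination and the 1.0 threshold are
# illustrative assumptions, not the paper's evaluated settings.
import numpy as np
import networkx as nx

def jackknife_scores(g):
    """Z-score each residue's degree and closeness centrality against a
    leave-one-out (jackknife) baseline built from all other residues."""
    nodes = list(g.nodes())
    deg = np.array([g.degree(n) for n in nodes], dtype=float)
    clo = np.array([nx.closeness_centrality(g, u=n) for n in nodes])
    scores = {}
    for i, n in enumerate(nodes):
        rest_d, rest_c = np.delete(deg, i), np.delete(clo, i)
        z_d = (deg[i] - rest_d.mean()) / (rest_d.std() or 1.0)
        z_c = (clo[i] - rest_c.mean()) / (rest_c.std() or 1.0)
        scores[n] = 0.5 * (z_d + z_c)
    return scores

def predict_functional(g, threshold=1.0):
    """Flag residues whose combined centrality z-score exceeds the threshold."""
    return [n for n, s in jackknife_scores(g).items() if s >= threshold]
```

Because functional residues in these small Acr proteins sit at low degree in both PCNs and RIGs, their combined z-scores fall below any positive threshold in a scheme like this, which is consistent with the failure mode reported above.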