arXiv:2512.07692v2 Announce Type: replace-cross
Abstract: Coarse-grained (CG) modeling enables molecular simulations to reach time and length scales inaccessible to fully atomistic methods. For classical CG models, the choice of mapping, that is, how atoms are grouped into CG sites, is a major determinant of accuracy and transferability. At the same time, the emergence of machine learning potentials (MLPs) offers new opportunities to build CG models that can in principle learn the true potential of mean force for any mapping. In this work, we systematically investigate how the choice of mapping influences the representations learned by equivariant MLPs by studying liquid hexane, amino acids, and polyalanine. We find that when the length scales of bonded and nonbonded interactions overlap, unphysical bond permutations can occur. We also demonstrate that correctly encoding species and maintaining stereochemistry are crucial, as neglecting either introduces unphysical symmetries. Our findings provide practical guidance for selecting CG mappings compatible with modern architectures and inform the development of transferable CG models.
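To make the notion of a mapping concrete, the sketch below (not taken from the paper; the function name and interface are illustrative assumptions) shows a common linear CG mapping in which each CG site is placed at the mass-weighted center of the atoms assigned to it. The assignment of atoms to sites is the modeling choice the abstract refers to.

```python
# Minimal sketch of a linear (center-of-mass) CG mapping.
# The grouping of atoms into sites -- site_assignment -- is the
# "mapping" choice discussed in the abstract; this code only
# illustrates how such a mapping acts on atomistic coordinates.
import numpy as np

def cg_map(positions, masses, site_assignment, n_sites):
    """Map atomistic positions (N, 3) to CG site positions (n_sites, 3).

    site_assignment[i] is the CG site index of atom i.
    """
    cg_positions = np.zeros((n_sites, 3))
    site_mass = np.zeros(n_sites)
    for i, s in enumerate(site_assignment):
        cg_positions[s] += masses[i] * positions[i]
        site_mass[s] += masses[i]
    return cg_positions / site_mass[:, None]

# Example: 6 atoms grouped into 2 CG sites of 3 atoms each.
pos = np.random.rand(6, 3)
m = np.ones(6)
print(cg_map(pos, m, [0, 0, 0, 1, 1, 1], n_sites=2))
```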


