As whole-genome and whole-exome datasets grow, they uncover alleles at ever lower frequencies in the population. Samples of rare alleles often include recurrent mutations, where derived alleles are identical by state but not by descent. As a result, the site frequency spectrum (SFS) becomes challenging to analyze because it depends strongly on the mutation rate. To overcome this hurdle, we define the single mutation frequency spectrum (SMFS), the frequency spectrum of alleles descended from a single mutational event. For rare alleles, the SFS with recurrent mutation is then a weighted sum of the convolutions of the SMFS with itself. This simple yet powerful model decouples recurrent mutation from the population genetic processes that give rise to the SMFS, such as genetic drift and selection. We show how both forward-in-time and backward-in-time models with recurrent mutation can be recast in terms of the SMFS. We then develop a method for combinatorial hierarchic estimation of the SMFS, which we name CHES. We apply this simple yet robust method to a human exome sequencing dataset to show that the SMFS with recurrent mutation can account for SFS differences between low and high mutation rate sites. The inferred SMFS shows an approximate scaling law in allele frequency that is inconsistent with both a constant population size model and an exponentially growing population model.
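The statement that the SFS is a weighted sum of convolutions of the SMFS with itself can be illustrated with a minimal numerical sketch. It is not the authors' implementation, and the function name `sfs_from_smfs`, the array layout, and the example weights are all assumptions made for illustration. Here `smfs[k]` is taken to be the probability that a single mutational event leaves `k` derived copies in the sample, and `weights[m-1]` the weight given to sites carrying `m` independent mutational events. How those weights depend on the mutation rate is left open.

```python
import numpy as np


def sfs_from_smfs(smfs, weights, max_count):
    """Hypothetical helper: mix the m-fold self-convolutions of an SMFS.

    smfs[k]      -- assumed probability that one mutational event yields k
                    derived copies in the sample (k = 0, 1, 2, ...).
    weights[m-1] -- assumed weight of sites with m mutational events.
    max_count    -- largest allele count to report.

    Returns an array sfs[0..max_count].
    """
    smfs = np.asarray(smfs, dtype=float)
    sfs = np.zeros(max_count + 1)

    # Zero-fold convolution: all mass at count 0.
    conv = np.zeros(max_count + 1)
    conv[0] = 1.0

    for w in weights:
        # One more convolution with the SMFS adds one more mutational event.
        # Truncating at max_count is exact for the counts that are kept,
        # because larger counts never feed back into smaller ones.
        conv = np.convolve(conv, smfs)[: max_count + 1]
        sfs += w * conv
    return sfs


# Toy input (made-up numbers): a single mutation is seen once or twice
# with equal probability; 90% of sites carry one event, 10% carry two.
toy_smfs = [0.0, 0.5, 0.5]
toy_weights = [0.9, 0.1]
toy_sfs = sfs_from_smfs(toy_smfs, toy_weights, max_count=4)
print(toy_sfs)
```

Adding allele counts across events, as the convolution does, ignores the chance that two events fall on the same sampled chromosome, which is why a sketch like this is only reasonable in the rare-allele regime the text refers to.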
