Phylogenetic trees play a fundamental role in elucidating evolutionary relationships among taxa. Clustering taxa remains a major challenge across diverse biological domains such as cancer genomics, microbial systematics, and phylogenomics. Several methods partition taxa in phylogenetic trees into clusters, but these approaches face key limitations. Many rely on arbitrary distance thresholds that are contingent on a priori information. These thresholds can be hard to ascertain, are subject to bias, and may vary across studies, hindering meaningful interpretation and comparison. More broadly, many methods lack a rigorous definition of what constitutes a cluster and depend on heuristics that restrict the cluster search space, since enumerating all possible clusters is computationally infeasible for large trees. Here, we present PhytClust, a threshold-free algorithm that partitions taxa in phylogenetic trees into monophyletic clusters. For a given number of clusters, PhytClust provides an exact solution to the problem of finding the set of clusters that minimizes the total intra-cluster branch length. It then determines the optimal number of clusters using a validity index. The resulting clustering is exact, efficient to compute, and reflects the tree’s topology and genetic distances. In simulated datasets, PhytClust outperforms existing methods in both speed and accuracy and scales to trees with more than a hundred thousand taxa. We apply PhytClust across cancer genomics, avian phylogenomics, bacterial and archaeal phylogenetics, and plant genomics to demonstrate its broad applicability. By providing a standardized method for clustering taxa within phylogenetic trees, PhytClust yields reproducible, optimal, and computationally efficient clusters.
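To make the optimization problem concrete, the following is a minimal sketch of one way to solve it exactly with dynamic programming over the tree. It is an illustration under stated assumptions, not the PhytClust implementation: the dictionary-based tree representation, the helper names `make` and `cluster_costs`, and the convention that a cluster's cost is the sum of all branch lengths inside its clade (excluding the branch leading into the clade) are all hypothetical choices made here. The validity index used to choose the number of clusters is not covered.

```python
from math import inf


def make(length, *children):
    """Hypothetical node constructor: branch length to the parent plus child nodes."""
    return {"length": length, "children": list(children)}


def cluster_costs(node, k_max):
    """Return (total, best) for the subtree rooted at `node`.

    total   -- sum of all branch lengths strictly below `node`
    best[j] -- minimum total intra-cluster branch length when the leaves below
               `node` are split into exactly j monophyletic clusters (j <= k_max)
    """
    if not node["children"]:
        # A single leaf forms one cluster with no internal branches.
        return 0.0, {1: 0.0}

    total = 0.0
    combined = {0: 0.0}  # costs for the children merged so far
    for child in node["children"]:
        c_total, c_best = cluster_costs(child, k_max)
        total += c_total + child["length"]
        merged = {}
        # Knapsack-style merge: distribute the cluster budget between the
        # children seen so far and the current child.
        for i, cost_i in combined.items():
            for j, cost_j in c_best.items():
                if i + j <= k_max:
                    merged[i + j] = min(merged.get(i + j, inf), cost_i + cost_j)
        combined = merged

    best = dict(combined)
    # Keeping the whole clade as one cluster costs every branch below this node.
    best[1] = total
    return total, best


# Toy tree ((A:1,B:1):5,(C:1,D:1):5): two tight pairs joined by long branches.
tree = make(0.0,
            make(5.0, make(1.0), make(1.0)),
            make(5.0, make(1.0), make(1.0)))
_, best = cluster_costs(tree, k_max=4)
print(best)
```

On this toy tree the cost drops sharply from one cluster to two, because cutting at the root removes the two long branches from every cluster. A validity index of the kind the abstract mentions would be expected to reward such a drop, though its exact form is not specified here.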



