Count data arise in many applications where uncovering hidden patterns, or latent structure, is of interest. Topic modeling is a powerful tool for detecting latent structure in count data. However, standard topic modeling methods are often constrained by restrictive assumptions, susceptible to noise, and sensitive to misspecification of the number of topics, which is a particular concern when analyzing non-text data. Here, we introduce SEEK-VEC (Spectral Ensembling of topic models with Eigenscore for K-agnostic Vocabulary Embedding and Classification), an ensemble framework for count data that integrates insights from multiple candidate topic models through a spectral ensembling procedure. This approach automatically reinforces signal and mitigates noise to generate a consensus low-dimensional embedding of the data. SEEK-VEC produces prioritization scores and grouping scores that enable variable classification, interactive pattern discovery, and model diagnostics. Through simulations, we demonstrate that SEEK-VEC is robust under realistic settings and outperforms state-of-the-art oracle methods, particularly when signal strength is weak. Applied to diverse real-world datasets, including self-reported psychopathology symptom data, food preference questionnaires, and single-cell transcriptomics, SEEK-VEC reveals latent structures that provide scientifically meaningful insights.
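To make the idea of combining candidate topic models concrete, the following is a minimal sketch of one generic form of spectral ensembling. It is not the SEEK-VEC procedure itself, whose details (including the eigenscore and the prioritization and grouping scores) are not given in the abstract. The sketch assumes each candidate model supplies a topic-by-variable loading matrix, averages the resulting variable-by-variable cosine similarities across candidates with different numbers of topics, and uses the leading eigenvectors of that average as a consensus embedding. The function names `variable_similarity` and `consensus_embedding` and the toy loadings are hypothetical.

```python
import numpy as np


def variable_similarity(loadings):
    """Cosine similarity between variables for one (K x V) loading matrix."""
    norms = np.linalg.norm(loadings, axis=0, keepdims=True)
    cols = loadings / np.maximum(norms, 1e-12)  # guard against all-zero columns
    return cols.T @ cols  # (V x V), independent of the number of topics K


def consensus_embedding(loadings_list, dim):
    """Average per-model similarities, then embed variables spectrally.

    Assumption: a plain unweighted average stands in for whatever
    weighting a real ensembling procedure would use.
    """
    avg = np.mean([variable_similarity(L) for L in loadings_list], axis=0)
    eigvals, eigvecs = np.linalg.eigh(avg)  # symmetric input, ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dim]  # indices of the largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Toy data (fabricated for illustration only): six variables in two groups,
# described by hypothetical candidate models with K = 2, 3, and 4 topics.
rng = np.random.default_rng(0)
base = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
candidates = [
    base[np.arange(k) % 2] + 0.05 * rng.random((k, base.shape[1]))
    for k in (2, 3, 4)
]

embedding = consensus_embedding(candidates, dim=2)
print(embedding.shape)
```

Because each candidate is reduced to a variable-by-variable similarity matrix before averaging, models with different numbers of topics can be combined without aligning their topics, which is one simple way to be agnostic to K.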