arXiv:2603.06768v2
Abstract: Genotype-to-phenotype prediction is a central goal of statistical genetics, yet practical comparisons of prediction workflows remain limited in small, heterogeneous, participant-shared genomic datasets. Here, we benchmarked end-to-end case-control prediction across 80 curated binary phenotypes from openSNP using machine learning, deep learning, and polygenic score workflows. We evaluated 29 machine-learning algorithms, 80 deep-learning model variants, and 3 polygenic score tools across 675 clumping and pruning configurations. No workflow family dominated universally. Polygenic score workflows achieved the highest observed discrimination for 53 phenotypes, whereas machine-learning or deep-learning workflows achieved the highest for 27. However, many apparent phenotype-level wins were modest, with 41.2% of comparisons representing practical ties within five discrimination points. Performance was strongly phenotype-dependent and sensitive to modeling and preprocessing choices. Distinct workflow-specific failure modes were also observed, including unstable behaviour in PRSice and non-informative collapse in lassosum for 13 phenotypes. Higher peak performance was concentrated in smaller phenotypes, reinforcing the need for cautious interpretation in limited-data settings. The cohort was predominantly of European ancestry, restricting generalisability. Together, these results position openSNP as a useful stress-test environment for genomic prediction and support benchmark-guided workflow selection under realistic conditions of data scarcity, phenotype heterogeneity, and ancestry imbalance.
Crisis support teams’ technological openness and learning attitudes toward the AI-based virtual patient system crisis support VR
Background: Against the backdrop of escalating global humanitarian crises, innovative didactic simulations are becoming increasingly important. A promising alternative to traditional classroom-based didactics for learning psychological