Objectives
This review aims to identify the key barriers to clinical application of machine learning (ML) in multi-class voice disorder classification.

Design
Scoping review.

Methods
A comprehensive scoping review was conducted of research published between 2013 and May 2025 in seven clinical and engineering databases. Articles that applied ML techniques to classify voice disorders were examined; publications limited to binary classification (e.g., healthy vs. pathological) were excluded. Data were extracted from the included articles to analyze patterns in the voice disorder classes targeted, database selection, input data attributes, vocal tasks, diagnostic labelling, and the ML classification techniques applied.

Results
In total, 10,401 articles addressing voice disorder classification were screened, of which 80 used ML techniques for multi-class classification. Results revealed considerable variation in the selection of databases, voice disorder diagnostic labels, the amount and type of input data (e.g., voice tasks and demographic questionnaires), and classification techniques. These inconsistencies prevent robust comparison and, therefore, the identification of state-of-the-art solutions that would typically mature into clinical applications.

Discussion
Variations in classification tasks make it difficult to compare results across studies. The inconsistency in class imbalance, sample size, and the total number of classes investigated means there is no baseline for comparing and exploring different classification techniques. Finally, variations in testing methods, such as differing test set types and sizes or the use of cross-validation, limit comparisons across articles.

Conclusions
This review identified considerable variation in the diagnostic labels associated with voice disorder classification, the data available per selected label, and testing methodology. Such variation limits comparability and undermines the generalization of ML models.
The lack of consensus across the automated classification pipeline, from selecting which disorders ML systems should classify to constructing test sets and measuring performance, is likely to present critical barriers to clinical application. These barriers must be addressed to realise the potential of voice as a biomarker of other systemic diseases.
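To illustrate why class imbalance undermines comparison of headline results, the following minimal Python sketch compares plain accuracy with macro-averaged recall for a trivial classifier that always predicts the majority class. The class names ("healthy", "nodules", "paralysis") and counts are hypothetical, chosen only for illustration; they are not data from the review or from any included study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced test set (illustrative counts only).
y_true = ["healthy"] * 90 + ["nodules"] * 5 + ["paralysis"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["healthy"] * len(y_true)

print(f"accuracy     = {accuracy(y_true, y_pred):.3f}")      # 0.900
print(f"macro recall = {macro_recall(y_true, y_pred):.3f}")  # 0.333
```

The same trivial model scores 90% accuracy yet only one third macro recall, so figures reported on test sets with different class distributions, or with different metrics, cannot be ranked against each other directly.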
Performance of large language models in delivering accurate and comprehensible patient information on heart failure and cardiomyopathy
Background
Large language models (LLMs) are increasingly used by patients seeking cardiovascular health information through digital platforms. However, their accuracy and suitability for providing guidance on

