
Critical Appraisal Tools for Evaluating Artificial Intelligence in Clinical Studies: Scoping Review

Background: Health research using predictive and/or generative artificial intelligence (AI) is growing rapidly. As in traditional clinical studies, the way AI studies are conducted can introduce systematic errors. Translating this AI evidence into clinical practice and research requires critical appraisal tools for clinical decision makers and researchers.

Objective: To identify existing tools for the critical appraisal of clinical studies that use AI, and to examine the concepts and domains these tools explore.

Methods: Inclusion criteria followed the PCC framework. Population: clinical studies using AI. Concept: tools for critical appraisal and associated constructs such as quality, reporting, validity, risk of bias, and applicability. Context: clinical practice. Bias classification and chatbot assessment studies were also included. We searched medical and engineering databases (MEDLINE, EMBASE, CINAHL, PsycINFO, and IEEE). We included primary clinical research presenting tools for critical appraisal. Classic reviews and systematic reviews were included in the first screening phase and excluded in the second, after new tools had been identified through forward snowballing. We excluded nonhuman, computational, and mathematical research, as well as letters, opinion papers, and editorials. We used Rayyan for screening. Data extraction was performed by two observers, and discrepancies were resolved by discussion. The protocol was registered in advance on OSF (https://doi.org/10.17605/OSF.IO/ETYDS). We adhered to the PRISMA extension for scoping reviews (PRISMA-ScR) and the PRISMA-S extension for reporting literature searches in systematic reviews.

Results: We retrieved 4393 unique records for screening. After excluding 3803 records, 119 were selected for full-text screening, of which 59 were excluded. With 10 additional studies identified through other methods, 70 records were included in total. Of these, 46 were tools, predominantly reporting guidelines, along with 15 tools for critical appraisal, 2 for study quality, and 2 for risk of bias. Nine papers focused on bias classification or mitigation. We found 15 chatbot assessment studies or systematic reviews of chatbot studies (6 and 9, respectively), which form a very heterogeneous group.

Conclusions: The results depict a landscape of evidence tools in which reporting tools predominate, followed by critical appraisal tools, with few tools addressing study quality or risk of bias. The mismatch between bias concepts in AI and in epidemiology should be considered in critical appraisal, especially regarding fairness and bias mitigation in AI. Finally, chatbot assessment is a vast and evolving field in which progress in design, reporting, and critical appraisal is necessary and urgent.

Clinical Trial: https://doi.org/10.17605/OSF.IO/ETYDS

