Assessment of frontier Large Language Models in sleep medicine

Study objectives
To evaluate and compare the performance of two proprietary frontier large language models (LLMs), ChatGPT-5 and Grok-4, on diagnostic reasoning and foundational knowledge tasks within the specialty domain of sleep medicine.

Methods
The models were evaluated on two tasks: case-based reasoning using 79 clinical vignettes from the AASM Case Book of Sleep Medicine, and knowledge assessment using 897 multiple-choice questions (MCQs) from board review materials. For vignettes, final diagnosis was scored by concept-level exact match, and differential diagnosis (DDx) was scored on a fixed top-5 output using concept-level matching with synonym normalization to compute precision, recall, and F1-score. MCQ performance was the proportion correct. Inter-model performance was compared using the Mann–Whitney U test.

Results
Both models achieved high accuracy for final diagnosis (92.4% for both; 95% CI 86.4, 98.4) and MCQs (ChatGPT-5: 93.0%; Grok-4: 92.8%). However, performance on generating a comprehensive differential diagnosis was suboptimal, with modest F1-scores for both ChatGPT-5 (0.55 ± 0.20) and Grok-4 (0.59 ± 0.20). There were no statistically significant differences in performance between the two models across any metric (p > 0.05).

Conclusions
Frontier LLMs demonstrated high accuracy in sleep medicine tasks requiring knowledge recall and direct pattern recognition but showed more limited performance in complex clinical reasoning tasks such as generating a comprehensive differential diagnosis. These findings suggest that current general-purpose models may be more reliable for focused knowledge support than for broad hypothesis generation. Future studies should evaluate whether domain-adapted models or clinician-in-the-loop workflows can improve real-world diagnostic usefulness and safety.
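The differential diagnosis scoring described in the Methods (fixed top-5 output, concept-level matching, synonym normalization) could be sketched roughly as follows. This is an illustrative sketch only, not the study's actual pipeline: the `SYNONYMS` table, the `normalize` and `score_ddx` names, and the example diagnoses are all assumptions introduced here, and the study does not state how its synonym mapping was built.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical synonym table (an assumption for illustration): maps surface
# forms to one canonical concept label so equivalent terms count as a match.
SYNONYMS: Dict[str, str] = {
    "osa": "obstructive sleep apnea",
    "obstructive sleep apnoea": "obstructive sleep apnea",
    "rls": "restless legs syndrome",
    "willis-ekbom disease": "restless legs syndrome",
}


def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, then map known synonyms to a canonical label."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key, key)


def score_ddx(
    predicted: Iterable[str], reference: Iterable[str], k: int = 5
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the first k predicted diagnoses."""
    pred = {normalize(t) for t in list(predicted)[:k]}
    ref = {normalize(t) for t in reference}
    hits = len(pred & ref)  # concepts present in both lists
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up example vignette: 2 of the 5 predictions match 2 of the 3 references.
model_output = ["OSA", "Insomnia", "RLS", "Narcolepsy", "Central sleep apnea"]
gold_ddx = [
    "Obstructive sleep apnoea",
    "Restless legs syndrome",
    "Periodic limb movement disorder",
]
precision, recall, f1 = score_ddx(model_output, gold_ddx)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because the output is fixed at five items, precision is capped whenever the reference list is shorter than five, which is one reason an F1-score in this setup can look modest even when most reference diagnoses are recovered.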


Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844