Study objectives: To evaluate and compare the performance of two proprietary frontier large language models (LLMs), ChatGPT-5 and Grok-4, on diagnostic reasoning and foundational knowledge tasks within the specialty domain of sleep medicine.
Methods: The models were evaluated on two tasks: case-based reasoning using 79 clinical vignettes from the AASM Case Book of Sleep Medicine and knowledge assessment using 897 multiple-choice questions (MCQs) from board review materials. For vignettes, final diagnosis was scored by concept-level exact match, and differential diagnosis (DDx) was scored on a fixed top-5 output using concept-level matching with synonym normalization to compute precision, recall, and F1-score. MCQ performance was the proportion correct. Inter-model performance was compared using the Mann–Whitney U test.
Results: Both models achieved high accuracy for final diagnosis (92.4% for both; 95% CI 86.4, 98.4) and MCQs (ChatGPT-5: 93.0%; Grok-4: 92.8%). However, performance on generating a comprehensive differential diagnosis was suboptimal, with modest F1-scores for both ChatGPT-5 (0.55 ± 0.20) and Grok-4 (0.59 ± 0.20). There were no statistically significant differences in performance between the two models across any metric (p > 0.05).
Conclusions: Frontier LLMs demonstrated high accuracy in sleep medicine tasks requiring knowledge recall and direct pattern recognition but showed more limited performance in complex clinical reasoning tasks such as generating a comprehensive differential diagnosis. These findings suggest that current general-purpose models may be more reliable for focused knowledge support than for broad hypothesis generation. Future studies should evaluate whether domain-adapted models or clinician-in-the-loop workflows can improve real-world diagnostic usefulness and safety.
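The differential-diagnosis scoring described in the Methods (fixed top-5 output, concept-level matching with synonym normalization, then precision, recall, and F1) can be illustrated with a minimal Python sketch. The abstract does not give the study's synonym table or matching procedure, so the `SYNONYMS` mapping, the function names, and the example vignette below are hypothetical and only show how such a score might be computed for one case.

```python
# Illustrative sketch only: the synonym table, function names, and the
# example diagnoses are assumptions, not the study's actual materials.

# Hypothetical synonym table mapping surface forms to one canonical concept.
SYNONYMS = {
    "osa": "obstructive sleep apnea",
    "obstructive sleep apnoea": "obstructive sleep apnea",
    "rls": "restless legs syndrome",
}


def normalize(term, synonyms=SYNONYMS):
    """Lowercase, collapse whitespace, then map known synonyms to a concept."""
    key = " ".join(term.lower().split())
    return synonyms.get(key, key)


def score_ddx(predicted, reference, k=5):
    """Return (precision, recall, F1) for one vignette.

    Only the first k predicted diagnoses are scored. Both lists are reduced
    to sets of normalized concepts, so duplicates count once.
    """
    pred = {normalize(t) for t in predicted[:k]}
    ref = {normalize(t) for t in reference}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical model output and reference differential for one case.
    model_top5 = [
        "OSA",
        "Narcolepsy type 1",
        "Insufficient sleep syndrome",
        "Idiopathic hypersomnia",
        "Depression",
    ]
    reference_ddx = [
        "Obstructive sleep apnea",
        "Narcolepsy type 1",
        "Idiopathic hypersomnia",
        "Periodic limb movement disorder",
    ]
    p, r, f1 = score_ddx(model_top5, reference_ddx)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this made-up case, three of the five predicted concepts match three of the four reference concepts, giving a precision of 0.60, a recall of 0.75, and an F1 of about 0.67. Per-vignette scores of this kind could then be averaged across cases, which is presumably how a summary such as 0.55 ± 0.20 arises, although the abstract does not state the aggregation method.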
Promotion and preservation of mobility and autonomy in old age through smart rollators—a qualitative study
Background: Diseases and health limitations associated with ageing often result in loss of mobility and reduced social participation. The ongoing demographic shift towards an increasingly ageing



