Background: Cataracts are a highly prevalent, potentially blinding eye condition, and effective approaches to early diagnosis are needed, underscoring the clinical significance of this study.

Objective: This study aims to evaluate the performance of deep learning (DL) in cataract diagnosis, assess its potential as an effective tool for automated diagnosis, and compare the diagnostic accuracy of DL with that of both machine learning and human experts.

Methods: A systematic search was conducted in Web of Science, Embase, IEEE Xplore, PubMed, and Cochrane Library up to April 1, 2025, for studies on image-based DL for cataract detection or clinical subtype classification. The included studies were assessed for risk of bias (RoB) using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Bivariate mixed-effects models were used for data analyses, and publication bias was assessed with Deeks' funnel plots.

Results: Sixty-three studies were included. The quality assessment indicated a high or unclear RoB in the patient selection (34 studies) and index test (44 studies) domains, whereas the RoB in the reference standard domain was high or unclear in only 2 studies. Image-based DL achieved a sensitivity of 96% (95% CI 95%-97%) and a specificity of 98% (95% CI 96%-98%) for cataract detection, with an area under the receiver operating characteristic curve (AUC) of 0.99 (95% CI 0.98-1.00). For cataract classification, the sensitivity and specificity of image-based DL were 94% (95% CI 93%-96%) and 97% (95% CI 96%-98%), respectively, with an AUC of 0.99 (95% CI 0.98-0.99). Despite the strong overall performance, the models' generalization capability was challenged by lower performance on independent external datasets (detection: sensitivity 87%, specificity 93%; classification: sensitivity 89%, specificity 90%), potentially attributable to domain shift between the training and validation data.
Conclusions: Image-based DL demonstrated high accuracy in the detection and classification of cataracts, showing potential advantages over traditional machine learning methods, though direct comparative validation remains limited. Its performance falls within the range of accuracy reported for human experts, highlighting the feasibility of automated diagnosis. However, limitations in the validation data, coupled with moderate-quality evidence and high heterogeneity, constrain the utility of DL in auxiliary diagnosis. The models' sensitivity dropped to 87% in external validation, indicating limited generalization capability, so caution should be exercised in broad clinical implementation.
Behavior change beyond intervention: an activity-theoretical perspective on human-centered design of personal health technology
Introduction

Modern personal technologies, such as smartphone apps with artificial intelligence (AI) capabilities, have significant potential for helping people make necessary changes in their behavior

