Self-Reported Health Outcomes in Metabolic Health YouTube Comments: Cross-Sectional Study and Rule-Based Natural Language Processing Framework Development and Validation

Background: YouTube is increasingly used for healthcasting, the sharing of evidence-based dietary and lifestyle interventions by domain experts. In the metabolic health domain, channels focused on therapeutic carbohydrate restriction have accumulated audiences of millions. A distinctive feature is the comment section, where viewers share first-person accounts of health changes, constituting a unique source of real-world outcome data at scale. However, extracting structured health information from unstructured comments presents computational challenges.
Objective: This observational, cross-sectional study aims to develop and validate a precision-optimized computational framework for extracting self-reported health outcomes from healthcasting YouTube comments and to characterize the prevalence, distribution across health aspects, and channel-level variation of reported outcomes across a large-scale metabolic health corpus.
Methods: This study analyzed 43,111 unique YouTube comments from 110 videos across 11 therapeutic carbohydrate restriction-focused healthcasting channels (37,458 unique authors; data span November 2013 to January 2026; collected via YouTube Data API version 3). The methodology comprised 3 construction phases and 5 validation studies. The construction phases were (1) exploratory corpus characterization, (2) iterative development of a 35-aspect hierarchical health outcome ontology, and (3) precision-optimized rule-based classification. The validation studies were precision validation (stratified sample of n=500), recall estimation (n=510), external validation on 5 held-out channels (n=12,653 comments), large language model–assisted interrater reliability assessment, and a transformer baseline comparison against Bidirectional Encoder Representations from Transformers (BERT) and Robustly Optimized BERT Pretraining Approach (RoBERTa) classifiers. A supplementary aspect-based sentiment analysis contextualized the positive-only design.
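
To make the idea of precision-optimized rule-based classification concrete, the following is a minimal sketch and not the study's implementation. It assumes a comment counts as an outcome report only when it contains a first-person marker, matches an aspect pattern, and contains no uncertainty cue. The names `FIRST_PERSON`, `UNCERTAIN`, `ASPECT_RULES`, and `classify`, the three aspect labels, and every regular expression are hypothetical illustrations; the study's actual rules and 35-aspect ontology are not given in this abstract.

```python
import re

# All patterns below are hypothetical examples, not the study's actual rules.
FIRST_PERSON = re.compile(r"\b(i|my|me)\b", re.IGNORECASE)

# Cues that a comment is a question, hope, or hypothetical rather than a report.
UNCERTAIN = re.compile(r"\b(will|would|could|should|hope|hoping|if)\b|\?", re.IGNORECASE)

# One illustrative pattern per aspect; a real ontology would hold many more.
ASPECT_RULES = {
    "weight_loss": re.compile(
        r"\b(lost|dropped)\s+\d+\s*(lbs?|pounds|kg|kilos)\b", re.IGNORECASE
    ),
    "type_2_diabetes": re.compile(
        r"\b(a1c|hba1c)\b.*\b(dropped|normal|went from)\b", re.IGNORECASE
    ),
    "pain_inflammation": re.compile(
        r"\b(joint pain|arthritis|inflammation)\b.*\b(gone|went away|improved)\b",
        re.IGNORECASE,
    ),
}


def classify(comment: str) -> list[str]:
    """Return the sorted aspect labels matched by a first-person outcome report."""
    if not FIRST_PERSON.search(comment):
        return []  # third-person or impersonal statements are rejected
    if UNCERTAIN.search(comment):
        return []  # any uncertainty cue vetoes the whole comment
    return sorted(name for name, rule in ASPECT_RULES.items() if rule.search(comment))
```

Vetoing every comment that contains an uncertainty cue discards some genuine reports. That is the kind of recall cost a precision-first design accepts in exchange for fewer false positives.
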
Results: The framework identified 1790 positive health outcome reports (1790/43,111, 4.15% prevalence), achieving a precision of 97.6% (488/500; 95% CI 95.7%-98.6%) and an estimated recall of 56.2% (95% CI 43.4%-67.9%). The reports described 6674 positive outcomes, distributed across 35 health aspects and 18 named disease conditions and extending beyond weight loss: pain and inflammation reduction (1137/6674, 17%), type 2 diabetes improvement (977/6674, 14.6%), skin health (784/6674, 11.8%), and psychological well-being (731/6674, 11%). Over half (3355/6674, 50.3%) spanned multiple research objectives. Significant channel-level variation was observed (χ²=927.5, df=10; P<.001), with positive outcome rates ranging from 1.32% to 10.40% (odds ratio 8.68, 95% CI 7.10-10.61). Transformer baselines achieved higher recall but lower precision, supporting the rule-based framework's advantage for high-confidence corpus generation. A supplementary aspect-based sentiment analysis indicated a positive-to-negative ratio of approximately 4.6:1 (n=1003), with negative experiences (59/495, 11.9%) predominantly involving gastrointestinal and cardiovascular concerns.
Conclusions: This study presents, to our knowledge, the first validated, rule-based framework for extracting self-reported metabolic health outcomes from healthcasting YouTube comments at corpus scale. Unlike existing recall-oriented social media health classifiers, the precision-optimized design achieves the confidence threshold required for outcomes research without manual review. These findings demonstrate that expert-led health content comment sections constitute a scalable, complementary data source for monitoring real-world engagement with dietary interventions, with implications for public health surveillance, platform design, and health communication research.
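
The headline figures in the Results can be checked with a few lines of arithmetic. The sketch below recomputes the prevalence and precision from the reported counts, and the channel odds ratio from the two reported (rounded) rates. It also computes a Wilson score interval for the precision, which is an assumption: the abstract does not state which interval method was used, so the bounds produced here need not match the reported 95.7%-98.6% exactly. The function names `wilson_interval` and `odds_ratio` are illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def odds_ratio(p_high: float, p_low: float) -> float:
    """Odds ratio between two proportions, each given as a fraction in (0, 1)."""
    return (p_high / (1 - p_high)) / (p_low / (1 - p_low))


# Counts and rates taken from the Results paragraph.
prevalence = 1790 / 43_111               # positive reports / all comments
precision = 488 / 500                    # confirmed positives / validated sample
precision_ci = wilson_interval(488, 500)
channel_or = odds_ratio(0.1040, 0.0132)  # highest vs lowest channel rate (rounded inputs)

print(f"prevalence = {prevalence:.2%}")
print(f"precision  = {precision:.1%}, Wilson 95% CI = "
      f"{precision_ci[0]:.1%} to {precision_ci[1]:.1%}")
print(f"odds ratio = {channel_or:.2f}")
```

Because the odds ratio here is derived from rates rounded to two decimal places instead of the underlying per-channel counts, it should be read as an approximation of the reported value.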

