Automated Multitier Tagging of Chinese Online Health Education Resources Using a Large Language Model: Development and Validation Study

Background: Precision health promotion, which aims to tailor health messages to individual needs, is hampered by the lack of structured metadata in vast digital health resource libraries. This bottleneck prevents scalable, personalized content delivery and exacerbates information overload for the public.
Objective: This study aimed to develop, deploy, and validate an automated tagging system using a large language model (LLM) to create the foundational metadata infrastructure required for tailored health communication at scale.
Methods: We developed a comprehensive, 3-tier health promotion taxonomy (10 primary, 34 secondary, and 90,562 tertiary tags) using a hybrid Delphi and corpus-mining methodology. We then constructed a hybrid inference pipeline in which a Baichuan2-7B LLM, fine-tuned with low-rank adaptation, generated the initial tags. These tags were refined by a domain-specific named entity recognition model and standardized against a vector database. The system’s performance was evaluated against manual annotations from nonexpert staff on a test set of 1000 resources. We used a “no gold standard” framework, comparing the artificial intelligence–human (A-H) interrater reliability (IRR) with a supplemental human-human (H-H) IRR baseline, and applied expert adjudication to cases where the artificial intelligence provided additional tags (“AI Additive”).
Results: The A-H agreement was moderate (Cohen κ=0.54, 95% CI 0.53-0.56; Jaccard similarity coefficient=0.48, 95% CI 0.46-0.50). Critically, this was higher than the baseline nonexpert H-H agreement (Cohen κ=0.32, 95% CI 0.29-0.35; Jaccard similarity coefficient=0.35, 95% CI 0.27-0.43). A granular analysis of disagreements revealed that in 15.9% (159/1000) of cases, the system assigned “AI Additive” tags that the human annotators had not identified. Expert adjudication of a sample of these cases found the “AI Additive” tags to be correct and relevant with a precision of 90% (45/50; 95% CI 78.2%-96.7%).
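The two agreement statistics reported in the Results can be illustrated with a minimal Python sketch. It is not the study's evaluation code. The abstract does not say how Cohen κ was computed for multi-tag annotations, so the sketch assumes that each (resource, tag) pair is treated as one binary decision. The function names, the toy vocabulary, and the toy annotations are all hypothetical.

```python
from typing import Dict, List, Set

TagSets = Dict[str, Set[str]]  # resource id -> set of tags assigned by one rater


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| for a single resource."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty tag sets agree fully
    return len(a & b) / len(union)


def mean_jaccard(first: TagSets, second: TagSets) -> float:
    """Average per-resource Jaccard similarity over resources both raters tagged."""
    shared = sorted(first.keys() & second.keys())
    if not shared:
        raise ValueError("the two raters share no resources")
    return sum(jaccard(first[r], second[r]) for r in shared) / len(shared)


def cohen_kappa(first: TagSets, second: TagSets, vocabulary: List[str]) -> float:
    """Cohen kappa over binary (resource, tag) decisions (an assumed binarization)."""
    shared = sorted(first.keys() & second.keys())
    pairs = [(tag in first[r], tag in second[r]) for r in shared for tag in vocabulary]
    n = len(pairs)
    if n == 0:
        raise ValueError("no decisions to compare")
    p_observed = sum(a == b for a, b in pairs) / n
    rate_first = sum(a for a, _ in pairs) / n   # share of positive decisions, rater 1
    rate_second = sum(b for _, b in pairs) / n  # share of positive decisions, rater 2
    p_expected = rate_first * rate_second + (1 - rate_first) * (1 - rate_second)
    if p_expected == 1.0:
        return 1.0  # both raters made the same constant decision everywhere
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data, for illustration only.
VOCABULARY = ["diabetes", "diet", "exercise", "hypertension"]
AI_TAGS: TagSets = {
    "r1": {"diabetes", "diet"},
    "r2": {"exercise"},
    "r3": {"hypertension", "diet"},
}
HUMAN_TAGS: TagSets = {
    "r1": {"diabetes"},
    "r2": {"exercise"},
    "r3": {"hypertension", "exercise"},
}

if __name__ == "__main__":
    print("mean Jaccard:", round(mean_jaccard(AI_TAGS, HUMAN_TAGS), 3))
    print("Cohen kappa:", round(cohen_kappa(AI_TAGS, HUMAN_TAGS, VOCABULARY), 3))
```

In the toy data, resource `r1` mirrors the “AI Additive” situation: the extra `diet` tag lowers both statistics even if an expert would judge it correct, which is why the study pairs the agreement figures with expert adjudication.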
Conclusions: A fine-tuned LLM, integrated into a hybrid pipeline, can function as a powerful augmentation tool for health content annotation. The system’s consistency (A-H κ=0.54) was found to be superior to the baseline human workflow (H-H κ=0.32). By moving beyond simple automation to reliably identify relevant health topics missed by manual annotators with high, expert-validated accuracy, this study provides a robust technical and methodological blueprint for implementing artificial intelligence to enhance precision health communication in public health settings.
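The interval reported for the adjudicated precision (45/50) can be checked with a short standard-library sketch. The abstract does not name the interval method, so the sketch assumes an exact (Clopper-Pearson) binomial interval. The helper names are hypothetical.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(fn: Callable[[float], float], target: float) -> float:
    """Bisection on [0, 1] for fn(p) = target, where fn is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2,
    # rewritten as P(X > k) = 1 - alpha / 2 so the function is increasing in p.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    low, high = clopper_pearson(45, 50)
    print(f"precision = {45 / 50:.0%}, 95% CI = {low:.1%} to {high:.1%}")
```

If the authors used a different method, such as a Wilson interval, the bounds would differ slightly from those this sketch produces.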

