Thematic Mapping and Evolution of Social Media Mining in Health Research: Hybrid Bibliometric Synthesis

Background: Social media platforms offer extensive data, as they are widely used globally. Social media mining (SMM) enables real-time monitoring of user-reported health information and serves as a supplement to traditional health data analytics. However, the rapid proliferation of literature has produced fragmentation, and a comprehensive knowledge map regarding SMM is lacking. Further, existing bibliometric reviews in health fields are easily undermined by synonym fragmentation and parameter settings, reducing their robustness. Thus, a more robust, reproducible, and decision-oriented bibliometric framework is required.

Objective: This study aimed to (1) outline key thematic clusters in health-related SMM and map their dynamic evolution, and (2) methodologically demonstrate how machine learning–based bibliometric analysis can strengthen the robustness, transparency, and foresight capacity of evidence synthesis.

Methods: This study designed a fully automated and reproducible bibliometric analysis of PubMed journal articles published from 2015 to 2025 (n=250) and analyzed records with both abstracts and keywords (n=189). We performed cleaning and standardization for titles, abstracts, author keywords, and MeSH terms, and carried out an exploratory descriptive analysis to obtain preliminary insights into publication patterns. Subsequently, we used SPECTER2 and PubMedBERT embeddings with keywords and abstracts to construct a hybrid similarity matrix. Then, we applied Uniform Manifold Approximation and Projection for dimensionality reduction, followed by Hierarchical Density-Based Spatial Clustering of Applications with Noise for thematic clustering, and visualized the results in a 3D strategic coordinate system (maturity, influence, and recency). We performed intercluster relationship analysis and time-slice analysis to examine thematic intersections and evolution. To ensure robustness and enhance interpretability, we implemented dual-level validation.
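The idea of a hybrid similarity matrix can be illustrated with a minimal sketch. It assumes that "hybrid" means a weighted blend of two cosine similarity matrices, one per embedding source; the abstract does not state the actual combination rule. The function names, the mixing weight `weight_a`, and the toy vectors are all hypothetical stand-ins, not the study's implementation or real model output.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return pairwise cosine similarities; assumes every vector is nonzero."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norms[i] * norms[j])
            for j, v in enumerate(vectors)
        ]
        for i, u in enumerate(vectors)
    ]


def hybrid_similarity(sim_a, sim_b, weight_a=0.5):
    """Blend two similarity matrices; weight_a is an assumed mixing weight."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [
        [weight_a * a + (1.0 - weight_a) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(sim_a, sim_b)
    ]


# Toy stand-ins for per-document embeddings (hypothetical values).
abstract_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keyword_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

hybrid = hybrid_similarity(
    cosine_similarity_matrix(abstract_vectors),
    cosine_similarity_matrix(keyword_vectors),
    weight_a=0.5,
)

# Steps that take precomputed input usually expect distances, not similarities.
distance = [[1.0 - s for s in row] for row in hybrid]
```

In this toy case, documents 0 and 1 are unrelated by their abstract vectors but identical by their keyword vectors, so the blended similarity falls between the two. A distance matrix of this kind could then be handed to a dimensionality reduction or density-based clustering step that accepts precomputed distances.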
Results: We identified 6 thematic clusters: cluster 1 (candidate incubator pool of peripheral cross-cutting topics in health-related SMM), cluster 2 (computational methods in health informatics), cluster 3 (public attitudes and sociopsychological determinants), cluster 4 (infodemiology and the COVID-19 information ecosystem), cluster 5 (health communication and public health engagement), and cluster 6 (social media analysis and network methods). Strategic 3D mapping revealed that methodological clusters (clusters 2 and 6) occupied high-maturity and high-influence positions, while application-driven clusters (clusters 3 and 4) occupied high-influence and high-recency positions, representing rapidly expanding frontiers. Clusters 1 and 5 demonstrated strong potential for further growth. Temporal slicing confirmed a trajectory moving from methodological consolidation and thematic diversification to a renewed focus on convergence and problem-solving. Validation showed strong semantic coherence and robustness of the methods and findings.

Conclusions: We developed a semantic-structural hybrid bibliometric framework with dual-level validation, reducing the synonym fragmentation and parameter sensitivity inherent in traditional approaches. The resulting decision-oriented knowledge map offers strategic guidance for infodemiology-informed and audience-segmented public health communication, research priority setting, and the deployment and evaluation of real-world surveillance and pharmacovigilance workflows, while supporting evidence-driven and patient-centered decision-making in public health and health care.

