Digital resources and interactive multimedia tools for breastfeeding promotion and support: a scoping review

Background: Breastfeeding is widely recognized as one of the most cost-effective public health interventions for improving maternal and child health outcomes. Nevertheless, breastfeeding indicators remain suboptimal

Editorial: Advances in generative artificial intelligence for mental health

Healthcare professionals’ perspectives on usefulness, acceptability and implementation conditions of socially assistive robots in France: a cross-sectional survey and cluster analysis

Introduction: In healthcare, socially assistive robots are increasingly used for logistical, assistive, and psychosocial purposes, raising ethical, social, and organizational questions. In these contexts, professionals’ acceptability

Designing and evaluating large language model-enabled clinical decision support for heart failure: a modular and risk-tiered framework

Heart failure (HF) care requires repeated decisions across suspected disease, diagnostic confirmation, phenotyping, guideline-directed medical therapy, device consideration, worsening HF, transition care, and advanced HF

Digital phenotyping of affect and stress in emerging adults

Background: Depression is highly heterogeneous and difficult to monitor or predict in daily life. One strategy for monitoring depressive symptoms is digital phenotyping, the real-time tracking

Performance of large language models in delivering accurate and comprehensible patient information on heart failure and cardiomyopathy

June 9, 2026

Background: Large language models (LLMs) are increasingly used by patients seeking cardiovascular health information through digital platforms. However, their accuracy and suitability for providing guidance on heterogeneous diseases such as cardiomyopathies and heart failure remain inadequately evaluated. This study systematically benchmarked state-of-the-art LLMs on patient-oriented heart failure and cardiomyopathy queries with respect to clinical appropriateness and comprehensibility.

Methods: Six prominent LLM chatbots were tested on 50 expert-curated questions covering disease understanding and lifestyle advice. A web-based evaluation platform randomized and blinded responses for assessment by twelve reviewers (cardiologists, medical students, AI auto-graders). Responses were rated on a 1–5 Likert scale across nine domains, including appropriateness, readability, and empathy. Reviewers also chose their preferred model per question.

Results: Linguistic complexity and output length varied substantially. Gemini provided the most readable responses (Flesch–Kincaid Grade 11.3 ± 1.9) but was among the most verbose (668.7 ± 116.1 words). Across 2,700 ratings, Gemini received the highest composite mean rating (4.41 ± 0.77), excelling in completeness and factual reliability, followed by Grok (4.23 ± 0.76). Confabulation avoidance scored consistently high across all models (4.49 ± 0.02), while conciseness scored lowest (3.81 ± 0.05). In line with these ratings, evaluators selected Gemini as their preferred information source in 43.7% of cases, followed by Grok (30.3%). Rating tendencies varied by evaluator group: auto-graders gave the highest average scores (mean 4.58 ± 0.60), followed by students (4.10 ± 0.88), while experts were more conservative (3.79 ± 0.93).

Discussion: All LLMs showed good accuracy in avoiding medical misinformation, though readability and comprehensiveness varied. While major factual errors or hallucinations were rare in our blinded evaluation, they were not entirely absent.
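For readers unfamiliar with the readability metric quoted in the results, the following is a minimal sketch of the standard Flesch–Kincaid Grade Level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is not the study's own tooling, which the abstract does not describe. The function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so scores will differ somewhat from dictionary-based tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English text."""
    # Sentences are split on runs of '.', '!' or '?'; empty pieces are ignored.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A higher grade indicates text that demands more years of schooling to read comfortably, so a response made of long sentences and polysyllabic medical terms scores higher than one written in short, plain sentences.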

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844