Background: “I’m not a doctor, but…” is a typical response when considerate laypeople are asked for health advice. Seeking medical advice, however, has increasingly shifted to digital settings, where the expertise of the other party is less transparent than in face-to-face interactions. Recently, large language models (LLMs) have emerged as easily accessible tools, offering a novel way to formulate medical questions and receive seemingly qualified advice. Given the sensitive nature of health-related queries and the lack of professional supervision, incorrect advice can pose serious health risks. Including explicit disclaimers and precise referrals in LLM responses to medical queries is therefore crucial. However, little is known about how LLMs adapt these safety measures to queries of different urgency levels.

Objective: This study evaluates disclaimer and referral patterns in LLM responses to authentic medical queries of different urgency levels using a systematic evaluation framework.

Methods: This prospective, multimodel evaluation study generated and analyzed 908 responses from 4 popular LLMs (GPT-4o, Claude Sonnet-4, Grok-3, and DeepSeek-V3) to 227 authentic patient queries from a public dataset. Two human raters classified all 227 patient queries using a 3-level urgency scale. LLM responses were evaluated using a 5-point ordinal classification system for disclaimer and referral advice, ranging from “no disclaimer” to “urgent advice to consult a medical professional.” GPT-4o served as the primary rater model for this task after a subset validation against human expert annotations. Statistical analyses included Jonckheere-Terpstra tests to examine the relationship between case urgency and disclaimer ratings and Kruskal-Wallis tests for intermodel comparisons.

Results: The 227 patient queries comprised 77 (34%) low-urgency, 110 (48%) intermediate-urgency, and 40 (18%) high-urgency cases. All 4 LLMs demonstrated statistically significant ordered trends (all P<.001), with higher-urgency queries receiving more explicit referral advice. Disclaimer and referral advice clustered toward higher categories across all models, with 97% (881/908) of responses indicating that a medical professional should be consulted. Sonnet-4, Grok-3, and GPT-4o demonstrated a conservative approach, with 89%, 89%, and 88% of their responses, respectively, being either explicit or urgent referrals. In contrast, DeepSeek-V3 showed a broader distribution, with 74% of responses falling into these categories. Interrater reliability between GPT-4o and human raters reached moderate to substantial agreement, with weighted Cohen κ values between 0.415 and 0.707.

Conclusions: Current LLMs exhibit urgency-responsive safety mechanisms when providing medical advice. All evaluated models adaptively incorporate more explicit disclaimers and urgent referrals for higher-urgency queries. However, variability between LLMs highlights the need for standardized safety measures and appropriate regulatory frameworks. Although these findings indicate progress on safety concerns, the public availability of LLMs requires careful consideration to ensure consistent protection against patient harm while preserving the benefits of low-threshold access to health information.
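The Methods above describe an ordinal analysis pipeline: an ordered-trend test across urgency levels (Jonckheere-Terpstra), a between-model comparison of rating distributions (Kruskal-Wallis), and rater agreement via weighted Cohen κ. The abstract does not name the statistical software used, so the following is a minimal sketch of how such an analysis could be reproduced in Python. The jonckheere_terpstra helper, the example rating vectors, and the quadratic κ weighting are illustrative assumptions, not the authors' implementation, which may additionally apply tie corrections or exact tests.

```python
import numpy as np
from scipy.stats import kruskal, norm
from sklearn.metrics import cohen_kappa_score


def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra test for an ordered trend across groups.

    `groups` is a list of 1-D rating arrays ordered by hypothesized
    urgency (low -> intermediate -> high). Returns the JT statistic,
    the z score, and a two-sided p-value from the normal approximation.
    Note: this simple version omits the tie-corrected variance that a
    full implementation would use for heavily tied ordinal data.
    """
    jt = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            a = np.asarray(groups[i], dtype=float)
            b = np.asarray(groups[j], dtype=float)
            diff = b[None, :] - a[:, None]
            # Mann-Whitney-style count: pairs where the higher-urgency
            # group received the higher disclaimer rating; ties count 0.5.
            jt += np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    n = np.array([len(g) for g in groups])
    N = n.sum()
    mean = (N**2 - np.sum(n**2)) / 4.0
    var = (N**2 * (2 * N + 3) - np.sum(n**2 * (2 * n + 3))) / 72.0
    z = (jt - mean) / np.sqrt(var)
    return jt, z, 2 * norm.sf(abs(z))


# Illustrative (made-up) 1-5 disclaimer ratings for one model,
# grouped by the human-assigned urgency of the query.
low = [3, 3, 4, 4, 5, 4]
intermediate = [4, 4, 5, 4, 5, 5]
high = [5, 5, 4, 5, 5, 5]

jt, z, p_trend = jonckheere_terpstra([low, intermediate, high])

# Kruskal-Wallis comparison of rating distributions across models
# (each list would hold one model's 227 ratings in the real analysis).
h, p_models = kruskal([4, 5, 4, 5], [5, 5, 5, 4], [3, 4, 4, 5], [4, 4, 5, 5])

# Weighted Cohen kappa between the GPT-4o rater and a human rater
# (quadratic weighting is an assumption; the paper reports 0.415-0.707).
kappa = cohen_kappa_score([5, 4, 3, 5, 4], [5, 4, 4, 5, 4], weights="quadratic")

print(f"JT z={z:.2f}, p={p_trend:.4f}; "
      f"KW H={h:.2f}, p={p_models:.4f}; kappa={kappa:.3f}")
```

A one-sided version of the trend test (positive z only) would match the directional hypothesis that higher urgency yields stronger referrals; the two-sided p-value above is the more conservative default.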
Unlocking electronic health records: a hybrid graph RAG approach to safe clinical AI for patient QA
Introduction

Electronic health record (EHR) systems present clinicians with vast repositories of clinical information, creating a significant cognitive burden where critical details are easily overlooked. While



