Locally-deployed vs. cloud-based AI in healthcare: evaluating DeepSeek-R1:8b, DeepSeek-R1, and ChatGPT o3-mini-high for complex medical diagnostics

Reasoning large language models are increasingly considered for healthcare-related artificial intelligence applications, but their practical value depends on responsiveness and operational reliability as well as diagnostic accuracy. In this study, we benchmarked six model settings on 1,000 questions from the MedQA dataset: DeepSeek-R1, its distilled 8-billion-parameter local variant DeepSeek-R1:8b, ChatGPT o3-mini-high, and the knowledge-base–augmented counterpart of each. We evaluated performance along three dimensions: diagnostic accuracy, response latency, and first-attempt connection reliability. DeepSeek-R1 achieved the highest accuracy (89.5%, 95% CI: 87.4–91.2) but had substantially longer response times (median 26.54 s) and a higher connection failure rate (4.6%). ChatGPT o3-mini-high responded faster (median 10.05 s) and showed the most favorable tail-latency profile, but its accuracy (78.2%, 95% CI: 75.5–80.7) was lower than that of DeepSeek-R1. The locally deployed DeepSeek-R1:8b demonstrated markedly stronger connection reliability (failure rate 0.2%, 95% CI: 0.0–0.5) but substantially lower accuracy (55.0%, 95% CI: 51.9–58.5). Knowledge-base augmentation did not consistently improve performance: for DeepSeek-R1 it significantly reduced accuracy by 4.36% (p = 0.0002), and no significant benefit was observed for the other models. These findings show that reasoning model performance in medical question answering is best understood as a trade-off among accuracy, latency, connection reliability, and deployment mode, and that retrieval augmentation is not universally beneficial. More broadly, this study provides deployment-relevant benchmarking evidence for evaluating reasoning models in healthcare-related settings. It also indicates that richer knowledge resources and more realistic task environments are needed before such systems can be meaningfully assessed for real-world clinical use.
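
The accuracy figures above are reported with 95% confidence intervals, but the abstract does not say how those intervals were computed. As a rough illustration only, the sketch below computes a Wilson score interval for a binomial proportion. The choice of the Wilson method, the function name `wilson_interval`, and the count of 895 correct answers (inferred from 89.5% of 1,000 questions) are all assumptions made for this example, so its output need not match the published bounds exactly.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Hypothetical helper for illustration; the study's actual CI method
    is not stated in the abstract.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed count: 89.5% accuracy on 1,000 questions taken as 895 correct.
low, high = wilson_interval(895, 1000)
print(f"accuracy 95% CI (Wilson, illustrative): {low:.1%} to {high:.1%}")
```

The Wilson interval is used here because it stays within [0, 1] for proportions near the boundaries, such as the 0.2% failure rate, where the simpler normal-approximation interval can extend below zero.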

