Background
The rapid evolution of general-purpose large language models (LLMs) provides a promising framework for integrating artificial intelligence into medical practice. While these models can generate medically relevant language, their application to evidence inference in clinical scenarios poses potential challenges. This study uses empirical experiments to analyze the capability boundaries of current general-purpose LLMs on evidence-based medicine (EBM) tasks and offers a philosophical reflection on their limitations.

Methods
This study evaluates the performance of three general-purpose LLMs, namely ChatGPT, DeepSeek, and Gemini, when applied directly to core EBM tasks. The models were tested in a baseline, unassisted setting, without task-specific fine-tuning, external evidence retrieval, or embedded prompting frameworks. Two clinical scenarios, SGLT2 inhibitors for heart failure and PD-1/PD-L1 inhibitors for advanced NSCLC, were used to assess performance in evidence generation, evidence synthesis, and clinical judgment. Model outputs were evaluated with a multidimensional rubric, and the empirical results were analyzed from an epistemological perspective.

Results
The experiments show that the evaluated general-purpose LLMs can produce syntactically coherent and medically plausible outputs on core evidence-related tasks. However, under current architectures and baseline deployment conditions, several limitations remain: imperfect accuracy in numerical extraction and processing, limited verifiability of cited sources, inconsistent methodological rigor in synthesis, and weak attribution of clinical responsibility in recommendations. Building on these empirical patterns, the philosophical analysis identifies three potential risks in this testing setting: disembodiment, deinstitutionalization, and depragmatization.

Conclusions
This study suggests that directly applying general-purpose LLMs to clinical evidence tasks entails notable limitations. Under current architectures, these systems lack embodied engagement with clinical phenomena, do not participate in institutional evaluative norms, and cannot assume responsibility for reasoning. These findings provide a directional compass for future medical AI: grounding outputs in real-world data, integrating deployment into clinical workflows with oversight, and designing human-AI collaboration with clear allocation of responsibility.
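The abstract refers to a multidimensional rubric for scoring model outputs across tasks and scenarios but does not specify its dimensions or scale. The sketch below is a minimal, hypothetical illustration of how such rubric scores could be recorded and aggregated; the dimension names are inferred from the limitations listed in the Results, and the 0-5 scale and unweighted averaging are assumptions, not the study's actual instrument.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical rubric dimensions, inferred from the limitations reported above;
# the study's actual multidimensional rubric is not specified in the abstract.
DIMENSIONS = (
    "numerical_accuracy",          # correctness of extracted and processed numbers
    "source_verifiability",        # whether cited evidence can be traced and checked
    "methodological_rigor",        # soundness of the synthesis approach
    "responsibility_attribution",  # clarity about who bears clinical responsibility
)


@dataclass
class RubricScore:
    """Scores one model output on each rubric dimension (assumed 0-5 scale)."""
    model: str      # e.g. "ChatGPT", "DeepSeek", "Gemini"
    scenario: str   # e.g. "SGLT2 inhibitors for heart failure"
    task: str       # "evidence generation" | "evidence synthesis" | "clinical judgment"
    scores: dict[str, int] = field(default_factory=dict)

    def overall(self) -> float:
        """Unweighted mean across dimensions (a simplifying assumption)."""
        return mean(self.scores[d] for d in DIMENSIONS)


if __name__ == "__main__":
    example = RubricScore(
        model="ChatGPT",
        scenario="SGLT2 inhibitors for heart failure",
        task="evidence synthesis",
        scores={
            "numerical_accuracy": 3,
            "source_verifiability": 2,
            "methodological_rigor": 3,
            "responsibility_attribution": 1,
        },
    )
    print(f"{example.model} / {example.task}: overall {example.overall():.2f}")
```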
Convolutional automatic identification of B-lines and interstitial syndrome in lung ultrasound images using pre-trained neural networks with feature fusion
Introduction
Interstitial/alveolar syndrome (IS) is a condition detectable on lung ultrasound (LUS) that indicates underlying pulmonary or cardiac diseases associated with significant morbidity and increased mortality.



