Few-Shot Large Language Models for Actionable Triage Categorization of Online Patient Inquiries

May 18, 2026

arXiv:2605.15680v1 Announce Type: cross
Abstract: Online patient inquiries are often informal, incomplete, and written before professional assessment, yet they must still be routed to an appropriate level of clinical follow-up. We study this as a four-class actionable triage task (self-care, schedule-visit, urgent-clinician-review, or emergency-referral) and ask whether prompted large language models (LLMs) can support such routing under low-resource labeling conditions. Using the public HealthCareMagic-100K corpus, we construct a 300-example human-calibrated gold evaluation set, a 700-example auto-labeled silver training set, and a 40-example few-shot pool. We compare Term Frequency-Inverse Document Frequency (TF-IDF) and Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT) baselines trained on silver labels against six prompted LLMs under 0-shot, 4-shot, and 12-shot conditions. We evaluate with macro-$F_1$ alongside safety-aware metrics, including emergency-recall, under-triage rate, and severe under-triage rate. The strongest LLM (Claude Haiku 4.5, 12-shot) reaches macro-$F_1$ 0.475, exceeding the best supervised baseline (BioBERT, 0.378) on point estimate, although the confidence intervals overlap. Few-shot prompting and two-model agreement help in label-dependent ways: self-care agreement is reliable, urgent-clinician-review is not. We conclude that LLMs can support triage prioritization and selective human review, but not autonomous deployment.
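The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so the following are assumptions made for illustration only: the four classes are treated as an ordinal severity scale in the order listed, under-triage means predicting a lower severity than the gold label, and severe under-triage means predicting at least two levels too low. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
# Assumed ordinal severity ordering of the four classes named in the abstract.
LEVELS = ["self-care", "schedule-visit", "urgent-clinician-review", "emergency-referral"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the four triage classes."""
    scores = []
    for label in LEVELS:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def emergency_recall(y_true, y_pred):
    """Fraction of gold emergency-referral cases that were predicted as such."""
    preds = [p for t, p in zip(y_true, y_pred) if t == "emergency-referral"]
    return sum(p == "emergency-referral" for p in preds) / len(preds) if preds else 0.0


def under_triage_rate(y_true, y_pred, min_gap=1):
    """Share of cases predicted at least `min_gap` severity levels too low.

    min_gap=1 is the assumed under-triage definition; min_gap=2 is the
    assumed 'severe' variant. Both thresholds are illustrative choices.
    """
    misses = sum(RANK[t] - RANK[p] >= min_gap for t, p in zip(y_true, y_pred))
    return misses / len(y_true)


# Toy labels, invented purely to exercise the functions.
y_true = ["self-care", "schedule-visit", "urgent-clinician-review",
          "emergency-referral", "emergency-referral"]
y_pred = ["self-care", "self-care", "urgent-clinician-review",
          "schedule-visit", "emergency-referral"]

print("macro-F1:", round(macro_f1(y_true, y_pred), 3))
print("emergency recall:", emergency_recall(y_true, y_pred))
print("under-triage rate:", under_triage_rate(y_true, y_pred))
print("severe under-triage rate:", under_triage_rate(y_true, y_pred, min_gap=2))
```

Reporting the directional error rates next to macro-$F_1$ matters because macro-$F_1$ treats all misclassifications alike, whereas a missed emergency is costlier than an over-cautious referral.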
