Classifying American Society of Anesthesiologists Physical Status With a Low-Rank–Adapted Large Language Model: Development and Validation Study

Background: The American Society of Anesthesiologists Physical Status (ASA-PS) classification is integral to preoperative risk assessment, yet assignment remains subjective and labor-intensive. Recent large language models (LLMs) can process free-text electronic health records (EHRs), but few studies have evaluated parameter-efficient adaptations that both predict ASA-PS and provide clinician-readable rationales. Low-rank adaptation (LoRA) is a parameter-efficient technique that updates only a small set of add-on parameters rather than the entire model, enabling efficient fine-tuning on modest data and hardware. A lightweight, instruction-tuned LLM with these capabilities could streamline workflow and broaden access to explainable decision support.

Objective: This study aimed to develop and evaluate a LoRA–fine-tuned Large Language Model Meta AI (LLaMA-3) for ASA-PS classification from preoperative clinical narratives and to benchmark it against traditional machine learning classifiers and domain-specific LLMs.

Methods: Preoperative anesthesia notes and discharge summaries were extracted from the EHR and reformatted into Alpaca-style instruction-response prompts requesting an ASA-PS class (I–V), with labels annotated by anesthesiologists as the target responses. The LoRA-enhanced LLaMA-3 model was fine-tuned with mixed-precision training and evaluated on a hold-out test set. Baselines included a random forest classifier, an Extreme Gradient Boosting (XGBoost) classifier, a support vector machine, fastText, BioBERT, ClinicalBERT, and untuned LLaMA-3. Performance was assessed with micro- and macroaveraged F1-scores and the Matthews correlation coefficient (MCC), each reported with 95% bootstrap CIs. Pairwise model error rates were compared using the McNemar test.

Results: The LoRA-LLaMA-3 model achieved a micro–F1-score of 0.780 (95% CI 0.769-0.792) and an MCC of 0.533 (95% CI 0.518-0.546), outperforming the other LLM baselines. After fine-tuning, BioBERT reached a micro–F1-score of 0.762 and an MCC of 0.508, whereas ClinicalBERT achieved a micro–F1-score of 0.757 and an MCC of 0.515. fastText yielded a micro–F1-score of 0.762 and an MCC of 0.536. The untuned LLaMA-3 performed poorly (micro–F1-score of 0.073; MCC of 0.002). However, the macro–F1-score of LoRA-LLaMA-3 (0.316) was lower than that of the other language models (0.349-0.372). Among all models, XGBoost obtained the highest scores (micro–F1-score of 0.815, 95% CI 0.804-0.826; macro–F1-score of 0.348, 95% CI 0.334-0.361; MCC of 0.613, 95% CI 0.599-0.626). Ablation experiments identified dropout = 0.3, learning rate = 3×10⁻⁵, temperature = 0.1, and top-P = 0.1 as the optimal hyperparameter settings. The LoRA model also produced rationales that highlighted medically pertinent terms.

Conclusions: LoRA fine-tuning improved LLaMA-3 from near-random performance into an ASA-PS classifier with a higher micro–F1-score and significantly lower misclassification rates than the other language model baselines. However, macroaveraged performance was lower, indicating limited discrimination for minority ASA classes. Traditional machine learning models demonstrated higher predictive performance. Beyond predictive performance, LoRA-LLaMA-3 generated clinician-oriented explanations that enhance decision transparency. By reformatting routine EHR narratives into instruction-response pairs and relying on lightweight parameter adaptation, this approach offers a practical, resource-efficient framework for introducing explainable LLMs to clinical classification tasks.

Trial Registration:
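The parameter efficiency of LoRA mentioned in the Background can be illustrated with a small arithmetic sketch: instead of updating a full d_out × d_in weight matrix W, LoRA freezes W and trains a low-rank update B @ A, with A of shape r × d_in and B of shape d_out × r. The dimensions and rank below are illustrative assumptions; the paper does not report LLaMA-3's exact layer shapes or the rank used.

```python
def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Compare trainable parameters in a full weight update versus a
    rank-r LoRA update (matrices A and B) of the same layer."""
    full = d_out * d_in            # every entry of W
    lora = r * d_in + d_out * r    # entries of A plus entries of B
    return full, lora

# Illustrative 4096 x 4096 projection with rank r = 8 (assumed values):
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536 -- LoRA trains ~0.4% of the parameters
```

At low rank, the add-on parameters are orders of magnitude fewer than a full update, which is what makes fine-tuning feasible on modest hardware.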
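The Methods describe reformatting EHR narratives into Alpaca-style instruction-response pairs. A minimal sketch of that reformatting step is below; the field names follow the common Alpaca convention, but the instruction wording and the example note are illustrative assumptions, not the authors' exact template or data.

```python
def to_alpaca_example(note_text: str, asa_label: str) -> dict:
    """Wrap one preoperative narrative and its anesthesiologist-annotated
    ASA-PS label as an Alpaca-style instruction-response pair."""
    return {
        "instruction": (
            "Classify the patient's American Society of Anesthesiologists "
            "Physical Status (ASA-PS) as one of I, II, III, IV, or V, "
            "based on the preoperative note below."
        ),
        "input": note_text,
        "output": asa_label,
    }

# Hypothetical example note and label for illustration only:
example = to_alpaca_example(
    "72-year-old with poorly controlled type 2 diabetes and stable angina "
    "scheduled for elective cholecystectomy.",
    "III",
)
print(example["output"])  # III
```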
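The evaluation protocol reports micro–F1 with 95% bootstrap CIs. For single-label multiclass prediction, micro-F1 reduces to accuracy, and a percentile bootstrap resamples the test set with replacement. The sketch below shows that protocol under those assumptions; the resample count and toy labels are illustrative, not the study's.

```python
import random

def micro_f1(y_true, y_pred):
    # For single-label multiclass tasks, micro-F1 equals accuracy.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """95% percentile bootstrap CI for micro-F1 over the test set."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(micro_f1([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Toy labels for illustration (one error per five predictions):
y_true = ["I", "II", "III", "II", "I"] * 4
y_pred = ["I", "II", "II", "II", "I"] * 4
print(round(micro_f1(y_true, y_pred), 3))  # 0.8
lo, hi = bootstrap_ci(y_true, y_pred)
```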
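The pairwise comparison in the Methods uses the McNemar test, which examines only the discordant pairs: cases one model classifies correctly and the other does not. A sketch of the continuity-corrected statistic follows; the discordant counts are made up for illustration, and in practice a library routine (e.g., in statsmodels) would typically be used.

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """Continuity-corrected McNemar chi-square statistic.
    b = cases model A got right and model B got wrong; c = the reverse."""
    if b + c == 0:
        return 0.0  # no discordant pairs: nothing to test
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical discordant counts; compare against chi-square with 1 df
# (3.84 corresponds to p = .05):
stat = mcnemar_statistic(b=40, c=15)
print(stat > 3.84)  # True -- error rates differ significantly here
```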

