Background: Prior authorization (PA) is a major source of administrative burden, treatment delay, and clinician burnout. Artificial intelligence (AI), particularly large language models (LLMs), is increasingly used to assist with clinical documentation, yet its reliability for payer-facing administrative tasks remains uncertain.

Objective: To evaluate the quality of PA letters drafted by ChatGPT-5 for commonly used medications requiring PA in nephrology, judged by correctness and strength of clinical reasoning.

Methods: We created a single standardized prompt and applied it across 29 nephrology scenarios to generate PA letters. Each letter was reviewed against four criteria: 1) absence of false statements or hallucinations, 2) correctness of ICD-10 coding, 3) presence and validity of citations, and 4) clinical reasoning, rated on a 4-point Likert scale (illogical, weak, adequate, strong). FDA drug labels, KDIGO guidelines, and related randomized controlled trials served as reference standards.

Results: Of the 29 letters, one (3.4%) contained a false statement, citing an irrelevant clinical trial. The ICD-10 diagnosis code was correct in 23 letters (79.3%); most errors involved chronic kidney disease (CKD) staging or internal diagnostic inconsistencies. Twenty-seven letters (93.1%) cited valid references; one letter cited an incorrect trial, and another cited a correct KDIGO guideline through an inaccessible link. Twenty-six letters (89.7%) demonstrated strong clinical reasoning, supported by guideline-oriented or FDA label–aligned justification; the remaining three were rated adequate. The main areas for improvement were citing relevant references and emphasizing special considerations, such as Risk Evaluation and Mitigation Strategy (REMS) compliance for eculizumab.

Conclusions: ChatGPT-5 can generate clinically coherent PA drafts for nephrology medications, but limitations in coding precision and citation reliability persist. With appropriate oversight, AI-assisted documentation may reduce administrative burden while maintaining safety and accuracy.
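A minimal sketch of the letter-generation and review-logging step described in Methods, assuming the official OpenAI Python client. The prompt wording, model identifier, scenario list, and rubric field names below are illustrative assumptions, not the study's actual materials.

```python
# Hypothetical sketch of the workflow described in Methods: one standardized
# prompt applied per scenario, with a record for the four review criteria.
from dataclasses import dataclass
from openai import OpenAI  # assumes the openai package (>=1.0) is installed

# Illustrative prompt template; the study's actual prompt is not reproduced here.
STANDARD_PROMPT = (
    "Draft a prior authorization letter for {drug} in a patient with {indication}. "
    "Include the ICD-10 diagnosis code and cite supporting guidelines or trials."
)

@dataclass
class LetterReview:
    scenario: str
    no_false_statements: bool  # criterion 1: absence of hallucinations
    icd10_correct: bool        # criterion 2: ICD-10 coding accuracy
    citations_valid: bool      # criterion 3: presence and validity of citations
    reasoning: str             # criterion 4: illogical / weak / adequate / strong

def draft_letter(client: OpenAI, drug: str, indication: str, model: str = "gpt-5") -> str:
    """Generate one PA letter draft from the standardized prompt."""
    response = client.chat.completions.create(
        model=model,  # model identifier is an assumption
        messages=[{"role": "user",
                   "content": STANDARD_PROMPT.format(drug=drug, indication=indication)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    scenarios = [("dapagliflozin", "proteinuric chronic kidney disease")]  # illustrative only
    for drug, indication in scenarios:
        print(draft_letter(client, drug, indication))
```

Keeping the prompt fixed across scenarios, as the study did, isolates variation in letter quality to the clinical scenario rather than to prompt engineering.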




