The Central Coupler of the AAA+ ATPase ClpXP Controls Intersubunit Communication and Couples the Conversion of Chemical Energy into the Generation of Force

ClpX is a clockwise hexameric helical arrangement that hydrolyzes ATP to unfold proteins and translocate them into the proteolytic chamber. We investigate the central coupler,

Structural features of E. coli Stx bacteriophage phi24B revealed with cryo-electron microscopy

Shiga toxin-converting bacteriophages play a critical role in the emergence and virulence of pathogenic Escherichia coli strains. Despite their significance, detailed structural information on these

Optimization of AAV tools to target Müller glial cells for retinal gene therapy

Reprogramming of Müller glial (MG) cells into retinal neurons has the potential to treat vision loss by regenerating the retina. Development of efficient gene delivery

scMultiPreDICT: A single-cell predictive framework with transcriptomic and epigenetic signatures

Cellular responses to genetic perturbations depend on both transcriptional programs and the epigenetic landscape. While single-cell multiomics technologies enable simultaneous profiling of gene expression and

Engineering a Glucose-Inducible Whole-Cell Biosensor via CRISPRi-Based Promoter Reprogramming

Precise monitoring of intracellular glucose dynamics is essential for understanding carbon flux, optimizing microbial bioprocesses, and enabling responsive control of engineered metabolic pathways. Here, we

Fine-grained Approaches for Confidence Calibration of LLMs in Automated Code Revision

April 9, 2026

arXiv:2604.06723v1 Announce Type: cross
Abstract: In today’s AI-assisted software engineering landscape, developers increasingly depend on LLMs that are highly capable yet inherently imperfect. The tendency of these models to produce incorrect outputs can reduce developer productivity. A canonical mitigation is to provide calibrated confidence scores that faithfully reflect the likelihood that each individual output is correct. Such information allows users to decide immediately whether to accept an output, abstain from using error-prone outputs, and better align their expectations with the model’s capabilities. Since post-trained LLMs do not inherently produce well-calibrated confidence scores, researchers have developed post-hoc calibration methods. Global Platt scaling of sequence-level confidence scores has proven effective in many generative software engineering tasks, but it remains unreliable or unexplored for automated code revision (ACR) tasks such as program repair, vulnerability repair, and code refinement. We hypothesise that the coarse-grained nature of this conventional method makes it ill-suited for ACR tasks, where correctness is often determined by local edit decisions and miscalibration can be sample-dependent; this motivates fine-grained confidence calibration. To address this, our study proposes local Platt scaling applied separately to three different fine-grained confidence scores. Through experiments across three separate tasks and correctness metrics, as well as 14 different models of various sizes, we find that fine-grained confidence scores consistently achieve lower calibration error across a broader range of probability intervals, and that this effect is further amplified when global Platt scaling is applied. Our proposed approaches offer a practical way to elicit well-calibrated confidence scores, enabling more trustworthy and streamlined use of imperfect models in ACR tasks.
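The baseline named in the abstract, global Platt scaling of a sequence-level confidence score, and the calibration error it is judged by can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. Sequence confidence is assumed to be the joint probability of the generated tokens. Platt scaling is fitted by plain gradient descent on log loss, and calibration error is measured as equal-width-bin expected calibration error (ECE). The function names (`sequence_confidence`, `fit_platt`, `apply_platt`, `expected_calibration_error`) and the toy data are hypothetical. The abstract does not say how local Platt scaling or the three fine-grained confidence scores are computed, so neither is reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Sequence-level confidence: the joint probability of all generated tokens.

    Assumption: exp(sum of token log-probabilities). A length-normalised
    geometric mean is an equally plausible aggregation.
    """
    return math.exp(sum(token_logprobs))


def _logit(p: float, eps: float = 1e-6) -> float:
    # Clip to [eps, 1 - eps] so log(0) and division by zero cannot occur.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(confidences: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, epochs: int = 5000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * logit(c) + b) by gradient descent on the mean log loss.

    labels[i] is 1 if output i was judged correct, else 0. Plain 0/1 targets
    are used (Platt's original recipe smooths them). The fixed learning rate
    assumes moderate logits; very extreme confidences may need a smaller lr.
    """
    xs = [_logit(c) for c in confidences]
    n = len(xs)
    a, b = 1.0, 0.0  # start from the identity mapping
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = _sigmoid(a * x + b) - y  # d(log loss)/d(logit)
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(confidences: Sequence[float], a: float, b: float) -> List[float]:
    """Map raw confidences to calibrated probabilities with fitted (a, b)."""
    return [_sigmoid(a * _logit(c) + b) for c in confidences]


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Equal-width ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        conf = sum(p for p, _ in members) / len(members)
        acc = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(acc - conf)
    return ece


if __name__ == "__main__":
    # Hypothetical overconfident model: 0.9-confidence outputs are correct 60%
    # of the time and 0.6-confidence outputs are correct 30% of the time.
    conf = [0.9] * 10 + [0.6] * 10
    correct = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
    a, b = fit_platt(conf, correct)
    print("ECE before:", expected_calibration_error(conf, correct))
    print("ECE after: ", expected_calibration_error(apply_platt(conf, a, b), correct))
```

In practice `(a, b)` would be fitted on a held-out calibration split and ECE reported on separate test data; the demo reuses the same toy data only for brevity. Because Platt scaling is a monotone map of a single score (for `a > 0`), it rescales confidence without changing how outputs are ranked.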


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844