REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge

March 19, 2026

arXiv:2603.17145v1 Announce Type: cross
Abstract: Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known as LLM-as-a-Judge. However, standard Reinforcement Learning (RL) methods typically rely on binary rewards (e.g., 0-1 accuracy), thereby ignoring the ordinal structure inherent in regression tasks; for instance, they fail to recognize that predicting 4 is significantly better than predicting 1 when the ground truth is 5. Conversely, existing regression-aware approaches are often confined to Supervised Fine-Tuning (SFT), limiting their ability to explore optimal reasoning paths. To bridge this gap, we propose REAL (REgression-Aware Reinforcement Learning), a principled RL framework designed to optimize regression rewards, and also proven to be optimal for correlation metrics. A key technical challenge is that the regression objective is explicitly policy-dependent, thus invalidating standard policy gradient methods. To address this, we employ the generalized policy gradient estimator, which naturally decomposes optimization into two complementary components: (1) exploration over Chain-of-Thought (CoT) trajectories, and (2) regression-aware prediction refinement of the final score. Extensive experiments across model scales (8B to 32B) demonstrate that REAL consistently outperforms both regression-aware SFT baselines and standard RL methods, exhibiting significantly better generalization on out-of-domain benchmarks. On Qwen3-32B specifically, we achieve gains of +8.40 Pearson and +7.20 Spearman correlation over the SFT baseline, and +18.30/+11.20 over the base model. These findings highlight the critical value of integrating regression objectives into RL exploration for accurate LLM evaluation.
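
The abstract's example (predicting 4 versus predicting 1 when the ground truth is 5) can be illustrated with a small sketch. The function names are hypothetical, and the negative squared error used here is only an assumed stand-in for a regression reward; the abstract does not specify the exact reward that REAL optimizes.

```python
def binary_reward(pred: int, truth: int) -> float:
    """0-1 accuracy reward: 1.0 only on an exact match."""
    return 1.0 if pred == truth else 0.0


def regression_reward(pred: int, truth: int) -> float:
    """Assumed regression-style reward: negative squared error.

    This is an illustrative choice, not the reward defined in the paper.
    """
    return -float((pred - truth) ** 2)


truth = 5
for pred in (5, 4, 1):
    print(
        f"pred={pred}: binary={binary_reward(pred, truth)}, "
        f"regression={regression_reward(pred, truth)}"
    )
```

Under the binary reward, predictions of 4 and 1 both receive 0.0 and are indistinguishable. Under the assumed regression reward they receive -1.0 and -16.0, so the closer prediction is preferred, which is the ordinal structure the abstract says binary rewards ignore.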
