Cross-Model Disagreement as a Label-Free Correctness Signal

March 27, 2026

arXiv:2603.25450v1
Abstract: Detecting when a language model is wrong without ground truth labels is a fundamental challenge for safe deployment. Existing approaches rely on a model’s own uncertainty — such as token entropy or confidence scores — but these signals fail critically on the most dangerous failure mode: confident errors, where a model is wrong but certain. In this work we introduce cross-model disagreement as a correctness indicator — a simple, training-free signal that can be dropped into existing production systems, routing pipelines, and deployment monitoring infrastructure without modification. Given a model’s generated answer, cross-model disagreement computes how surprised or uncertain a second verifier model is when reading that answer via a single forward pass. No generation from the verifying model is required, and no correctness labels are needed. We instantiate this principle as Cross-Model Perplexity (CMP), which measures the verifying model’s surprise at the generating model’s answer tokens, and Cross-Model Entropy (CME), which measures the verifying model’s uncertainty at those positions. Both CMP and CME outperform within-model uncertainty baselines across benchmarks spanning reasoning, retrieval, and mathematical problem solving (MMLU, TriviaQA, and GSM8K). On MMLU, CMP achieves a mean AUROC of 0.75 against a within-model entropy baseline of 0.59. These results establish cross-model disagreement as a practical, training-free approach to label-free correctness estimation, with direct applications in deployment monitoring, model routing, selective prediction, data filtering, and scalable oversight of production language model systems.
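The abstract describes CMP and CME only in words, so the following is a minimal sketch of how the two scores might be computed, not the authors' implementation. It assumes CMP is the standard perplexity of the generating model's answer tokens under the verifying model, and CME is the mean Shannon entropy of the verifying model's next-token distribution at those positions. The function name `cross_model_scores` and its inputs are hypothetical; in practice the logits would come from one forward pass of the verifier over the prompt plus the generated answer.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_model_scores(
    verifier_logits: Sequence[Sequence[float]],
    answer_token_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (CMP, CME) for one generated answer.

    Assumed definitions (not confirmed by the abstract):
      CMP = exp(mean negative log-probability of the answer tokens)
      CME = mean entropy (in nats) of the verifier's predictive distribution

    verifier_logits[t] is the verifier's logit vector for answer position t,
    i.e. its prediction of token t given the prompt and answer tokens before t.
    """
    if not answer_token_ids or len(verifier_logits) != len(answer_token_ids):
        raise ValueError("need one logit vector per answer token")

    nll_sum = 0.0      # accumulated surprise at the tokens actually generated
    entropy_sum = 0.0  # accumulated uncertainty of the verifier itself
    for logits, token_id in zip(verifier_logits, answer_token_ids):
        log_probs = log_softmax(logits)
        nll_sum -= log_probs[token_id]
        entropy_sum -= sum(math.exp(lp) * lp for lp in log_probs)

    n = len(answer_token_ids)
    return math.exp(nll_sum / n), entropy_sum / n


if __name__ == "__main__":
    # Toy 4-token vocabulary: the verifier strongly expects token 0.
    confident = [[10.0, 0.0, 0.0, 0.0]]
    print("verifier agrees:   ", cross_model_scores(confident, [0]))
    print("verifier disagrees:", cross_model_scores(confident, [1]))
```

Under these assumptions, an answer whose tokens the verifier finds unlikely receives a high CMP even when the verifier's own distribution is sharply peaked, which is the case where the two scores diverge. Either score could then be thresholded or ranked to flag answers for review; the choice of threshold is left open here because the abstract does not specify one.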
