Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion

arXiv:2603.19266v1 Announce Type: cross
Abstract: Distilling robust reasoning capabilities from large language models (LLMs) into smaller, computationally efficient student models remains an unresolved challenge. Despite recent advances, distilled models frequently suffer from superficial pattern memorization and subpar generalization. To overcome these limitations, we introduce a novel distillation framework that moves beyond simple mimicry to instill a deeper conceptual understanding. Our framework features two key innovations. First, to address pattern memorization, Explanatory Inversion (EI) generates targeted “explanatory probes” that compel the student to articulate the underlying logic behind an answer rather than merely memorizing it. Second, to improve generalization, Explanatory GRPO (ExGRPO) is a reinforcement learning algorithm with a novel Dialogue Structure Utility Bonus, which explicitly rewards the student for maintaining a coherent reasoning process across these probes. Extensive evaluations on 12 datasets demonstrate significant improvements. Using Gemma-7b as the student model, our method yields an average 20.39% increase over zero-shot performance and a 6.02% improvement over state-of-the-art distillation baselines. Moreover, models distilled with our method show remarkable training efficiency (e.g., surpassing vanilla fine-tuning with only 10-25% of the training data) and strong generalization to out-of-distribution tasks. Implementation is released at https://github.com/Zhen-Tan-dmml/ExGRPO.git.
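The abstract does not spell out how the Dialogue Structure Utility Bonus enters the GRPO objective, so the following is a minimal, hypothetical Python sketch of one plausible reading: each rollout answers the task and a set of explanatory probes, per-probe coherence scores are aggregated into a bonus that is added to the answer reward, and advantages are then computed group-relative in the usual GRPO style. The ProbedRollout structure, the mean-minus-variance bonus, and the beta weight are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (not the authors' code) of a GRPO-style advantage
    # augmented with a "Dialogue Structure Utility Bonus". All names and
    # the weighting scheme below are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class ProbedRollout:
        answer_reward: float       # task reward for the final answer (e.g., exact match)
        probe_scores: list[float]  # per-probe coherence scores in [0, 1]

    def structure_bonus(probe_scores: list[float]) -> float:
        """Reward consistency across probes: high mean, low variance."""
        if not probe_scores:
            return 0.0
        mean = sum(probe_scores) / len(probe_scores)
        var = sum((s - mean) ** 2 for s in probe_scores) / len(probe_scores)
        return mean - var  # uniformly coherent probe answers score highest

    def grpo_advantages(rollouts: list[ProbedRollout], beta: float = 0.5) -> list[float]:
        """Group-relative advantages over total (answer + bonus) rewards:
        normalize each rollout's total reward by the group mean and std,
        as in standard GRPO."""
        totals = [r.answer_reward + beta * structure_bonus(r.probe_scores)
                  for r in rollouts]
        mu = sum(totals) / len(totals)
        sigma = (sum((t - mu) ** 2 for t in totals) / len(totals)) ** 0.5 or 1.0
        return [(t - mu) / sigma for t in totals]

    # Example: two rollouts with the same correct answer but different
    # probe coherence; only the coherent one gets a positive advantage.
    group = [
        ProbedRollout(answer_reward=1.0, probe_scores=[0.9, 0.8, 0.85]),
        ProbedRollout(answer_reward=1.0, probe_scores=[0.9, 0.1, 0.2]),
    ]
    print(grpo_advantages(group))

Under this reading, two students that both answer correctly are still separated by how consistently they can explain the answer across probes, which is the abstract's stated goal of rewarding a coherent reasoning process rather than the final answer alone.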
