Depression subtype classification from social media posts: few-shot prompting vs. fine-tuning of large language models

Background: Social media provides timely proxy signals of mental health, but reliable tweet-level classification of depression subtypes remains challenging due to short, noisy text, overlapping symptomatology,

Telehealth Intervention to Reduce Sedentary Behavior in Older Adults With Type 2 Diabetes: Development and Feasibility Study

Background: Sedentary behavior (SB) is a modifiable risk factor for complications in older adults with type 2 diabetes mellitus (T2DM). Despite widespread adoption of digital

Educating Students About Digital Health Research Ethics: Curricula Review and Expert Interview Study

Background: The rapid growth of digital health research, involving wearable devices, mobile apps, and sociotechnical health systems, raises complex ethical, legal, and social considerations. While

Patient Sharing of Digital Health Data in the Veterans Health Administration: Cross-Sectional Analysis

Background: The integration of patient-generated health data (PGHD) into health care has the potential to significantly transform patient care and clinical practice. PGHD includes health-related

Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level dropin & Neuroplasticity Mechanisms

arXiv:2603.24343v1 Abstract: Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further

MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning

March 24, 2026

arXiv:2603.16929v2
Abstract: Regulating the importance ratio is critical for the training stability of Group Relative Policy Optimization (GRPO) based frameworks. However, prevailing ratio control methods, such as hard clipping, suffer from non-differentiable boundaries and vanishing-gradient regions, and so fail to maintain gradient fidelity. Furthermore, these methods lack a hazard-aware mechanism to adaptively suppress extreme deviations, leaving the optimization process vulnerable to abrupt policy shifts. To address these challenges, we propose Modulated Hazard-aware Policy Optimization (MHPO), a novel framework designed for robust and stable reinforcement learning. MHPO introduces a Log-Fidelity Modulator (LFM) to map unbounded importance ratios into a bounded, differentiable domain. This mechanism prevents high-variance outlier tokens from destabilizing the loss landscape while ensuring global gradient stability. Complementarily, a Decoupled Hazard Penalty (DHP) integrates cumulative hazard functions from survival analysis to independently regulate positive and negative policy shifts. By shaping the optimization landscape with hazard-aware penalties, MHPO achieves fine-grained regulation of asymmetric policy shifts, simultaneously mitigating mode collapse from over-expansion and preventing policy erosion from catastrophic contraction within a stabilized trust region. Extensive evaluations on diverse reasoning benchmarks across both text-based and vision-language tasks demonstrate that MHPO consistently outperforms existing methods while significantly enhancing training stability.
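
The abstract does not give the formulas for LFM or DHP, so the following is only an illustrative sketch of the two ideas it names, not the paper's method. It contrasts hard clipping (zero gradient outside the clip range) with a hypothetical smooth bounded map built from `tanh` on the log-ratio, and shows an asymmetric penalty built from the standard Weibull cumulative hazard H(t) = (t/λ)^k. The function names, the `tanh` form, the Weibull choice, and all parameter values are assumptions made for illustration.

```python
import math

def hard_clip_grad(log_ratio, eps=0.2):
    """Gradient of the hard-clipped ratio w.r.t. log_ratio.

    r = exp(log_ratio) is clipped to [1 - eps, 1 + eps]; outside that
    range the clipped value is constant, so the gradient is exactly 0.
    """
    r = math.exp(log_ratio)
    return r if (1.0 - eps) < r < (1.0 + eps) else 0.0

def smooth_bounded_ratio(log_ratio, scale=0.2):
    """Hypothetical smooth stand-in (NOT the paper's LFM).

    Maps any log-ratio into the open interval (1 - scale, 1 + scale)
    and is differentiable everywhere.
    """
    return 1.0 + scale * math.tanh(log_ratio / scale)

def smooth_bounded_grad(log_ratio, scale=0.2):
    """d/dx [1 + scale * tanh(x / scale)] = 1 - tanh(x / scale)^2."""
    t = math.tanh(log_ratio / scale)
    return 1.0 - t * t

def weibull_cumulative_hazard(t, lam, k):
    """Standard Weibull cumulative hazard H(t) = (t / lam)^k for t >= 0."""
    return (t / lam) ** k

def decoupled_penalty(log_ratio, lam_pos=0.5, k_pos=2.0, lam_neg=0.3, k_neg=2.0):
    """Hypothetical asymmetric penalty (NOT the paper's DHP).

    Positive and negative deviations of the log-ratio get their own
    (lam, k) pair, so expansion and contraction are penalized separately.
    """
    if log_ratio >= 0:
        return weibull_cumulative_hazard(log_ratio, lam_pos, k_pos)
    return weibull_cumulative_hazard(-log_ratio, lam_neg, k_neg)

if __name__ == "__main__":
    x = 1.0  # an outlier token: ratio = e, far outside [0.8, 1.2]
    print("hard clip gradient:  ", hard_clip_grad(x))
    print("smooth map gradient: ", smooth_bounded_grad(x))
    print("penalty at +0.5:     ", decoupled_penalty(0.5))
    print("penalty at -0.5:     ", decoupled_penalty(-0.5))
```

With these assumed parameters, the outlier token receives no gradient under hard clipping but a small nonzero one under the smooth map, and a contraction of a given size is penalized more heavily than an expansion of the same size.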

