Depression subtype classification from social media posts: few-shot prompting vs. fine-tuning of large language models

BackgroundSocial media provides timely proxy signals of mental health, but reliable tweet-level classification of depression subtypes remains challenging due to short, noisy text, overlapping symptomatology,

Editorial: Advancing digital mental health for youth

Post Content

When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models

arXiv:2603.26556v1 Announce Type: cross Abstract: Converting a pretrained Transformer into a more efficient hybrid model through distillation offers a promising approach to reducing inference costs.

Humanline: Online Alignment as Perceptual Loss

arXiv:2509.24207v2 Announce Type: replace Abstract: Online alignment (e.g., GRPO) is generally more performant than offline alignment (e.g., DPO) — but why? Drawing on prospect theory

QHap: Quantum-Inspired Haplotype Phasing

arXiv:2603.25762v1 Announce Type: new Abstract: Haplotype phasing, the process of resolving parental allele inheritance patterns in diploid genomes, is critical for precision medicine and population

Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis

March 23, 2026

arXiv:2603.19259v1 Announce Type: cross
Abstract: Taiwanese Hokkien (Taigi) presents unique opportunities for advancing speech technology methodologies that can generalize to diverse linguistic contexts. We introduce Breeze Taigi, a comprehensive framework centered on standardized benchmarks for evaluating Taigi speech recognition and synthesis systems. Our primary contribution is a reproducible evaluation methodology that leverages parallel Taiwanese Mandarin resources. We provide 30 carefully curated Mandarin-Taigi audio pairs from Taiwan’s Executive Yuan public service announcements with normalized ground truth transcriptions. We establish Character Error Rate (CER) as the standard metric and implement normalization procedures to enable fair cross-system comparisons. To demonstrate the benchmark’s utility and provide reference implementations, we develop speech recognition and synthesis models through a methodology that leverages existing Taiwanese Mandarin resources and large-scale synthetic data generation. In particular, we fine-tune a Whisper model on approximately 10,000 hours of Taigi synthetic speech data. Our ASR model achieves 30.13% average CER on the benchmark, outperforming existing commercial and research systems. By providing standardized evaluation protocols, diverse training datasets, and open baseline models, we offer a replicable framework with methodologies applicable to various linguistic contexts.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844