Background: Social media provides timely proxy signals of mental health, but reliable tweet-level classification of depression subtypes remains challenging due to short, noisy text, overlapping symptomatology, and labeling bias. Large language models (LLMs) are increasingly used in mental health for tasks such as symptom extraction, risk screening, and triage, yet their reliability for fine-grained depression subtype classification from brief social media posts remains underexplored.
Objective: We benchmarked few-shot, prompt-only LLMs against parameter-efficient fine-tuned encoders for identifying depression subtypes in posts on X (formerly Twitter).
Methods: We used a curated dataset of 14,983 English-language tweets stratified into six clinically grounded categories: five depression subtypes (postpartum, major, bipolar, psychotic, atypical) and a no-depression class. We compared (i) instruction-tuned causal LLMs in a few-shot setting and (ii) supervised fine-tuning of transformer encoders (e.g., RoBERTa, DeBERTa, BERTweet) under identical splits and metrics. The primary evaluation metric was macro-F1, with accuracy, precision, and recall as secondary metrics. We also report per-class precision, recall, and F1 scores, along with confusion matrices, for the best-performing model from each model family.
Results: Few-shot LLMs achieved macro-F1 = 0.73–0.77 (best: Llama-3-8B, accuracy 0.75). Fine-tuned encoders consistently outperformed prompt-only models, reaching macro-F1 = 0.94–0.96 (best: RoBERTa-large, accuracy 0.954). Relative improvements were largest for the clinically challenging classes: fine-tuning increased F1 for the postpartum and psychotic subtypes to ≈0.99 (substantially above few-shot) and boosted major-depression recall from ≈0.53–0.60 to ≈0.95–0.97. Error analyses showed that prompt-only models frequently misclassified major and atypical depression as bipolar, a pattern substantially reduced by fine-tuning.
Conclusions: On tweet-level depression subtyping, task-specific adaptation via fine-tuning yields substantially higher and more stable performance than few-shot prompting, particularly for nuanced, clinically anchored classes. These findings recommend fine-tuned encoders as strong, compute-efficient baselines for depression subtype classification from social media.
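For reference, the primary metric, macro-F1, is the unweighted mean of the per-class F1 scores, so every subtype counts equally regardless of class frequency. Below is a minimal pure-Python sketch of this computation; the label names are illustrative shorthand, not the dataset's actual label schema:

```python
# Per-class precision/recall/F1 and macro-F1 via one-vs-rest counts.
# Labels here ("major", "bipolar", ...) are illustrative placeholders.

def per_class_stats(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} computed one-vs-rest."""
    stats = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = (prec, rec, f1)
    return stats

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 over all classes."""
    stats = per_class_stats(y_true, y_pred, classes)
    return sum(f1 for _, _, f1 in stats.values()) / len(classes)
```

Because the mean is unweighted, a model that ignores a rare subtype (e.g., psychotic depression) is penalized just as heavily as one that ignores a common class, which is why macro-F1 is a stricter benchmark here than accuracy.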
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
arXiv:2603.19312v1 Announce Type: cross Abstract: Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods




