Background: Machine-learning-based depression screening from student survey data complements clinician assessment but faces two obstacles: class imbalance (causing under-prediction of urgent minority cases) and the untested assumption that synthetic augmentation data preserve psychometric validity. Although recent work has begun to evaluate distributional fidelity of synthetic survey data, no study has systematically benchmarked both classification utility and classical construct-validity measures on augmented student-mental-health datasets.

Objective: Benchmark eight data-augmentation strategies on their classification utility and construct validity using a dual-axis evaluation framework.

Methods: We evaluated no augmentation, random oversampling, SMOTE, BorderlineSMOTE, ADASYN, CTGAN, TVAE, and the proposed Psychometric-Constrained Tabular GAN (PCT-GAN) on three public datasets (PHQ-9: n = 682; Student Depression: n = 27,837; DASS-42: n = 26,459) using four classifiers (Random Forest, XGBoost, SVM, MLP) under 5-fold cross-validation. PCT-GAN extends conditional Wasserstein-GAN-with-gradient-penalty with two psychometric regularisers: (1) factor-reconstruction loss preserving item principal-component structure, and (2) correlation-preservation loss minimizing inter-item correlation deviation. Classification performance was assessed using Macro-F1, ROC-AUC, G-mean, and MCC; psychometric validity via Cronbach’s α, Tucker’s Congruence Coefficient, and Frobenius-norm deviation.

Results: Overall, the eight methods were not statistically distinguishable on Macro-F1 (Friedman χ² = 10.44, p = 0.17, k = 8, N = 12). However, dataset-specific patterns emerged: PCT-GAN + Random Forest achieved the largest gain on small, imbalanced PHQ-9 data (F1 = 0.814 vs. 0.785 NoAug); SMOTE variants dominated on large DASS-42 data with neural networks (F1 > 0.99). On psychometric validity, PCT-GAN reduced inter-item correlation deviation by 15% vs. TVAE and 31% vs. CTGAN, while maintaining direction-correct Cronbach’s α (mean |Δα| = 0.045). TVAE, by contrast, inflated α above the real-data ceiling on all three DASS-42 subscales, a marker of psychometric invalidity rather than superior fidelity. Pareto analysis identified PCT-GAN as the only method ranking in the top three for both classification utility and psychometric fidelity.

Conclusions: Embedding construct-validity constraints into generative models produces synthetic mental-health data that is simultaneously useful for prediction and trustworthy for secondary research reuse. Practitioners should select SMOTE-family oversamplers for transient training-set augmentation, and PCT-GAN when synthetic data will be shared, re-analysed, or pooled with real samples for downstream psychometric research.
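The abstract names three psychometric-validity measures without giving formulas. As a minimal sketch, the block below uses the standard textbook definitions of Cronbach's α, the Frobenius norm of the difference between real and synthetic inter-item correlation matrices, and Tucker's congruence coefficient. The function names, the choice to compare first principal-component loadings, and the simulated one-factor data are all assumptions made for illustration; they are not taken from the study and may differ from the authors' implementation.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def correlation_frobenius_deviation(real, synth):
    """Frobenius norm of the gap between two inter-item correlation matrices."""
    r_real = np.corrcoef(real, rowvar=False)
    r_synth = np.corrcoef(synth, rowvar=False)
    return float(np.linalg.norm(r_real - r_synth, ord="fro"))


def first_loading(items):
    """First principal-component loadings of the item correlation matrix."""
    corr = np.corrcoef(items, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigenvalues in ascending order
    loading = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    # Eigenvector sign is arbitrary; fix it so loadings are comparable.
    return loading if loading.sum() >= 0 else -loading


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


# Illustrative data only: two independent draws from a simulated one-factor
# model stand in for a "real" and a "synthetic" six-item sample.
rng = np.random.default_rng(0)


def simulate(n, k=6, loading=0.8):
    latent = rng.normal(size=(n, 1))
    return loading * latent + rng.normal(size=(n, k))


real = simulate(500)
synth = simulate(500)

delta_alpha = cronbach_alpha(synth) - cronbach_alpha(real)
frobenius = correlation_frobenius_deviation(real, synth)
congruence = tucker_congruence(first_loading(real), first_loading(synth))
print(f"delta alpha = {delta_alpha:+.3f}")
print(f"Frobenius deviation = {frobenius:.3f}")
print(f"Tucker congruence = {congruence:.3f}")
```

Under these definitions, a synthetic sample that preserves the scale's structure would show a small |Δα|, a small Frobenius deviation, and a congruence coefficient close to 1. An α that rises above the real-data value would correspond to the inflation pattern the abstract reports for TVAE.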
Ensemble based in transfer learning for cytological classification in pleural fluid
Pleural effusion cytology is critical for diagnosing benign and malignant conditions, yet manual interpretation remains time-consuming and prone to subjectivity. The increasing burden of malignant