Depression subtype classification from social media posts: few-shot prompting vs. fine-tuning of large language models

BackgroundSocial media provides timely proxy signals of mental health, but reliable tweet-level classification of depression subtypes remains challenging due to short, noisy text, overlapping symptomatology,

A Multi-Task Targeted Learning Framework for Lithium-Ion Battery State-of-Health and Remaining Useful Life

arXiv:2603.22323v1 Announce Type: cross Abstract: Accurately predicting the state-of-health (SOH) and remaining useful life (RUL) of lithium-ion batteries is crucial for ensuring the safe and

General Machine Learning: Theory for Learning Under Variable Regimes

arXiv:2603.23220v1 Announce Type: cross Abstract: We study learning under regime variation, where the learner, its memory state, and the evaluative conditions may evolve over time.

MERIT: Memory-Enhanced Retrieval for Interpretable Knowledge Tracing

arXiv:2603.22289v1 Announce Type: cross Abstract: Knowledge Tracing (KT) models students’ evolving knowledge states to predict future performance, serving as a foundation for personalized education. While

Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance

arXiv:2602.12288v3 Announce Type: replace-cross Abstract: With the growth of intelligent civil infrastructure and smart cities, operation and maintenance (O&M) increasingly requires safe, efficient, and energy-conscious

BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation

March 10, 2026

arXiv:2506.09487v3 Announce Type: replace-cross
Abstract: This paper presents BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation, with a focus on systematic evaluation of discriminator combination strategies. Long-term audio generation is critical for applications in Text-to-Music (TTM) and Text-to-Audio (TTA) systems, where maintaining temporal co- herence, prosodic consistency, and harmonic structure over extended durations remains a significant challenge. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which internally applies the Snake activation function to better model periodic structures. In the discriminator framework, we integrate the Multi-Envelope Discriminator (MED), a novel architecture we proposed, to extract rich temporal en- velope features crucial for periodicity detection. Coupled with the Multi-Resolution Discriminator (MRD), this com- bination enables more accurate modeling of long-range dependencies in audio. We systematically evaluate various discriminator configurations, including Multi-Scale Discriminator (MSD) + MED, MSD + MRD, and Multi-Period Discriminator (MPD) + MED + MRD, using objective metrics (Fr’echet Audio Distance (FAD), Structural Similar- ity Index (SSIM), Pearson Correlation Coefficient (PCC), Mel-Cepstral Distortion (MCD), Multi-Resolution STFT (M-STFT), Periodicity error (Periodicity)) and subjective evaluations (MOS, SMOS). To support reproducibility, we provide detailed architectural descriptions, training configurations, and complete implementation details. The code, pre-trained models, and audio demo samples are available at: https://github.com/dinhoitt/BemaGANv2.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844