arXiv:2605.05225v3 Announce Type: replace-cross Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual […]
Database Normalization via Dual-LLM Self-Refinement
arXiv:2508.17693v2 Announce Type: replace-cross Abstract: Database normalization is crucial to preserving data integrity. However, it is time-consuming and error-prone, as it is typically performed manually by data engineers. To this end, we present Miffie, a database normalization framework that leverages the capability of large language models. Miffie enables automated data normalization without human effort while […]
TSAQA: Time Series Analysis Question And Answering Benchmark
arXiv:2601.23204v2 Announce Type: replace Abstract: Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. While recent work has begun to explore multi-task time series question answering (QA), current benchmarks remain limited to forecasting and anomaly detection tasks. We introduce TSAQA, a novel unified benchmark designed to […]
Latent-space Attacks for Refusal Evasion in Language Models
arXiv:2605.21706v2 Announce Type: replace Abstract: Safety-aligned language models are trained to refuse harmful requests, yet refusal behavior can be suppressed by steering their internal representations. Existing methods do so by ablating a refusal direction from model activations, aiming to remove refusal from the model’s residual stream. Despite their empirical success, these methods lack a principled […]
Mitosis Detection in the Wild: Multi-Tumor and Context-Aware Generalization in the MIDOG 2025 Challenge
arXiv:2606.07368v1 Announce Type: cross Abstract: Automated mitosis detection is a well-established task in computational pathology. While previous benchmarks focused on scanner-induced domain shift, clinical “real-world” application requires models to be robust across the vast variance to be expected in the histological landscape. The MItosis DOmain Generalization (MIDOG) 2025 challenge was designed to evaluate algorithmic performance […]
Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning
arXiv:2606.07500v1 Announce Type: cross Abstract: Continual learning in Large Language Models (LLMs) is hindered by the plasticity-stability dilemma, where acquiring new capabilities often leads to catastrophic forgetting of previous knowledge. Existing methods typically treat parameters uniformly, failing to distinguish between specific task knowledge and shared capabilities. We introduce Mixture of Sparse Experts for Task Agnostic […]
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
arXiv:2602.07026v3 Announce Type: replace-cross Abstract: Despite the success of multimodal contrastive learning in aligning visual and linguistic representations, a persistent geometric anomaly, the Modality Gap, remains: embeddings of distinct modalities expressing identical semantics occupy systematically offset regions. Prior approaches to bridge this gap are largely limited by oversimplified isotropic assumptions, hindering their application in large-scale […]
Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review
arXiv:2605.17548v2 Announce Type: replace-cross Abstract: Code review has evolved for decades, from informal peer checking to today’s pull request (PR) workflows, yet it remains a largely manual and cognitively demanding process. The rise of Artificial Intelligence (AI) coding assistants has intensified this challenge: while these tools increase code production velocity, they also expand the volume […]
Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions
arXiv:2606.03382v2 Announce Type: replace-cross Abstract: While Proximal Policy Optimization (PPO) demonstrates strong performance in stationary settings, we show that its standard optimization paradigm struggles in continual and non-stationary environments. The failure does not stem from insufficient model capacity or overly restrictive clipping. Instead, PPO performs persistent, directionally inefficient local updates, which indicates a lack of […]
Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
arXiv:2606.07309v1 Announce Type: cross Abstract: Instruction-following audio language models (ALMs) can be augmented with explicit acoustic cues, yet it remains unclear whether such cues are used in a grounded way when the raw audio is already available. We study this question in speech emotion recognition (SER) by deriving six interpretable acoustic concept tokens from the […]
Spectral Scaling Laws of Muon
arXiv:2606.04058v2 Announce Type: replace-cross Abstract: Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization with the Newton–Schulz (NS) iteration. Since NS is only approximate, directions with small singular values fail to be […]
CULTURESCORE: Evaluating Cultural Faithfulness in Video Generation Models
arXiv:2606.07311v1 Announce Type: cross Abstract: As video generation models like Veo 3.1 and LTX-2 advance, their ability to accurately represent diverse global cultures remains a critical yet understudied frontier. Current metrics, such as VideoScore, only measure visual quality but offer no mechanism for assessing cultural faithfulness. Consequently, a model that replaces a Namaste with a […]