arXiv:2512.15791v2 Announce Type: replace-cross Abstract: In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans through text generation. Because of their impact on society, developing and deploying these language models must be done responsibly, with attention to their negative impacts and […]
MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation
arXiv:2502.12468v2 Announce Type: replace-cross Abstract: The LLM-as-a-Judge paradigm shows promise for evaluating generative content but lacks reliability in reasoning-intensive scenarios, such as programming. Inspired by recent advances in reasoning models and shifts in scaling laws, we pioneer bringing test-time computation into LLM-as-a-Judge, proposing MCTS-Judge, a resource-efficient, System-2 thinking framework for code correctness evaluation. MCTS-Judge leverages […]
Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems
arXiv:2604.14585v2 Announce Type: replace Abstract: Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku 4.5 (6 methods $\times$ 4 tasks $\times$ 3 repeats), 49% score below zero-shot; on Amazon Nova Lite, the failure rate is even higher. Yet on one task, all six methods […]
Tell Me a Story! Narrative-Driven XAI with Large Language Models
arXiv:2309.17057v3 Announce Type: replace Abstract: In many AI applications today, the predominance of black-box machine learning models, due to their typically higher accuracy, amplifies the need for Explainable AI (XAI). Existing XAI approaches, such as the widely used SHAP values or counterfactual (CF) explanations, are arguably often too technical for users to understand and act […]
Efficient Pre-Training of LLMs through Truncated SVD Layers
arXiv:2605.28573v1 Announce Type: cross Abstract: The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and orthonormal weight matrices could in principle reduce parameter counts and computational overhead, most existing methods rely on static rank selection and do not enforce weight orthonormality due to high computational cost. This paper […]
VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer
arXiv:2605.28229v1 Announce Type: cross Abstract: With the rapid development of pre-training technologies, adapting large-scale Vision-Language Models (VLMs) for video understanding, i.e., image-to-video transfer learning, has become a dominant paradigm. To achieve superior performance, employing Mixture-of-Experts (MoE) to enhance VLMs’ temporal modeling capabilities has emerged as an effective strategy among recent advances. However, conventional MoE […]
PilotTTS: A Disciplined Modular Recipe for Competitive Speech Synthesis
arXiv:2605.27258v2 Announce Type: replace-cross Abstract: Building state-of-the-art text-to-speech (TTS) systems typically demands millions of hours of proprietary data and complex multi-stage architectures, creating substantial barriers for resource-constrained research teams. In this report, we present PilotTTS, a lightweight autoregressive TTS system that achieves competitive performance through minimalist architecture and rigorous data engineering. PilotTTS is trained on […]
Noise Scheduling as Information-Guided Allocation in Diffusion Training
arXiv:2602.18647v2 Announce Type: replace-cross Abstract: We introduce InfoNoise, an online adaptive noise schedule for diffusion training that reallocates optimization effort toward noise levels where denoising is most informative. Together with loss weighting, a noise schedule induces an effective allocation across denoising problems, often fixed before informative noise levels are known. InfoNoise makes this allocation data-adaptive […]
The Forensic Cost of Watermark Removal: From Dedicated Attacks to Image Editing
arXiv:2604.25491v2 Announce Type: replace-cross Abstract: Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. While state-of-the-art attacks successfully degrade the watermark signal without visible distortion, they leave distinct statistical artifacts that betray the removal attempt. We name this overlooked axis Watermark Removal Detection (WRD) […]
Revisiting Change Detection Methods for their Application to Serac Fall Time-Lapse Monitoring
arXiv:2605.28100v1 Announce Type: cross Abstract: In an era where climate change aggravates environmental uncertainties, the identification and detection of event precursors are becoming crucial to mitigate the impacts of disastrous natural hazards. While classical sensors such as interferometric lasers or seismometers are reliable, their widespread deployment is often hindered by logistical and economic barriers, leaving […]
VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs
arXiv:2605.28422v1 Announce Type: cross Abstract: Latent reasoning enables reasoning over continuous hidden states rather than explicit tokens, avoiding the language bottleneck and inference overhead of chain-of-thought for medical VQA. However, existing methods suffer from modality collapse, insufficient visual supervision, and train-inference mismatch. Moreover, their opaque latent states offer no interpretability, which is critical in clinical […]
A Fresh Look at Lamarckian Evolution and the Baldwin Effect
arXiv:2605.28703v1 Announce Type: cross Abstract: Baldwinian and Lamarckian evolution have existed for a long time in evolutionary algorithms (EAs) without ever dominating the academic literature or practical applications. In this work, we use modern empirical and theoretical methods to revisit Lamarckian and Baldwinian evolution and rigorously compare them with the generic Darwinian evolution. On the […]