arXiv:2601.19738v1 Announce Type: cross Abstract: Compiling quantum circuits into Clifford+$T$ gates is a central task for fault-tolerant quantum computing using stabilizer codes. In the near term, $T$ gates will dominate the cost of fault-tolerant implementations, and any reduction in the number of such expensive gates could mean the difference between being able to run […]
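For orientation (not from the paper's method): a minimal sketch of how the $T$-count cost measure can be read off a small circuit, assuming Qiskit is available; the example circuit is arbitrary.

    # Illustrative only: the T-count of a toy Clifford+T circuit, i.e. the
    # number of T and T-dagger gates, each of which consumes one magic state.
    from qiskit import QuantumCircuit

    qc = QuantumCircuit(2)
    qc.h(0)       # Clifford gates (H, S, CX) are comparatively cheap
    qc.t(0)       # T gates dominate the fault-tolerant cost
    qc.cx(0, 1)
    qc.t(1)
    qc.tdg(0)

    ops = qc.count_ops()
    print("T-count:", ops.get("t", 0) + ops.get("tdg", 0))  # -> 3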
SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation
arXiv:2601.19702v1 Announce Type: cross Abstract: Performance evaluation remains a complex challenge in audio separation: existing metrics are often misaligned with human perception, coarse-grained, and reliant on ground-truth signals. On the other hand, subjective listening tests remain the gold standard for real-world evaluation, but they are expensive, time-consuming, and difficult to scale. This […]
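As a concrete example of the ground-truth-based metrics the abstract critiques, here is a minimal NumPy implementation of SI-SDR, a standard separation score; the signals below are synthetic stand-ins.

    # Scale-invariant signal-to-distortion ratio (SI-SDR), in dB: a common
    # objective metric that requires a reference (ground-truth) signal.
    import numpy as np

    def si_sdr(estimate, reference):
        estimate = estimate - estimate.mean()
        reference = reference - reference.mean()
        alpha = np.dot(estimate, reference) / np.dot(reference, reference)
        target = alpha * reference      # projection of estimate onto reference
        noise = estimate - target
        return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))

    rng = np.random.default_rng(0)
    ref = rng.standard_normal(16000)              # 1 s of audio at 16 kHz
    est = ref + 0.1 * rng.standard_normal(16000)  # lightly corrupted estimate
    print(f"SI-SDR: {si_sdr(est, ref):.1f} dB")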
MV-S2V: Multi-View Subject-Consistent Video Generation
arXiv:2601.17756v2 Announce Type: replace-cross Abstract: Existing Subject-to-Video Generation (S2V) methods have achieved high-fidelity and subject-consistent video generation, yet remain constrained to single-view subject references. This limitation renders the S2V task reducible to an S2I + I2V pipeline, failing to exploit the full potential of video subject control. In this work, we propose and address the […]
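A hypothetical sketch of the S2I + I2V reduction the abstract describes: with only a single-view reference, subject-to-video decomposes into generating one subject keyframe and then animating it. The function names are placeholders, not the paper's API.

    # Placeholder pipeline: s2i_model and i2v_model stand in for any
    # subject-to-image and image-to-video generators.
    def s2v_via_s2i_i2v(subject_image, prompt, s2i_model, i2v_model):
        # Step 1 (S2I): render the subject into a prompt-conditioned keyframe.
        keyframe = s2i_model.generate(reference=subject_image, prompt=prompt)
        # Step 2 (I2V): animate that keyframe into a clip.
        return i2v_model.generate(first_frame=keyframe, prompt=prompt)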
Out-of-Distribution Generalization via Invariant Trajectories for Multimodal Large Language Model Editing
arXiv:2601.19700v1 Announce Type: cross Abstract: Knowledge editing has emerged as a crucial technique for efficiently correcting incorrect or outdated knowledge in large language models (LLMs). Existing editing methods for unimodal LLMs rely on a rigid parameter-to-output mapping, which causes causal-underfit and causal-overfit in cascaded reasoning for Multimodal LLMs (MLLMs). In this paper, we reformulate MLLM editing […]
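For background (not the paper's method): knowledge edits are conventionally scored on reliability, generality, and locality; below is a minimal sketch of those checks, where `model` is any prompt-to-answer callable and the data fields are an illustrative schema.

    # edit = {"prompt": ..., "target": ..., "paraphrases": [...],
    #         "unrelated": [(prompt, answer), ...]}  -- placeholder fields
    def evaluate_edit(model, edit):
        reliability = model(edit["prompt"]) == edit["target"]
        generality = all(model(p) == edit["target"] for p in edit["paraphrases"])
        locality = all(model(p) == a for p, a in edit["unrelated"])
        return {"reliability": reliability, "generality": generality,
                "locality": locality}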
RvB: Automating AI System Hardening via Iterative Red-Blue Games
arXiv:2601.19726v1 Announce Type: cross Abstract: The dual offensive and defensive utility of Large Language Models (LLMs) highlights a critical gap in AI security: the lack of unified frameworks for hardening systems through dynamic, iterative adversarial adaptation. To bridge this gap, we propose the Red Team vs. Blue Team (RvB) framework, formulated as a training-free, sequential, imperfect-information game. […]
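A hypothetical sketch of such an iterative red/blue loop, assuming a red model proposes attacks, a judge scores them, and a blue model patches the defense when an attack lands; `red`, `blue`, and `judge` are placeholder callables, not the RvB API.

    # Training-free hardening loop: no gradients, only prompt-level moves.
    def rvb_harden(system_prompt, red, blue, judge, rounds=5):
        for _ in range(rounds):
            attack = red(f"Find an input that breaks this policy:\n{system_prompt}")
            if not judge(system_prompt, attack):   # attack failed; stop early
                break
            system_prompt = blue(                  # patch against the found attack
                f"Revise this policy so the attack below fails.\n"
                f"Policy:\n{system_prompt}\nAttack:\n{attack}"
            )
        return system_prompt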
AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion
arXiv:2601.19697v1 Announce Type: cross Abstract: Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between […]
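For context, the baseline retrieval step such RAG completers rely on ranks cross-file snippets by embedding similarity to the in-progress completion context; the sketch below shows that step, with `embed` standing in for any code-embedding model. The misalignment the abstract targets arises when the context is dissimilar to the snippet the target completion actually needs.

    import numpy as np

    def retrieve(context, snippets, embed, k=3):
        # Rank candidate cross-file snippets by cosine similarity to the query.
        q = embed(context)
        vecs = np.stack([embed(s) for s in snippets])
        sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
        return [snippets[i] for i in np.argsort(-sims)[:k]]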
A Systemic Evaluation of Multimodal RAG Privacy
arXiv:2601.17644v2 Announce Type: replace-cross Abstract: The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g., visual QA) introduces important privacy challenges. In particular, while mRAG offers a practical way to connect private datasets to improve model performance, it risks leaking private information from these datasets during inference. In this paper, […]
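A minimal sketch of one way such leakage can be probed (not necessarily the paper's evaluation): flag answers that reproduce long verbatim spans of a retrieved private document.

    def leaks(answer, private_doc, n=8):
        # True if any n-gram of the private document appears verbatim in the answer.
        tokens = private_doc.split()
        ngrams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        return any(g in answer for g in ngrams)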
Cross-Domain Offshore Wind Power Forecasting: Transfer Learning Through Meteorological Clusters
arXiv:2601.19674v1 Announce Type: cross Abstract: Ambitious decarbonisation targets are catalysing growth in orders for new offshore wind farms. For these newly commissioned plants, accurate power forecasts are needed from the outset: they underpin grid stability, sound reserve management, and efficient energy trading. Although machine learning models perform strongly, they tend to require […]
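A minimal sketch of the cluster-matching idea in the title (the details here are assumptions, not the paper's method): cluster existing farms by meteorological features with scikit-learn, assign a new farm to its nearest cluster, and fine-tune from that cluster's forecast model.

    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in meteorological profiles for 40 existing farms, e.g. mean wind
    # speed, turbulence intensity, and a seasonality index per farm.
    source = np.random.default_rng(0).random((40, 3))
    clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit(source)

    new_farm = np.array([[0.6, 0.2, 0.7]])      # profile of the new plant
    cid = clusters.predict(new_farm)[0]
    print("transfer from cluster", cid)         # reuse that cluster's model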
Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models
arXiv:2508.09201v4 Announce Type: replace-cross Abstract: Despite extensive alignment efforts, Large Vision-Language Models (LVLMs) remain vulnerable to jailbreak attacks. To mitigate these risks, detection methods are essential, yet existing ones face two major challenges: generalization and accuracy. While learning-based methods trained on specific attacks fail to generalize to unseen attacks, learning-free methods based on hand-crafted heuristics […]
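To illustrate the generalization failure the abstract describes, here is a toy learning-based detector fit on features of one known attack family and tested on a shifted, unseen one; the Gaussian features are synthetic stand-ins for model activations.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_benign = rng.normal(0.0, 1.0, (200, 16))   # benign-prompt features
    X_known = rng.normal(0.8, 1.0, (200, 16))    # one known attack family
    clf = LogisticRegression(max_iter=1000).fit(
        np.vstack([X_benign, X_known]), np.array([0] * 200 + [1] * 200))

    X_unseen = rng.normal(0.4, 1.5, (200, 16))   # shifted, unseen attack family
    print("flag rate on unseen attacks:", clf.predict(X_unseen).mean())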
The role of self-supervised pretraining in differentially private medical image analysis
arXiv:2601.19618v1 Announce Type: cross Abstract: Differential privacy (DP) provides formal protection for sensitive data but typically incurs substantial losses in diagnostic performance. Model initialization has emerged as a critical factor in mitigating this degradation, yet the role of modern self-supervised learning under full-model DP remains poorly understood. Here, we present a large-scale evaluation of initialization […]
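A minimal sketch of the training regime the abstract studies, assuming Opacus for DP-SGD; the tiny linear model and random data are placeholders, and the commented line marks where a self-supervised initialization would be loaded.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from opacus import PrivacyEngine

    model = nn.Linear(64, 2)            # stand-in for a pretrained backbone
    # model.load_state_dict(...)        # <- self-supervised weights go here
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    data = TensorDataset(torch.randn(256, 64), torch.randint(0, 2, (256,)))
    loader = DataLoader(data, batch_size=32)

    # Wrap model/optimizer/loader for per-sample clipping and noise addition.
    model, optimizer, loader = PrivacyEngine().make_private(
        module=model, optimizer=optimizer, data_loader=loader,
        noise_multiplier=1.1, max_grad_norm=1.0)

    loss_fn = nn.CrossEntropyLoss()
    for x, y in loader:                 # one DP-SGD epoch
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()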
Automated structural testing of LLM-based agents: methods, framework, and case studies
arXiv:2601.18827v1 Announce Type: cross Abstract: LLM-based agents are rapidly being adopted across diverse domains. Since they interact with users without supervision, they must be tested extensively. Current testing approaches focus on acceptance-level evaluation from the user’s perspective. While intuitive, these tests require manual evaluation, are difficult to automate, do not facilitate root cause analysis, and […]
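A hypothetical example of a structural test in the spirit of the abstract: rather than a human judging the final answer, the test asserts on the agent's recorded tool-call trace, which is automatable and localizes the failing step. `run_agent` and the tool names are invented for illustration.

    def test_refund_flow():
        trace = run_agent("Refund order #123")   # returns the tool-call trace
        names = [call.tool for call in trace]
        assert "lookup_order" in names, "agent never fetched the order"
        assert names.index("lookup_order") < names.index("issue_refund"), \
            "refund issued before the order was verified"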
CanaryBench: Stress Testing Privacy Leakage in Cluster-Level Conversation Summaries
arXiv:2601.18834v1 Announce Type: cross Abstract: Aggregate analytics over conversational data are increasingly used for safety monitoring, governance, and product analysis in large language model systems. A common practice is to embed conversations, cluster them, and publish short textual summaries describing each cluster. While raw conversations may never be exposed, these derived summaries can still pose […]
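A minimal canary-style probe of the pipeline the abstract describes (not necessarily CanaryBench's protocol): plant a unique string in one conversation, embed and cluster, and check whether any published cluster summary reproduces it; `embed` and `summarize` are placeholder models.

    import numpy as np
    from sklearn.cluster import KMeans

    CANARY = "canary-7f3a9c"

    def canary_leak_rate(conversations, embed, summarize, n_clusters=8):
        convs = list(conversations)
        convs[0] += " My account code is " + CANARY   # plant the canary
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
            np.stack([embed(c) for c in convs]))
        summaries = [summarize([c for c, l in zip(convs, labels) if l == k])
                     for k in range(n_clusters)]
        return sum(CANARY in s for s in summaries) / len(summaries)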