arXiv:2502.20403v2 Announce Type: replace-cross Abstract: Adversarial robustness in quantum classifiers is a critical area of study, providing insights into their performance compared to classical models and uncovering potential advantages inherent to quantum machine learning. In the NISQ era of quantum computing, circuit cutting is a notable technique for simulating circuits that exceed the qubit limitations […]
TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation
arXiv:2603.06057v1 Announce Type: cross Abstract: Diffusion models have recently advanced photorealistic human synthesis, although practical talking-head generation (THG) remains constrained by high inference latency, temporal instability such as flicker and identity drift, and imperfect audio-visual alignment under challenging speech conditions. This paper introduces TempoSyncDiff, a reference-conditioned latent diffusion framework that explores few-step inference for efficient […]
Text-Driven Emotionally Continuous Talking Face Generation
arXiv:2603.06071v1 Announce Type: cross Abstract: Talking Face Generation (TFG) strives to create realistic and emotionally expressive digital faces. While previous TFG works have mastered the creation of naturalistic facial movements, they typically express a fixed target emotion in synthetic videos and lack the ability to exhibit continuously changing and natural expressions like humans do when […]
Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents
arXiv:2505.05283v3 Announce Type: replace-cross Abstract: Code large language models (CodeLLMs) and agents are increasingly being integrated into complex software engineering tasks spanning the entire Software Development Life Cycle (SDLC). Benchmarking is critical for rigorously evaluating these capabilities. However, despite their growing significance, there remains a lack of comprehensive reviews that examine these benchmarks from an […]
StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation
arXiv:2603.06079v1 Announce Type: cross Abstract: We address the challenge of preserving emotional content in streaming speaker anonymization (SA). Neural audio codec language models trained for audio continuation tend to degrade source emotion: content tokens discard emotional information, and the model defaults to dominant acoustic patterns rather than preserving paralinguistic attributes. We propose supervised finetuning with […]
Cultural Perspectives and Expectations for Generative AI: A Global Survey Approach
arXiv:2603.05723v1 Announce Type: cross Abstract: There is a lack of empirical evidence about global attitudes around whether and how GenAI should represent cultures. This paper assesses understandings and beliefs about culture as it relates to GenAI from a large-scale global survey. We gathered data about what culture means to different groups, and about how GenAI […]
Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection
arXiv:2602.11247v2 Announce Type: replace-cross Abstract: Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is evaluated independently. While single-turn detection has been extensively studied, no published formula exists for aggregating per-turn pattern scores into a conversation-level risk score at the proxy layer — without invoking an LLM. […]
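The truncated abstract gives only the name of the formula, so the exact aggregation is not shown here. As an illustrative sketch only (function name, weights, and the mean-based accumulation term are assumptions, not the paper's published formula), a "peak + accumulation" score can combine the single worst turn with the suspicious mass spread across the whole conversation:

```python
def conversation_risk(turn_scores, peak_weight=1.0, accum_weight=0.5):
    """Aggregate per-turn pattern scores (each in [0, 1]) into a
    conversation-level risk score at the proxy layer, without an LLM.

    Peak catches a single blatant turn; accumulation catches malicious
    intent distributed over many mildly suspicious turns."""
    if not turn_scores:
        return 0.0
    peak = max(turn_scores)
    accumulation = sum(turn_scores) / len(turn_scores)  # mean keeps the scale bounded
    return min(1.0, peak_weight * peak + accum_weight * accumulation)

# A distributed attack: no single turn reaches a 0.8 per-turn threshold,
# but the conversation-level score does.
distributed = [0.5, 0.6, 0.55, 0.6]
print(conversation_risk(distributed))  # 0.88125
```

The key property is that the score is monotone in both the worst turn and the running total, so splitting an attack across turns cannot drive it below the peak alone.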
Stem: Rethinking Causal Information Flow in Sparse Attention
arXiv:2603.06274v1 Announce Type: cross Abstract: The quadratic computational complexity of self-attention remains a fundamental bottleneck for scaling Large Language Models (LLMs) to long contexts, particularly during the pre-filling phase. In this paper, we rethink the causal attention mechanism from the perspective of information flow. Due to causal constraints, tokens at initial positions participate in the […]
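The abstract frames causal attention as an asymmetric information flow: position i can only read from positions 0..i, while early tokens are read by nearly every later position. A minimal sketch of that asymmetry (this illustrates the causal constraint itself, not the paper's Stem method, whose details are truncated above):

```python
def causal_flow(seq_len):
    """For each position i under a causal attention mask, return
    (inflow, outflow): how many tokens i can attend to, and how many
    positions can attend to i.  The total flow is n*(n+1)/2, the
    quadratic cost paid during the pre-filling phase."""
    inflow = [i + 1 for i in range(seq_len)]          # i sees tokens 0..i
    outflow = [seq_len - i for i in range(seq_len)]   # i is visible to i..n-1
    return inflow, outflow

inflow, outflow = causal_flow(5)
# Token 0 reads only itself yet feeds all 5 positions; the last token
# reads all 5 but feeds only itself.
print(inflow, outflow)  # [1, 2, 3, 4, 5] [5, 4, 3, 2, 1]
```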
Neural Signals Generate Clinical Notes in the Wild
arXiv:2601.22197v2 Announce Type: replace-cross Abstract: Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We curate a large-scale clinical EEG dataset with $9,922$ reports paired with approximately $11,000$ hours of EEG recordings from $9,048$ patients. We therefore develop CELM, the first clinical EEG-to-Language foundation model capable […]
CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion
arXiv:2512.19535v2 Announce Type: replace-cross Abstract: Vision-language models (VLMs) are commonly trained by directly inserting image tokens from a pretrained vision encoder into the text stream of a language model. This allows text and image information to fully attend to one another within the model, but becomes rapidly costly for long multi-image conversations or streaming video […]
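The cost gap the abstract alludes to can be made concrete with a back-of-the-envelope count of attention pairs. The sketch below is a generic comparison of the two fusion strategies (token counts and the two-stream split are illustrative assumptions, not CASA's actual architecture):

```python
def fusion_cost(text_tokens, image_tokens_per_frame, num_frames):
    """Compare attention-pair counts for two vision-language fusion strategies."""
    img = image_tokens_per_frame * num_frames
    # (a) insert image tokens into the text stream: one self-attention
    #     over the concatenated sequence, quadratic in the total length.
    self_attn = (text_tokens + img) ** 2
    # (b) keep the streams separate: text self-attends, and additionally
    #     cross-attends to the image tokens.
    cross_attn = text_tokens ** 2 + text_tokens * img
    return self_attn, cross_attn

# A short conversation over a 64-frame clip at 256 tokens per frame.
sa, ca = fusion_cost(text_tokens=1024, image_tokens_per_frame=256, num_frames=64)
print(sa // ca)  # 17: full self-attention pays ~17x more pairs here
```

Because (t + i)^2 / (t^2 + t*i) simplifies to (t + i) / t, the ratio grows linearly with the number of image tokens, which is why long multi-image or streaming-video inputs make strategy (a) rapidly costly.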
AdAEM: An Adaptively and Automated Extensible Measurement of LLMs’ Value Difference
arXiv:2505.13531v2 Announce Type: replace-cross Abstract: Assessing Large Language Models' (LLMs) underlying value differences enables comprehensive comparison of their misalignment, cultural adaptability, and biases. Nevertheless, current value measurement methods face the informativeness challenge: with often outdated, contaminated, or generic test questions, they can only capture the orientations on common safety values, e.g., HHH, shared among different LLMs, […]
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
arXiv:2510.11512v3 Announce Type: replace-cross Abstract: Intuitive physics understanding in video diffusion models plays an essential role in building general-purpose physically plausible world simulators, yet accurately evaluating such capacity remains a challenging task due to the difficulty in disentangling physics correctness from visual appearance in generation. To this end, we introduce LikePhys, a training-free method that […]
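The truncated abstract identifies LikePhys as training-free and likelihood-preference-based but omits the scoring details. One way such an evaluation could be framed, purely as a hypothetical sketch (the pairwise setup and variable names are assumptions), is a preference test over matched clips that differ only in physical validity:

```python
def preference_accuracy(pair_scores):
    """pair_scores: list of (valid_loglik, invalid_loglik) tuples, one per
    matched pair of a physically valid clip and a physics-violating
    counterpart with the same visual appearance.  A model with intuitive
    physics should assign the valid clip the higher likelihood; ties
    count as failures."""
    correct = sum(1 for valid, invalid in pair_scores if valid > invalid)
    return correct / len(pair_scores)

# Example: the model prefers the valid clip in 2 of 3 pairs.
scores = [(-10.2, -11.0), (-9.5, -9.1), (-8.0, -8.4)]
print(preference_accuracy(scores))
```

Holding appearance fixed within each pair is what lets a likelihood comparison probe physics correctness rather than visual quality.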