arXiv:2603.17217v1 Announce Type: cross Abstract: Responsible use of AI demands that we protect sensitive information without undermining the usefulness of data, an imperative that has become acute in the age of large language models. We address this challenge with an on-premise, LLM-driven substitution pipeline that anonymizes text by replacing personally identifiable information (PII) with realistic, […]
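The abstract is truncated, but the substitution idea it describes can be illustrated with a minimal sketch. The regex patterns below stand in for the paper's on-premise LLM-based PII detector, and tagged placeholders stand in for its realistic surrogates; all names and patterns here are illustrative assumptions, not the authors' pipeline.

```python
import re

# Minimal sketch of substitution-based anonymization. A simple regex
# detector stands in for the paper's LLM-driven PII detection, and
# tagged placeholders stand in for its realistic surrogate values.

def anonymize(text, patterns=None):
    """Replace detected PII spans with surrogates, reusing the same
    surrogate for repeated mentions so references stay consistent."""
    patterns = patterns or {
        "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "PHONE": r"\+?\d[\d\s-]{7,}\d",
    }
    mapping = {}  # original span -> surrogate token
    for label, pat in patterns.items():
        for match in re.findall(pat, text):
            if match not in mapping:
                mapping[match] = f"<{label}_{len(mapping)}>"
    for original, surrogate in mapping.items():
        text = text.replace(original, surrogate)
    return text, mapping

out, mapping = anonymize("Contact jane@corp.com or call +1 555-123-4567.")
```

Keeping the span-to-surrogate mapping is what lets an anonymizer replace repeated mentions of the same entity consistently, which is essential for preserving the usefulness of the text downstream.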
From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs
arXiv:2603.17228v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) are increasingly applied to pixel-level vision tasks, yet their intrinsic capacity for spatial understanding remains poorly understood. We investigate segmentation capacity through a layerwise linear probing evaluation across the entire MLLM pipeline: vision encoder, adapter, and LLM. We further conduct an intervention-based attention knockout […]
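Layerwise linear probing, as named in this abstract, fits one frozen linear classifier per layer's activations and compares accuracies across depth. A hedged sketch follows, using synthetic features in place of real MLLM activations and a plain least-squares probe rather than the paper's exact setup.

```python
import numpy as np

# Sketch of layerwise linear probing: fit a linear classifier on each
# layer's frozen features and compare accuracy across depth. Synthetic
# Gaussian features stand in for MLLM activations; layers are simulated
# so that deeper ones encode the label more strongly.

rng = np.random.default_rng(0)

def probe_accuracy(features, labels):
    """Fit a least-squares linear probe and return its training accuracy."""
    X = np.hstack([features, np.ones((len(features), 1))])  # bias column
    w, *_ = np.linalg.lstsq(X, 2.0 * labels - 1.0, rcond=None)
    preds = (X @ w > 0).astype(int)
    return float((preds == labels).mean())

n, d, n_layers = 200, 16, 4
labels = rng.integers(0, 2, size=n)
signal = (2.0 * labels - 1.0)[:, None]

# Each simulated layer adds a progressively stronger label-aligned component.
layer_feats = [rng.normal(size=(n, d)) + 0.5 * layer * signal
               for layer in range(n_layers)]

accs = [probe_accuracy(f, labels) for f in layer_feats]
```

Plotting `accs` against layer index is the usual way such probing studies visualize where in the pipeline task-relevant information emerges, drops off, or recovers.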
Pathology-Aware Multi-View Contrastive Learning for Patient-Independent ECG Reconstruction
arXiv:2603.17248v1 Announce Type: cross Abstract: Reconstructing a 12-lead electrocardiogram (ECG) from a reduced lead set is an ill-posed inverse problem due to anatomical variability. Standard deep learning methods often ignore underlying cardiac pathology, losing vital morphology in precordial leads. We propose Pathology-Aware Multi-View Contrastive Learning, a framework that regularizes the latent space through a pathological […]
From Words to Worlds: Benchmarking Cross-Cultural Understanding in Machine Translation
arXiv:2603.17303v1 Announce Type: cross Abstract: Cultural expressions, such as idioms, slang, and culture-specific items (CSIs), are pervasive in natural language and encode meanings that go beyond literal linguistic form. Accurately translating such expressions remains challenging for machine translation systems. Despite this, existing benchmarks remain fragmented and do not provide a systematic framework for evaluating translation performance […]
Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress
arXiv:2603.17312v1 Announce Type: cross Abstract: Accurately estimating task progress is critical for embodied agents to plan and execute long-horizon, multi-step tasks. Despite promising advances, existing Vision-Language Model (VLM)-based methods primarily leverage their video understanding capabilities while neglecting their complex reasoning potential. Furthermore, processing long video trajectories with VLMs is computationally prohibitive for real-world deployment. […]
Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift
arXiv:2603.17372v1 Announce Type: cross Abstract: Large vision-language models (VLMs) often exhibit weakened safety alignment with the integration of the visual modality. Even when text prompts contain explicit harmful intent, adding an image can substantially increase jailbreak success rates. In this paper, we observe that VLMs can clearly distinguish benign inputs from harmful ones in their […]
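The observation that VLMs internally separate benign from harmful inputs suggests a simple representation-level probe. Below is a hedged sketch of one common way to operationalize such a separation: a mean-difference direction between the two classes of hidden states, with new inputs scored by projection. The toy Gaussian activations and threshold are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Sketch of detecting a harmful-vs-benign representation shift: estimate
# the mean-difference direction between the two classes of hidden states,
# then score inputs by their projection onto it. Toy Gaussian vectors
# stand in for real VLM activations.

rng = np.random.default_rng(1)
d = 32
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

benign = rng.normal(size=(100, d))
harmful = rng.normal(size=(100, d)) + 3.0 * true_dir  # shifted class

# Estimate the separating direction from labeled activations.
direction = harmful.mean(axis=0) - benign.mean(axis=0)
direction /= np.linalg.norm(direction)

def harm_score(activation):
    """Project one hidden state onto the estimated harmful direction."""
    return float(activation @ direction)

# Midpoint of the two class means along the direction, as a toy threshold.
threshold = 0.5 * ((harmful @ direction).mean() + (benign @ direction).mean())
flagged_harmful = sum(harm_score(h) > threshold for h in harmful)
flagged_benign = sum(harm_score(b) > threshold for b in benign)
```

A defense built on such a probe would monitor this projection at inference time and refuse or re-route inputs whose score crosses the threshold; the abstract's full mechanism is not specified in this excerpt.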
Transformers are Bayesian Networks
arXiv:2603.17063v1 Announce Type: new Abstract: Transformers are the dominant architecture in AI, yet why they work remains poorly understood. This paper offers a precise answer: a transformer is a Bayesian network. We establish this in five ways. First, we prove that every sigmoid transformer with any weights implements weighted loopy belief propagation on its implicit […]
Real-World AI Evaluation: How FRAME Generates Systematic Evidence to Resolve the Decision-Maker’s Dilemma
arXiv:2603.13294v3 Announce Type: replace-cross Abstract: The rapid expansion of AI deployments has put organizational leaders in a decision-maker’s dilemma: they must govern these technologies without systematic evidence of how systems behave in their own environments. Predominant evaluation methods generate scalable, abstract measures of model capabilities but smooth over the heterogeneity of real-world use, […]
When the Specification Emerges: Benchmarking Faithfulness Loss in Long-Horizon Coding Agents
arXiv:2603.17104v1 Announce Type: cross Abstract: Current coding-agent benchmarks usually provide the full task specification upfront. Real research coding often does not: the intended system is progressively disclosed through interaction, requiring the agent to track durable design commitments across a long session. We introduce a benchmark for this setting and study faithfulness loss […]
Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning
arXiv:2603.17174v1 Announce Type: cross Abstract: Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce the generation of insecure code, yet effective defenses remain limited. Existing scanning approaches rely on token-level generation consistency to invert […]
Shared Representation Learning for Reference-Guided Targeted Sound Detection
arXiv:2603.17025v1 Announce Type: cross Abstract: Human listeners exhibit the remarkable ability to segregate a desired sound from complex acoustic scenes through selective auditory attention, motivating the study of Targeted Sound Detection (TSD). The task requires detecting and localizing a target sound in a mixture when a reference audio of that sound is provided. Prior approaches, […]
Revisiting Cross-Attention Mechanisms: Leveraging Beneficial Noise for Domain-Adaptive Learning
arXiv:2603.17474v1 Announce Type: cross Abstract: Unsupervised Domain Adaptation (UDA) seeks to transfer knowledge from a labeled source domain to an unlabeled target domain but often suffers from severe domain and scale gaps that degrade performance. Existing cross-attention-based transformers can align features across domains, yet they struggle to preserve content semantics under large appearance and scale […]