arXiv:2605.23867v1 Announce Type: cross Abstract: Large language models (LLMs) have the potential to aid and improve human decision-making in classification tasks, not only by providing fairly accurate predictions, but also in their ability to generate cogent narrative explanations of those predictions. Prior work has demonstrated that people generally find AI narrative explanations to be understandable, […]
Variance Reduction for Expectations with Diffusion Teachers
arXiv:2605.21489v2 Announce Type: replace-cross Abstract: Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, […]
Weierstrass Positional Encoding for Vision Transformers
arXiv:2605.23719v1 Announce Type: cross Abstract: Vision Transformers have achieved remarkable success in computer vision, but their common use of learnable one-dimensional positional encodings weakens the inherent two-dimensional spatial structure of images after patch flattening. Existing positional encodings often lack geometric constraints and do not preserve a monotonic relationship between Euclidean spatial distances and sequential index […]
Proxy-Based Approximation of Shapley and Banzhaf Interactions
arXiv:2605.22738v2 Announce Type: replace-cross Abstract: Shapley and Banzhaf interactions capture the complex dynamics inherent in modern machine learning applications. However, current estimators for these higher-order interactions trade off between speed and accuracy. To overcome this limitation, we introduce ProxySHAP. ProxySHAP reconciles the high sample efficiency of tree-based proxy models with a principled path to consistency […]
Detecting and Correcting Sample-by-Sample Scale Distortion in RNA Sequencing Data
arXiv:2605.22838v1 Announce Type: new Abstract: RNA sequencing (RNA-seq) is the conventional genome-scale approach used to capture the expression levels of all detectable genes in a biological sample. This is now regularly used for population-based studies designed to identify genetic determinants of various diseases. Naturally, the accuracy of these tests should be verified and improved if […]
Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models
arXiv:2605.22902v1 Announce Type: cross Abstract: Generative Vision-Language Models (VLMs) perform well on multimodal reasoning, but how visual inputs are transformed to text remains poorly understood. Existing interpretability work on VLMs uses Sparse Autoencoders (SAEs), which decompose static residual representations and miss the functional updates that drive cross-modal interaction. We adopt a function-centric framework based on […]
DrawVideo: Generating Long Video from Storyboard Keyframe Sketches
arXiv:2605.23508v1 Announce Type: cross Abstract: Long video generation requires high-fidelity synthesis, coherent narrative structure, and user control over extended time spans. Existing text-to-video methods often rely on a single long prompt, limiting control over pose, composition, layout, and motion. We propose DrawVideo, a sketch-guided, storyboard-driven framework for controllable long-video generation. DrawVideo decomposes long videos into […]
Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?
arXiv:2605.22903v1 Announce Type: cross Abstract: Benchmark accuracy is often implicitly assumed to reflect grounded visual understanding in vision-language models (VLMs), yet it remains unclear to what extent such scores truly reflect reliance on visual evidence. Motivated by a surprising observation that removing a substantial fraction of image tokens only degrades model performance very slightly on […]
LLM Code Smells: A Taxonomy and Detection Approach
arXiv:2605.22976v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM integration coding practices must be documented […]
TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning
arXiv:2601.21692v2 Announce Type: replace Abstract: Fine-Tuning-as-a-Service (FTaaS) facilitates the customization of Multimodal Large Language Models (MLLMs) but introduces critical backdoor risks via poisoned data. Existing defenses either rely on supervised signals or fail to generalize across diverse trigger types and modalities. In this work, we uncover a universal backdoor fingerprint-attention allocation divergence-where poisoned samples disrupt […]
How Mobile World Model Guides GUI Agents?
arXiv:2605.10347v2 Announce Type: replace Abstract: Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but reliable prediction of action consequences remains critical for long-horizon and high-risk interactions. Existing mobile world models provide either text-based or image-based future states, yet it remains unclear which representation is useful, […]
Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
arXiv:2605.23623v1 Announce Type: cross Abstract: We present a longitudinal, drift-aware evaluation of adversarial robustness across more than a decade of Android applications using static and dynamic feature representations extracted from emulator and real-device executions. The dataset is organized into yearly slices and evaluated under three deployment protocols that emulate realistic learning scenarios: (1) same-year training […]