arXiv:2604.10857v1 Announce Type: cross Abstract: Diffusion models generate samples by iteratively querying learned score estimates. A rapidly growing literature focuses on accelerating sampling by minimizing the number of score evaluations, yet the information-theoretic limits of such acceleration remain unclear. In this work, we establish the first score query lower bounds for diffusion sampling. We prove […]
MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models
arXiv:2604.10971v1 Announce Type: cross Abstract: In the progress of industrial anomaly detection, general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike the conventional single- and multi-class AD, general AD aims to train a general AD model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning […]
ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values
arXiv:2604.11200v1 Announce Type: cross Abstract: Changes in input distribution can induce shifts in the average predictions of machine learning models. Such prediction shifts may impact downstream business outcomes (e.g. a bank’s loan approval rate), so understanding their causes can be crucial. We propose ours: a Shapley value method for attributing prediction shifts to changes in […]
Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations
arXiv:2604.11322v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated impressive capabilities in utilizing external tools. In practice, however, LLMs are often exposed to tools that are irrelevant to the user’s query, in which case the desired behavior is to refrain from invocations. In this work, we identify a widespread yet overlooked mechanistic flaw […]
ADD for Multi-Bit Image Watermarking
arXiv:2604.11491v1 Announce Type: cross Abstract: As generative models enable rapid creation of high-fidelity images, societal concerns about misinformation and authenticity have intensified. A promising remedy is multi-bit image watermarking, which embeds a multi-bit message into an image so that a verifier can later detect whether the image is generated by someone and further identify the […]
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
arXiv:2604.09574v1 Announce Type: new Abstract: The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-centric ecosystems, they must evolve Humanization capabilities. We introduce the “Turing Test on Screen,” formally modeling […]
GenTac: Generative Modeling and Forecasting of Soccer Tactics
arXiv:2604.11786v1 Announce Type: new Abstract: Modeling open-play soccer tactics is a formidable challenge due to the stochastic, multi-agent nature of the game. Existing computational approaches typically produce single, deterministic trajectory forecasts or focus on highly structured set-pieces, fundamentally failing to capture the inherent variance and branching possibilities of real-world match evolution. Here, we introduce GenTac, […]
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context
arXiv:2604.11716v1 Announce Type: new Abstract: Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to the multi-turn SWE task creates a fundamental dilemma: retaining full reasoning history […]
Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games
arXiv:2604.11741v1 Announce Type: new Abstract: Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which require inferring hidden truths based on partial clues provided by roles […]
The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading
arXiv:2604.03501v2 Announce Type: replace-cross Abstract: Experimental evidence confirms that AI tools raise worker productivity, but also that sustained use can erode the expertise on which those gains depend. We develop a dynamic model in which a decision-maker chooses AI usage intensity for a worker over time, trading immediate productivity against the erosion of worker skill. […]
CASK: Core-Aware Selective KV Compression for Reasoning Traces
arXiv:2604.10900v1 Announce Type: new Abstract: In large language models performing long-form reasoning, the KV cache grows rapidly with decode length, creating bottlenecks in memory and inference stability. Existing reasoning-oriented KV compression has mostly followed an eviction-centered view: estimate token importance more accurately, then discard lower-ranked entries. Our analysis suggests that scorer refinement alone often fails […]
ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks
arXiv:2604.10981v1 Announce Type: new Abstract: ATANT v1.0 (arXiv:2604.06710) defined continuity as a system property with 7 required properties and introduced a 10-checkpoint, LLM-free evaluation methodology validated on a 250-story corpus. Since publication, a recurring reviewer and practitioner question has concerned not the framework itself but its relationship to a wider set of memory evaluations: LOCOMO, […]