arXiv:2605.27068v1 Announce Type: cross Abstract: Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language Model (LLM) agents. However, most environments are scored only by game outcomes such as win rates and remain largely limited to text-only interaction, making it difficult to tell whether an agent’s language […]
Eroding Trust in Real Speech: A Large-Scale Study of Human Audio Deepfake Perception
arXiv:2605.26136v1 Announce Type: cross Abstract: Audio deepfakes have improved rapidly, yet their effect on human trust in real speech remains unstudied. We present the largest listening study on audio deepfake perception to date, collecting 35,532 judgments from 1,768 participants across 138 text-to-speech and voice conversion systems. Our central finding is a skepticism shift: compared […]
CitePrism: Human-in-the-Loop AI for Citation Auditing and Editorial Integrity
arXiv:2605.16000v2 Announce Type: replace-cross Abstract: Editors and reviewers are expected to ensure that manuscripts cite relevant, accurate, current, and ethically appropriate literature, yet manuscript-level citation auditing remains largely manual, fragmented, and difficult to scale. Citation context, metadata quality, self-citation patterns, and bibliographic integrity all affect whether a reference appropriately supports a local claim. We present […]
MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning
arXiv:2605.26154v1 Announce Type: cross Abstract: LLM-driven agents are capable of selecting external tools to complete users’ tasks. However, attackers could compromise this process, steering agents toward inappropriate or wrong tools and enabling malicious actions. Most existing attacks primarily manipulate the tool metadata, which is easily detectable by auditing and may lose effectiveness as modern agents increasingly adopt […]
ConVer: Using Contracts and Loop Invariant Synthesis for Scalable Formal Software Verification
arXiv:2605.27051v1 Announce Type: cross Abstract: Formal verification of large C programs is impeded by state-space explosion: Bounded Model Checking (BMC) tools must encode the entire state space up to the predetermined bound by unrolling all nested constructs. We present ConVer, a top-down compositional verification tool. Given a C program with a top-level assertion, ConVer decomposes […]
TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models
arXiv:2605.26161v1 Announce Type: cross Abstract: Time series foundation models (TSFMs) are increasingly pretrained on large corpora, raising concerns that evaluation datasets may have been exposed during pretraining and thus yield overly optimistic performance estimates. Auditing such contamination is challenging in time series because signals are continuous and heterogeneous, and often lack corpus documentation. To the […]
Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict
arXiv:2605.14473v3 Announce Type: replace-cross Abstract: The Context-Compliance Regime in Retrieval-Augmented Generation (RAG) occurs when retrieved context dominates the final answer even when it conflicts with the model’s parametric knowledge. Accuracy alone does not reveal how retrieved context causally shapes answers under such conflict. We introduce Context-Driven Decomposition (CDD), a belief-decomposition probe that operates at inference […]
Planning Neural Dynamics with Lie Group Embedding through Supervised Projective Manifold Learning
arXiv:2605.26167v1 Announce Type: cross Abstract: We propose Lie group embedded dynamical neural networks (LieEDNN) and corresponding learning algorithms based on gradient descent and metric projection on a smooth manifold, where we treat the Lie group as an intrinsic representation of the continuous symmetry of the manifold geometry. We thereby achieve learnable and stable dynamics on the underlying manifold […]
Lessons from Penetration Tests on Large-Scale Agent Systems
arXiv:2605.27042v1 Announce Type: cross Abstract: As AI systems gain increasing autonomy and execution capability, the number of discovered security vulnerabilities continues to rise. However, many of these vulnerabilities are not fundamentally novel, but instead reflect recurring classes of weaknesses long observed in prior computing systems. Execution-capable AI agents are effectively unbounded, self-modifying programs that interact […]
RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations
arXiv:2605.26177v1 Announce Type: cross Abstract: Code agents now perform strongly on repository-level software engineering benchmarks, but it remains unclear whether success on end-to-end tasks such as issue resolution truly reflects repository context reasoning: the ability to identify task-relevant information across multiple files and reason over the relations among them. To investigate this […]
GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?
arXiv:2605.12827v2 Announce Type: replace-cross Abstract: Graph neural networks (GNNs) deployed as cloud services can be stolen through model-extraction attacks, which train a surrogate from query responses to reproduce the target’s behavior, and a growing line of ownership defenses tries to prevent or trace such theft. This paper asks two questions: how hard is it to […]
Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training
arXiv:2605.26189v1 Announce Type: cross Abstract: Quantization-aware training (QAT) with low-bit floating-point formats enables efficient LLM deployment, yet introduces subtle failure modes invisible to standard training metrics. We present a systematic study of HiF8 W8A8 QAT for OpenPangu-Embedded-1B through the lens of Delayed Tensor Scaling (DTS). Across eight controlled experiments, we identify and disentangle two orthogonal […]