To Call or Not to Call: Diagnosing Intrinsic Over-Calling Bias in LLM Agents

arXiv:2605.18882v1 Announce Type: cross Abstract: LLM agents exhibit a consistent tendency to over-call, invoking tools even in situations where none is needed. On the When2Call benchmark, six models from three families show high call accuracy but much lower no-call accuracy, leaving overall accuracy in the 55%-70% range. We trace this to an Intrinsic Bias Hypothesis […]

k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics

arXiv:2605.20108v1 Announce Type: cross Abstract: While conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step, k-inductive barrier certificates relax this by allowing a temporary increase — up to k-1 times, each within a threshold $epsilon$ — while maintaining overall safety, and improving flexibility. This […]

Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

arXiv:2605.18890v1 Announce Type: cross Abstract: The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such […]

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

arXiv:2605.19192v1 Announce Type: new Abstract: Multimodal agents use screenshots, documents, and webpages to choose tool calls. When a false visual claim triggers a click, email, extraction, or transfer, hallucination becomes an authorization failure rather than an answer-quality error. We formalize this failure mode as hallucination-to-action conversion: an unsupported perceptual claim supplies the precondition that makes […]

Cross-Subject Intracranial EEG Reconstruction from Scalp Recordings Using Multi-Scale Cross-Attention Transformers

arXiv:2605.18897v1 Announce Type: cross Abstract: Intracranial EEG (iEEG) provides high-fidelity neural recordings essential for clinical and brain-computer interface applications, but acquiring these signals requires invasive surgery. While recent studies have attempted to estimate iEEG from non-invasive scalp EEG, most rely on patient-specific models, creating a circular dependency: if surgery is required to collect training data, […]

BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction

arXiv:2510.16559v5 Announce Type: replace Abstract: Engineering construction automation aims to transform natural language specifications into physically viable structures, requiring complex integrated reasoning under strict physical constraints. While modern LLMs possess broad knowledge and strong reasoning capabilities that make them promising candidates for this domain, their construction competencies remain largely unevaluated. To address this gap, we […]

Lightweight and Fast Backdoor Model Detection

arXiv:2605.18907v1 Announce Type: cross Abstract: Deep neural networks (DNN), despite their remarkable performance, are highly vulnerable to backdoor attacks. Existing defenses mainly rely on activation anomaly analysis or trigger reverse engineering and often require clean samples or prior knowledge of trigger patterns, resulting in limited efficacy, practicability, and generalizability. More critically, while advanced attacks can […]

Not all uncertainty is alike: volatility, stochasticity, and exploration

arXiv:2605.19215v1 Announce Type: new Abstract: Adaptive decision-making in biological and artificial intelligence requires balancing the exploitation of known outcomes with the exploration of uncertain alternatives. Although prior work suggests that uncertainty generally promotes exploration, it has typically treated distinct sources of environmental uncertainty as equivalent. We consider environments with latent reward states that drift over […]

DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs

arXiv:2605.18915v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks, which can elicit harmful responses from MLLMs. Many MLLMs support multi-image inputs, inadvertently introducing new vulnerabilities due to less efforts on multi-image safety alignment. Previous MLLM jailbreak methods only uses a single image, which restricts the attack space: they cannot […]

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations

arXiv:2603.17305v2 Announce Type: replace Abstract: We propose CRAFT, a red-teaming alignment framework that leverages model reasoning capabilities and hidden representations to improve robustness against jailbreak attacks. Unlike prior defenses that operate primarily at the output level, CRAFT aligns large reasoning models to generate safety-aware reasoning traces by explicitly optimizing objectives defined over the hidden state […]

SynGR: Unleashing the Potential of Cross-Modal Synergy for Generative Recommendation

arXiv:2605.18920v1 Announce Type: cross Abstract: Generative Recommendation (GR) has emerged as a promising paradigm by formulating item recommendation as a sequence-to-sequence generation task over item identifiers. Recent studies have incorporated multimodal signals to provide richer token-level evidence for generation. However, existing approaches largely rely on alignment-centric fusion and underexplore synergistic information across modalities. In practice, […]

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

arXiv:2605.19219v1 Announce Type: new Abstract: A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844