Why Agents Compromise Safety Under Pressure

arXiv:2603.14975v1 Announce Type: new Abstract: Large Language Model agents deployed in complex environments frequently encounter a conflict between maximizing goal achievement and adhering to safety constraints. This paper identifies a new concept called Agentic Pressure, which characterizes the endogenous tension emerging when compliant execution becomes infeasible. We demonstrate that under this pressure agents exhibit normative […]

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

arXiv:2603.15030v1 Announce Type: new Abstract: Recent advancements extend Multimodal Large Language Models (MLLMs) beyond standard visual question answering to utilizing external tools for advanced visual tasks. Despite this progress, precisely executing and effectively composing diverse tools for complex tasks remain persistent bottleneck. Constrained by sparse tool-sets and simple tool-use trajectories, existing benchmarks fail to capture […]

PrototypeNAS: Rapid Design of Deep Neural Networks for Microcontroller Units

arXiv:2603.15106v1 Announce Type: new Abstract: Enabling efficient deep neural network (DNN) inference on edge devices with different hardware constraints is a challenging task that typically requires DNN architectures to be specialized for each device separately. To avoid the huge manual effort, one can use neural architecture search (NAS). However, many existing NAS methods are resource-intensive […]

A multiscale discrete-to-continuum framework for structured population models

arXiv:2603.15217v1 Announce Type: new Abstract: Mathematical models of biological populations commonly use discrete structure classes to capture trait variation among individuals (e.g. age, size, phenotype, intracellular state). Upscaling these discrete models into continuum descriptions can improve analytical tractability and scalability of numerical solutions. Common upscaling approaches based solely on Taylor expansions may, however, introduce ambiguities […]

SAGE: Multi-Agent Self-Evolution for LLM Reasoning

arXiv:2603.15255v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards improves reasoning in large language models (LLMs), but many methods still rely on large human-labeled datasets. While self-play reduces this dependency, it often lacks explicit planning and strong quality control, limiting stability in long-horizon multi-step reasoning. We present SAGE (Self-evolving Agents for Generalized reasoning Evolution), […]

Algorithms for Deciding the Safety of States in Fully Observable Non-deterministic Problems: Technical Report

arXiv:2603.15282v1 Announce Type: new Abstract: Learned action policies are increasingly popular in sequential decision-making, but suffer from a lack of safety guarantees. Recent work introduced a pipeline for testing the safety of such policies under initial-state and action-outcome non-determinism. At the pipeline’s core, is the problem of deciding whether a state is safe (a safe […]

PMAx: An Agentic Framework for AI-Driven Process Mining

arXiv:2603.15351v1 Announce Type: new Abstract: Process mining provides powerful insights into organizational workflows, but extracting these insights typically requires expertise in specialized query languages and data science tools. Large Language Models (LLMs) offer the potential to democratize process mining by enabling business users to interact with process data through natural language. However, using LLMs as […]

A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning

arXiv:2603.15411v1 Announce Type: new Abstract: Accurate prediction of crop states (e.g., phenology stages and cold hardiness) is essential for timely farm management decisions such as irrigation, fertilization, and canopy management to optimize crop yield and quality. While traditional biophysical models can be used for season-long predictions, they lack the precision required for site-specific management. Deep […]

Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis

arXiv:2603.15483v1 Announce Type: new Abstract: Agent applications are increasingly adopted to automate workflows across diverse tasks. However, due to the heterogeneous domains they operate in, it is challenging to create a scalable evaluation framework. Prior works each employ their own methods to determine task success, such as database lookups, regex match, etc., adding complexity to […]

Topology-Preserving Data Augmentation for Ring-Type Polygon Annotations

arXiv:2603.14764v1 Announce Type: cross Abstract: Geometric data augmentation is widely used in segmentation pipelines and typically assumes that polygon annotations represent simply connected regions. However, in structured domains such as architectural floorplan analysis, ring-type regions are often encoded as a single cyclic polygon chain connecting outer and inner boundaries. During augmentation, clipping operations may remove […]

MURE: Hierarchical Multi-Resolution Encoding via Vision-Language Models for Visual Document Retrieval

arXiv:2603.13349v1 Announce Type: cross Abstract: Visual Document Retrieval (VDR) requires representations that capture both fine-grained visual details and global document structure to ensure retrieval efficacy while maintaining computational efficiency. Existing VDR models struggle to balance effectiveness and efficiency when processing high-resolution documents: they often either lose fine-grained information or generate an excessive number of visual […]

EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation

arXiv:2603.09465v2 Announce Type: replace-cross Abstract: Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA-a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844