Assessing nurses’ attitudes toward artificial intelligence in Kazakhstan: psychometric validation of a nine-item scale

BackgroundArtificial intelligence (AI) is increasingly integrated into healthcare, yet the attitudes and knowledge of nurses, who are the key mediators of AI implementation, remain underexplored.

Identifying needs in adult rehabilitation to support the clinical implementation of robotics and allied technologies: an Italian national survey

IntroductionRobotics and technological interventions are increasingly being explored as solutions to improve rehabilitation outcomes but their implementation in clinical practice remains very limited. Understanding patient

Lightweight liquid neural networks decipher salivary metabolic fingerprinting for high-risk periodontitis screening in diabetes

npj Digital Medicine, Published online: 07 April 2026; doi:10.1038/s41746-026-02593-7 Lightweight liquid neural networks decipher salivary metabolic fingerprinting for high-risk periodontitis screening in diabetes

Supporting Access to Care Through Peripheral Devices and Patient-Generated Health Data: Qualitative Study

Background: In 2016, the US Department of Veterans Affairs (VA) implemented a national initiative to distribute video-enabled tablets and peripheral devices, such as blood pressure

As Social Media Scales Back Fact-Checking, Can Technologies Fill the Gap?

Post Content

ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation

April 1, 2026

arXiv:2603.29902v1 Announce Type: new
Abstract: Interleaved text-and-image generation represents a significant frontier for Multimodal Large Language Models (MLLMs), offering a more intuitive way to convey complex information. Current paradigms rely on either image generation or retrieval augmentation, yet they typically treat the two as mutually exclusive paths, failing to unify factuality with creativity. We argue that the next milestone in this field is Agentic Tool Planning, where the model serves as a central controller that autonomously determines when, where, and which tools to invoke to produce interleaved responses for visual-critical queries. To systematically evaluate this paradigm, we introduce ATP-Bench, a novel benchmark comprising 7,702 QA pairs (including 1,592 VQA pairs) across eight categories and 25 visual-critical intents, featuring human-verified queries and ground truths. Furthermore, to evaluate agentic planning independent of end-to-end execution and changing tool backends, we propose a Multi-Agent MLLM-as-a-Judge (MAM) system. MAM evaluates tool-call precision, identifies missed opportunities for tool use, and assesses overall response quality without requiring ground-truth references. Our extensive experiments on 10 state-of-the-art MLLMs reveal that models struggle with coherent interleaved planning and exhibit significant variations in tool-use behavior, highlighting substantial room for improvement and providing actionable guidance for advancing interleaved generation. Dataset and code are available at https://github.com/Qwen-Applications/ATP-Bench.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844