Explainable AI in kidney stone detection and segmentation: a mini review

Kidney stones are one of the most common renal disorders that can produce severe complications if not diagnosed and treated early. Recently, advances in AI

Feasibility testing of a home-based exercise intervention in children with cerebral palsy who are ambulant—a study protocol of the HOME-EX study

Children gain increased health and well-being by participating in physical activity. Children with cerebral palsy who are ambulatory (CP-A) are known to be less physically

Patient and clinician perceptions, expectations, and usability of ankle exoskeletons for daily living: a mixed-methods survey study

Ankle exoskeletons offer promising support for individuals with chronic foot drop, yet user and clinician perspectives on their use in daily living remain underexplored. Related

Why digital health fails silently: a sociotechnical theory of health information technology–related risk

IntroductionHealth information technology (HIT) is now integral to healthcare delivery, supporting clinical documentation, prescribing, diagnostics, and care coordination. Although these technologies offer substantial benefits, they

A maturity model framework for federated networks of trusted research environments

IntroductionA Trusted Research Environment (TRE) is a highly secure computer system where sensitive data is stored that researchers can access remotely and make use of

ChromaFlow: A Negative Ablation Study of Orchestration Overhead in Tool-Augmented Agent Evaluation

May 20, 2026

arXiv:2605.14102v2 Announce Type: replace
Abstract: Autonomous language-model agents increasingly combine planning, tool use, document processing, browsing, code execution, and verification loops. These capabilities make agent systems more useful, but they also introduce operational failure modes that are not visible from final accuracy alone. This report presents ChromaFlow, a tool-augmented autonomous reasoning framework built around planner-directed execution, specialized tool use, and telemetry-driven evaluation. We analyze ChromaFlow on GAIA 2023 Level-1 validation tasks under clean evaluation constraints. A frozen full Level-1 baseline achieved 29/53 correct answers, or 54.72%. A later recovery configuration with expanded orchestration achieved 27/53 correct answers, or 50.94%, while increasing tracebacks, timeout events, tool-failure mentions, token-log calls, and campaign-log cost estimates. Two randomized 20-task smoke evaluations produced 12/20 and 11/20 correct answers, showing that small diagnostic gains can be unstable across samples. The central result is therefore a negative ablation: more aggressive orchestration did not improve full-set performance and increased operational noise. A later strict-provider full-Level-1 diagnostic reached 30/53, or 56.60%, under explicit integrity controls, but at substantially higher token-log cost. The report argues that bounded planner escalation, deterministic extraction, evidence reconciliation, provider-health gates, and explicit run gates should be treated as first-order requirements for reliable autonomous agent evaluation.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844