Portable automated rapid testing for auditory assessment: repeated at-home testing in older adults

Introduction: Hearing challenges are prevalent in older adults and are associated with age-related cognitive decline. However, measuring age-related changes in hearing faces critical barriers related to

Why digital health fails silently: a sociotechnical theory of health information technology–related risk

Introduction: Health information technology (HIT) is now integral to healthcare delivery, supporting clinical documentation, prescribing, diagnostics, and care coordination. Although these technologies offer substantial benefits, they

Why health information technology safety problems remain invisible


Understanding the value of virtual care technologies: development of a framework in the veterans health administration

Introduction: Healthcare systems, including the Veterans Health Administration (VHA), are facing tremendous growth in virtual care technologies that are intended to foster connections between patients, informal

Human-supervised, large language model-based clinical decision support aligned to national newborn protocols in Kenya: a pragmatic, early-stage evaluation

Introduction: Timely, protocol-adherent clinical decisions are crucial for reducing neonatal mortality in low-resource settings. Translating extensive national guidelines into bedside practice remains challenging. Objective: We developed and evaluated

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

May 26, 2026

arXiv:2605.01284v2
Abstract: Iterative Retrieval-Augmented Generation (iRAG) has emerged as a powerful paradigm for answering complex multi-hop questions by progressively retrieving and reasoning over external documents. However, current systems predominantly operate on parsed text, which creates two critical bottlenecks: (1) Coarse-grained attribution, where users are burdened with manually locating evidence within lengthy documents based on vague text-level citations; and (2) Visual semantic loss, where the conversion of visually rich documents (e.g., slides, PDFs with charts) into text discards spatial logic and layout cues essential for reasoning. To bridge this gap, we present Chain of Evidence (CoE), a retriever-agnostic visual attribution framework that leverages Vision-Language Models to reason directly over screenshots of retrieved document candidates. CoE eliminates format-specific parsing and outputs precise bounding boxes, visualizing the complete reasoning chain within the retrieved candidate set. We evaluate CoE on two distinct benchmarks: Wiki-CoE, a large-scale dataset of structured web pages derived from 2WikiMultiHopQA, and SlideVQA, a challenging dataset of presentation slides featuring complex diagrams and free-form layouts. Experiments demonstrate that fine-tuned Qwen3-VL-8B-Instruct achieves robust performance, significantly outperforming text-based baselines in scenarios requiring visual layout understanding, while establishing a retriever-agnostic solution for pixel-level interpretable iRAG. Our code is available at https://github.com/PeiYangLiu/CoE.git.

