Unlocking electronic health records: a hybrid graph RAG approach to safe clinical AI for patient QA

Introduction: Electronic health record (EHR) systems present clinicians with vast repositories of clinical information, creating a significant cognitive burden where critical details are easily overlooked. While

Virtual reality in treatment of psychological disorders: a systematic review

Objective: The paper aims to systematically review the literature on the efficacy of virtual reality (VR) based therapies to treat mental health disorders in Randomized Control

Through the looking glass: ethical considerations regarding LLM-induced hallucinations to medical questions

Real-world federated learning for brain imaging scientists

Background: Federated learning (FL) has the potential to boost deep learning in neuroimaging but is rarely deployed in real-world scenarios, where its true potential lies. We

Trust and anxiety as primary drivers of digital health acceptance in multiple sclerosis: toward an extended disease-specific technology acceptance model

Background: Digital health applications and AI-supported wearables may benefit people with Multiple Sclerosis (MS), yet fluctuating cognitive and physical symptoms could shape adoption in ways not

Multi-Agent LLMs for Generating Research Limitations

March 17, 2026

arXiv:2601.11578v2 Announce Type: replace-cross
Abstract: Identifying and articulating limitations is essential for transparent and rigorous scientific research. However, zero-shot large language model (LLM) approaches often produce superficial or generic limitation statements (e.g., dataset bias or generalizability). They usually repeat limitations reported by the authors without examining deeper methodological issues and contextual gaps. This problem is made worse because many authors disclose only partial or trivial limitations. We propose a multi-agent LLM framework for generating substantive limitations. It integrates OpenReview comments and author-stated limitations to provide stronger ground truth. It also uses cited and citing papers to capture broader contextual weaknesses. In this setup, agents take specific roles in sequence: some extract explicit limitations, others analyze methodological gaps, some simulate the viewpoint of a peer reviewer, and a citation agent places the work within the larger body of literature. A Judge agent refines their outputs, and a Master agent consolidates them into a clear set. This structure allows for systematic identification of explicit, implicit, peer review-focused, and literature-informed limitations. Moreover, traditional NLP metrics like BLEU, ROUGE, and cosine similarity rely heavily on n-gram or embedding overlap, so they often fail to match limitations that are semantically similar but worded differently. To address this, we introduce a pointwise evaluation protocol that uses an LLM-as-a-Judge to measure coverage more accurately. Experiments show that our proposed model substantially improves performance. The RAG + multi-agent GPT-4o mini configuration achieves a +15.51% coverage gain over zero-shot baselines, while the Llama 3 8B multi-agent setup yields a +4.41% improvement.
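
The pointwise coverage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define coverage precisely, so this assumes coverage is the fraction of ground-truth limitations that a judge matches to at least one generated limitation. The names `coverage`, `Judge`, and `toy_judge` are hypothetical, and `toy_judge` is a simple word-overlap stand-in for the LLM-as-a-Judge call, used only so the example runs without a model.

```python
from typing import Callable, Sequence

# A judge takes (reference limitation, generated limitation) and says
# whether they express the same limitation. In the paper's protocol this
# role is played by an LLM; here it is any callable (an assumption).
Judge = Callable[[str, str], bool]


def coverage(ground_truth: Sequence[str], generated: Sequence[str], judge: Judge) -> float:
    """Fraction of ground-truth limitations matched by at least one generated one."""
    if not ground_truth:
        return 0.0
    covered = sum(
        1 for reference in ground_truth
        if any(judge(reference, candidate) for candidate in generated)
    )
    return covered / len(ground_truth)


def toy_judge(reference: str, candidate: str) -> bool:
    """Placeholder for an LLM judge: true if two or more longer words are shared."""
    ref_words = {w for w in reference.lower().split() if len(w) > 3}
    cand_words = {w for w in candidate.lower().split() if len(w) > 3}
    return len(ref_words & cand_words) >= 2


# Illustrative data only, not taken from the paper.
ground_truth = [
    "The dataset covers only English papers",
    "No ablation of the retrieval component",
]
generated = [
    "Evaluation uses a dataset limited to English papers",
    "The approach is computationally expensive",
]

score = coverage(ground_truth, generated, toy_judge)
print(f"coverage = {score:.2f}")
```

Swapping `toy_judge` for a function that prompts a model with one reference and one candidate at a time would keep the evaluation pointwise, with each pair judged independently of the others.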

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844