Unburdening healthcare systems through telenursing in chronic respiratory disease management: a systematic review

Background/objectives: Chronic respiratory diseases represent a major cause of morbidity/mortality and healthcare expenditure due to disease exacerbations, emergency department (ED) presentations, hospitalizations, and length of stay

Using GPT-4 to annotate the severity of all phenotypic abnormalities within the human phenotype ontology

Introduction: The Human Phenotype Ontology (HPO) provides a unified framework cataloguing over 17,500 phenotypic abnormalities across more than 8,600 rare diseases, defining hierarchical relationships between them.

Understanding the value of virtual care technologies: development of a framework in the veterans health administration

Introduction: Healthcare systems, including the Veterans Health Administration (VHA), are facing tremendous growth in virtual care technologies that are intended to foster connections between patients, informal

Human-supervised, large language model-based clinical decision support aligned to national newborn protocols in Kenya: a pragmatic, early-stage evaluation

Introduction: Timely, protocol-adherent clinical decisions are crucial for reducing neonatal mortality in low-resource settings. Translating extensive national guidelines into bedside practice remains challenging. Objective: We developed and evaluated

A pilot study of human–AI conversational interaction and its impact on loneliness and wellbeing

Introduction: With the growing accessibility of advanced artificial intelligence (AI) chatbots, there is a need to understand their impact on users’ psychological wellbeing. This pilot study

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

May 28, 2026

arXiv:2605.28699v1 Announce Type: new
Abstract: Large language models increasingly rely on either reinforcement learning or multi-agent prompting to improve reasoning, yet these two paradigms remain difficult to combine. Directly applying single-agent reinforcement learning to multi-turn multi-agent systems faces three dilemmas: i) rewards are sparse, roles can free-ride, and training overhead is excessive; ii) agents only imitate collaboration rather than learn it; iii) a fixed collaboration protocol falls into an oscillating local optimum. We introduce TRACER, a turn-level reinforcement framework for cooperative multi-LLM reasoning. TRACER separates collaborative decision making into a controller-regret layer, where controllers learn through regret matching whether the agents should speak or skip the current round, and a generation-credit layer, which optimizes proposer and reviewer utterances with role-specific GSPO rewards. This design i) assigns credit at the level of both action modes and generated utterances, thus avoiding free-riding and sparse rewards; because only the choices made by the controllers are expanded, the computational cost of training is greatly reduced. Moreover, ii) agents acquire collaborative capability as they learn when to speak and what to say. Finally, iii) by restricting controllers to binary actions, we extend classical game theory established for finite action spaces to deep learning, thus achieving mathematically rigorous convergence. We train all local RL-style methods on the GSM8K training split and evaluate on held-out GSM8K, MATH500, and GPQA-Diamond to measure in-domain accuracy, cross-benchmark generalization, inference cost, and correction-preservation behavior. The resulting framework provides a compact and reproducible testbed for studying learned collaboration policies beyond fixed debate, voting, or aggregation protocols. Code is available at https://github.com/Shark-Forest/TRACER.
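
The abstract describes controllers that choose between speaking and skipping a round via regret matching. As a rough illustration of that classical technique only, and not of the TRACER code, the sketch below applies textbook regret matching to a two-action choice. The class name `RegretMatchingController`, the `update` signature, and the assumption that a utility estimate is available for both actions each round are all hypothetical and do not come from the paper.

```python
import random

ACTIONS = ("speak", "skip")


class RegretMatchingController:
    """Textbook regret matching over two actions (illustrative, not TRACER's code)."""

    def __init__(self):
        # Cumulative regret for not having played each action.
        self.cum_regret = {a: 0.0 for a in ACTIONS}

    def policy(self):
        # Play each action in proportion to its positive cumulative regret.
        positive = {a: max(r, 0.0) for a, r in self.cum_regret.items()}
        total = sum(positive.values())
        if total <= 0.0:
            # No positive regret yet: fall back to the uniform policy.
            return {a: 1.0 / len(ACTIONS) for a in ACTIONS}
        return {a: p / total for a, p in positive.items()}

    def sample(self, rng):
        probs = self.policy()
        return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]

    def update(self, utilities):
        # Assumption: `utilities` holds an estimate for BOTH actions this round.
        probs = self.policy()
        expected = sum(probs[a] * utilities[a] for a in ACTIONS)
        for a in ACTIONS:
            # Regret of action a = its utility minus the current policy's value.
            self.cum_regret[a] += utilities[a] - expected


if __name__ == "__main__":
    controller = RegretMatchingController()
    rng = random.Random(0)
    for _ in range(5):
        # Hypothetical utilities: speaking helps, skipping is neutral.
        controller.update({"speak": 1.0, "skip": 0.0})
    print(controller.policy(), controller.sample(rng))
```

With these hypothetical utilities, one update is enough to move the policy from uniform to placing all probability on `speak`, since `skip` accumulates only negative regret. How TRACER estimates the utilities and couples this layer to GSPO-based utterance training is not specified in the abstract and is left out here.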


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844