Fast Approximation Algorithm for Non-Monotone DR-submodular Maximization under Size Constraint

arXiv:2511.02254v1 Announce Type: cross Abstract: This work studies the non-monotone DR-submodular Maximization over a ground set of $n$ subject to a size constraint $k$. We

AI Credibility Signals Outrank Institutions and Engagement in Shaping News Perception on Social Media

arXiv:2511.02370v1 Announce Type: cross Abstract: AI-generated content is rapidly becoming a salient component of online information ecosystems, yet its influence on public trust and epistemic

Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games

arXiv:2511.02157v1 Announce Type: cross Abstract: No-regret learning dynamics play a central role in game theory, enabling decentralized convergence to equilibrium for concepts such as Coarse

Estimation of Segmental Longitudinal Strain in Transesophageal Echocardiography by Deep Learning

arXiv:2511.02210v1 Announce Type: cross Abstract: Segmental longitudinal strain (SLS) of the left ventricle (LV) is an important prognostic indicator for evaluating regional LV dysfunction, in

Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behavior

arXiv:2511.02022v1 Announce Type: cross Abstract: Recent work has discovered that large language models can develop broadly misaligned behaviors after being fine-tuned on narrowly harmful datasets,

R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning

October 27, 2025

arXiv:2505.23794v2 Announce Type: replace-cross
Abstract: Retrieval-Augmented Generation (RAG) integrates external knowledge with Large Language Models (LLMs) to enhance factual correctness and mitigate hallucination. However, dense retrievers often become the bottleneck of RAG systems due to their limited parameters compared to LLMs and their inability to perform step-by-step reasoning. While prompt-based iterative RAG attempts to address these limitations, it is constrained by human-designed workflows. To address these limitations, we propose $textbfR3-RAG$, which uses $textbfR$einforcement learning to make the LLM learn how to $textbfR$eason and $textbfR$etrieve step by step, thus retrieving comprehensive external knowledge and leading to correct answers. R3-RAG is divided into two stages. We first use cold start to make the model learn the manner of iteratively interleaving reasoning and retrieval. Then we use reinforcement learning to further harness its ability to better explore the external retrieval environment. Specifically, we propose two rewards for R3-RAG: 1) answer correctness for outcome reward, which judges whether the trajectory leads to a correct answer; 2) relevance-based document verification for process reward, encouraging the model to retrieve documents that are relevant to the user question, through which we can let the model learn how to iteratively reason and retrieve relevant documents to get the correct answer. Experimental results show that R3-RAG significantly outperforms baselines and can transfer well to different retrievers. We release R3-RAG at https://github.com/Yuan-Li-FNLP/R3-RAG.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844