Less is More for RAG: Information Gain Pruning for Generator-Aligned Reranking and Evidence Selection

arXiv:2601.17532v1 Announce Type: cross
Abstract: Retrieval-augmented generation (RAG) grounds large language models with external evidence, but under a limited context budget, the key challenge is deciding which retrieved passages should be injected. We show that retrieval relevance metrics (e.g., NDCG) correlate weakly with end-to-end QA quality and can even become negatively correlated under multi-passage injection, where redundancy and mild conflicts destabilize generation. We propose Information Gain Pruning (IGP), a deployment-friendly reranking-and-pruning module that selects evidence using a generator-aligned utility signal and filters weak or harmful passages before truncation, without changing existing budget interfaces. Across five open-domain QA benchmarks and multiple retrievers and generators, IGP consistently improves the quality–cost trade-off. In a representative multi-evidence setting, IGP delivers about +12–20% relative improvement in average F1 while reducing final-stage input tokens by roughly 76–79% compared to retriever-only baselines.
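The abstract describes IGP only at a high level (a generator-aligned utility signal, pruning before truncation, unchanged budget interfaces). As one plausible reading, the sketch below implements a greedy marginal-utility selection loop in Python. Every name here (Passage, utility_fn, min_gain, token_budget) is an illustrative assumption rather than the paper's actual interface, and the toy keyword-overlap utility merely stands in for a real generator-derived score such as the log-likelihood of a reference answer.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    text: str
    retriever_score: float  # relevance score from the retriever (e.g., BM25/dense)


def information_gain_prune(
    question: str,
    passages: List[Passage],
    utility_fn: Callable[[str, List[str]], float],
    min_gain: float = 0.0,
    token_budget: int = 1024,
    tokens_fn: Callable[[str], int] = lambda s: len(s.split()),
) -> List[Passage]:
    """Greedy generator-aligned selection (an assumed reading of IGP):
    keep a passage only if it raises the utility proxy by more than
    `min_gain`, and stop adding once the token budget would overflow.
    """
    # Start from retriever order, since the module sits after retrieval.
    ranked = sorted(passages, key=lambda p: p.retriever_score, reverse=True)
    selected: List[Passage] = []
    base_utility = utility_fn(question, [])
    used_tokens = 0
    for p in ranked:
        cost = tokens_fn(p.text)
        if used_tokens + cost > token_budget:
            continue  # passage would exceed the context budget
        gain = utility_fn(question, [s.text for s in selected] + [p.text]) - base_utility
        if gain > min_gain:
            selected.append(p)
            base_utility += gain  # new baseline is the utility of the expanded set
            used_tokens += cost
        # Passages with non-positive marginal gain are pruned outright,
        # not merely demoted, which is where the token savings come from.
    return selected


# Toy usage: keyword coverage with a redundancy penalty stands in for a
# real generator signal (all values here are illustrative).
def toy_utility(question: str, contexts: List[str]) -> float:
    q = set(re.findall(r"\w+", question.lower()))
    covered = set()
    for c in contexts:
        covered |= q & set(re.findall(r"\w+", c.lower()))
    return len(covered) - 0.1 * len(contexts)  # penalize redundant passages


passages = [
    Passage("Paris is the capital of France.", 0.9),
    Passage("France is in Europe.", 0.7),
    Passage("The capital of France is Paris.", 0.6),  # redundant with the first
]
kept = information_gain_prune("What is the capital of France?", passages, toy_utility)
print([p.text for p in kept])  # only the first passage survives pruning
```

The strict gain-over-threshold test is what would let such a module drop weak or mildly conflicting passages rather than just reorder them, consistent with the abstract's claim that pruning, not reranking alone, drives the reduction in final-stage input tokens.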
