RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG

November 7, 2025

arXiv:2511.04502v1 Announce Type: cross
Abstract: Retrieval-Augmented Generation (RAG) is a critical technique for grounding Large Language Models (LLMs) in factual evidence, yet evaluating RAG systems in specialized, safety-critical domains remains a significant challenge. Existing evaluation frameworks often rely on heuristic-based metrics that fail to capture domain-specific nuances, while other works use LLM-as-a-Judge approaches that lack validated alignment with human judgment. This paper introduces RAGalyst, an automated, human-aligned agentic framework designed for the rigorous evaluation of domain-specific RAG systems. RAGalyst features an agentic pipeline that generates high-quality, synthetic question-answering (QA) datasets from source documents, incorporating an agentic filtering step to ensure data fidelity. The framework refines two key LLM-as-a-Judge metrics, Answer Correctness and Answerability, using prompt optimization to achieve a strong correlation with human annotations. Applying this framework to evaluate various RAG components across three distinct domains (military operations, cybersecurity, and bridge engineering), we find that performance is highly context-dependent. No single embedding model, LLM, or hyperparameter configuration proves universally optimal. Additionally, we provide an analysis of the most common reasons for low Answer Correctness in RAG. These findings highlight the necessity of a systematic evaluation framework like RAGalyst, which empowers practitioners to uncover domain-specific trade-offs and make informed design choices for building reliable and effective RAG systems. RAGalyst is available on our GitHub.
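
The abstract reports that the judge metrics are tuned to correlate strongly with human annotations, but it does not say which correlation statistic is used. As a minimal sketch, assuming Spearman rank correlation is an acceptable alignment measure, the check could look like the following. The function names and the example scores are hypothetical and are not taken from RAGalyst.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of equal values (a tie group).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def spearman(judge_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(judge_scores) != len(human_scores) or len(judge_scores) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    return pearson(average_ranks(judge_scores), average_ranks(human_scores))


# Hypothetical scores for five QA pairs (illustrative only, not real data).
judge = [0.9, 0.2, 0.7, 0.4, 1.0]
human = [1.0, 0.0, 0.5, 0.5, 1.0]
alignment = spearman(judge, human)
```

A value near 1 would indicate that the judge orders answers much as the human annotators do; comparing this value before and after prompt optimization is one way to tell whether a revised judge prompt improved alignment.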
