STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

arXiv:2507.03674v3 Announce Type: replace-cross
Abstract: Extracting structured information from scientific literature is critical for accelerating discovery, yet Large Language Models (LLMs) often struggle in specialized domains that require expert knowledge and generalize poorly across tasks. We introduce StructSense, a modular, task-agnostic, open-source framework that integrates ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation for robust domain-aware extraction. We evaluate StructSense on three tasks of increasing semantic complexity: schema-based extraction of assessment instruments (91–100% accuracy), metadata and resource extraction from scientific papers (86–93% overall), and named entity recognition (NER) from neuroscience literature (58–75% label accuracy across 8,882 entities). On two biomedical NER benchmarks (NCBI Disease and S800 Species), the system achieves ≥90% relaxed recall and 62.5–85.8% strict recall while extracting 1,000–3,600 additional entities beyond gold annotations. The local concept mapping service achieves Hits@1 of 62–82% under strict matching and 68–86% under semantic matching. These results across three domains demonstrate that StructSense generalizes across tasks while maintaining source grounding and provenance transparency.
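The abstract reports strict recall, relaxed recall, and Hits@1 without defining them. The sketch below illustrates one common reading of these metrics and is not taken from the StructSense code base: it assumes that "strict" means an exact span match, that "relaxed" means any character overlap with a gold span, and that Hits@k is the fraction of queries whose correct concept appears among the top k ranked candidates. The function names and the toy data are hypothetical.

```python
from typing import Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def strict_recall(gold: Sequence[Span], pred: Sequence[Span]) -> float:
    """Fraction of gold spans reproduced exactly by some predicted span.

    Assumption: 'strict' is read here as an exact boundary match.
    """
    if not gold:
        return 0.0
    pred_set = set(pred)
    return sum(1 for g in gold if g in pred_set) / len(gold)


def relaxed_recall(gold: Sequence[Span], pred: Sequence[Span]) -> float:
    """Fraction of gold spans that overlap at least one predicted span.

    Assumption: 'relaxed' is read here as any character overlap.
    """
    if not gold:
        return 0.0

    def overlaps(a: Span, b: Span) -> bool:
        return a[0] < b[1] and b[0] < a[1]

    return sum(1 for g in gold if any(overlaps(g, p) for p in pred)) / len(gold)


def hits_at_k(ranked: Sequence[Sequence[str]], correct: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose correct concept ID is in the top-k candidates."""
    if not correct:
        return 0.0
    hits = sum(1 for cands, c in zip(ranked, correct) if c in cands[:k])
    return hits / len(correct)


# Toy data, invented purely for illustration.
gold_spans = [(0, 5), (10, 15), (20, 25), (30, 35)]
pred_spans = [(0, 5), (11, 15), (20, 25), (40, 45), (50, 52)]

strict = strict_recall(gold_spans, pred_spans)
relaxed = relaxed_recall(gold_spans, pred_spans)

ranked_candidates = [["A", "B"], ["C", "A"], ["B", "C"]]
correct_ids = ["A", "A", "C"]
h1 = hits_at_k(ranked_candidates, correct_ids, k=1)
h2 = hits_at_k(ranked_candidates, correct_ids, k=2)
```

Under these assumed definitions, predicted spans with no gold counterpart, such as (40, 45) and (50, 52) in the toy data, leave both recall figures unchanged. This is consistent with the abstract reporting recall alongside a count of additional entities extracted beyond the gold annotations.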

