arXiv:2507.03674v3 Announce Type: replace-cross
Abstract: Extracting structured information from scientific literature is critical for accelerating discovery, yet Large Language Models (LLMs) often struggle in specialized domains that require expert knowledge and generalize poorly across tasks. We introduce StructSense, a modular, task-agnostic, open-source framework that integrates ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation for robust domain-aware extraction. We evaluate StructSense on three tasks of increasing semantic complexity: schema-based extraction of assessment instruments (91–100% accuracy), metadata and resource extraction from scientific papers (86–93% overall), and named entity recognition (NER) from neuroscience literature (58–75% label accuracy across 8,882 entities). On two biomedical NER benchmarks (NCBI Disease and S800 Species), the system achieves $\geq$90% relaxed recall and 62.5–85.8% strict recall while extracting 1,000–3,600 additional entities beyond gold annotations. The local concept mapping service achieves Hits@1 of 62–82% under strict matching and 68–86% under semantic matching. These results across three domains demonstrate that StructSense generalizes across tasks while maintaining source grounding and provenance transparency.
Portable automated rapid testing for auditory assessment: repeated at-home testing in older adults
Introduction: Hearing challenges are prevalent in older adults and are associated with age-related cognitive decline. However, measuring age-related changes in hearing faces critical barriers related to