Recognition and linking of discontinuous named entities in healthcare: a comparative performance analysis

Introduction
The recognition and linking of discontinuous named entities (DiscNEs) in healthcare remain challenging due to their fragmented structure and semantic complexity. This study presents a comparative analysis of two state-of-the-art DiscNER models: TriG-NER, a grid-tagging architecture, and DocDiscNER, a generative document-level model. The aim is to provide a broader understanding of their generalisation capabilities, performance across diverse entity categories, and effectiveness when integrated with a Named Entity Normalisation (NEN) component.

Methods
Experiments were conducted on two healthcare corpora with distinct characteristics: BioCreative-HPO, which contains sentence-level annotations with two entity types, and Occup-Sub, which provides document-level annotations across six entity categories. We compared TriG-NER and DocDiscNER across both datasets and analysed their performance across several NER attributes, including sentence length, entity density, and out-of-vocabulary density. We also assessed computational cost, evaluated the integration of each model with an NEN component, examined GPT-4.1 in a few-shot setting for this task, and conducted a qualitative error analysis.

Results
TriG-NER achieved the best performance on BioCreative-HPO, with an F1 score of 78.2%, while DocDiscNER performed best on Occup-Sub, with an F1 score of 82.2%. These results demonstrate the effectiveness of TriG-NER in sentence-level contexts and the advantage of DocDiscNER in longer, document-level contexts involving multiple entity categories. TriG-NER also showed superior computational efficiency, requiring less training time and GPU memory. In contrast, DocDiscNER benefited from the Coordination Ellipses Resolution (CER) component, which improved its handling of complex discontinuous structures. Despite its potential, GPT-4.1 underperformed in the few-shot setting.

Discussion
The findings highlight complementary strengths between grid-tagging and generative approaches for DiscNER in healthcare. TriG-NER is more computationally efficient and performs strongly in sentence-level settings, whereas DocDiscNER is better suited to longer and more complex document-level contexts. The limited performance of GPT-4.1 suggests that full fine-tuning or task-specific adaptation may be necessary to achieve optimal performance in DiscNER.
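
To make the reported F1 scores more concrete, the following is a minimal sketch of how a discontinuous mention can be represented and scored. It assumes the common strict-match convention, in which a prediction counts as correct only if every fragment and the entity type agree with the gold annotation. The abstract does not state the exact scoring protocol used in the study, so this convention, the helper names `mention` and `strict_f1`, the example phrase, and the `Finding` label are all illustrative assumptions and are not taken from either corpus.

```python
from typing import FrozenSet, Set, Tuple

# A fragment is a token span (start, end) with an exclusive end index.
Span = Tuple[int, int]
# A mention is a set of fragments plus an entity type; a discontinuous
# mention simply has more than one fragment.
Mention = Tuple[FrozenSet[Span], str]


def mention(fragments, label: str) -> Mention:
    """Build a hashable mention from a list of fragments (hypothetical helper)."""
    return (frozenset(fragments), label)


def strict_f1(gold: Set[Mention], pred: Set[Mention]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict matching (an assumption):
    a predicted mention is a true positive only if all of its fragments
    and its type are identical to a gold mention."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical sentence: "muscle and joint pain"
# Token indices:           0      1   2     3
gold = {
    mention([(0, 1), (3, 4)], "Finding"),  # "muscle ... pain" (discontinuous)
    mention([(2, 4)], "Finding"),          # "joint pain" (contiguous)
}
pred = {
    mention([(2, 4)], "Finding"),          # matches the contiguous mention
    mention([(0, 1)], "Finding"),          # only "muscle": misses the "pain" fragment
}

p, r, f = strict_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under this convention the partially recovered mention earns no credit, which is one reason coordinated structures such as the hypothetical example above are hard to score well on.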


Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844