Introduction: The recognition and linking of discontinuous named entities (DiscNEs) in healthcare remain challenging due to their fragmented structure and semantic complexity. This study presents a comparative analysis of two state-of-the-art DiscNER models: TriG-NER, a grid-tagging architecture, and DocDiscNER, a generative document-level model. The aim is to provide a broader understanding of their generalisation capabilities, performance across diverse entity categories, and effectiveness when integrated with a Named Entity Normalisation (NEN) component.
Methods: Experiments were conducted on two healthcare corpora with distinct characteristics: BioCreative-HPO, which contains sentence-level annotations with two entity types, and Occup-Sub, which provides document-level annotations across six entity categories. We compared TriG-NER and DocDiscNER across both datasets and analysed their performance across several NER attributes, including sentence length, entity density, and out-of-vocabulary density. We also assessed computational cost, evaluated the integration of each model with an NEN component, examined GPT-4.1 in a few-shot setting for this task, and conducted a qualitative error analysis.
Results: TriG-NER achieved the best performance on BioCreative-HPO, with an F1 score of 78.2%, while DocDiscNER performed best on Occup-Sub, with an F1 score of 82.2%. These results demonstrate the effectiveness of TriG-NER in sentence-level contexts and the advantage of DocDiscNER in longer, document-level contexts involving multiple entity categories. TriG-NER also showed superior computational efficiency, requiring less training time and GPU memory. In contrast, DocDiscNER benefited from the Coordination Ellipses Resolution (CER) component, which improved its handling of complex discontinuous structures. Despite its potential, GPT-4.1 underperformed in the few-shot setting.
Discussion: The findings highlight complementary strengths between grid-tagging and generative approaches for DiscNER in healthcare. TriG-NER is more computationally efficient and performs strongly in sentence-level settings, whereas DocDiscNER is better suited to longer and more complex document-level contexts. The limited performance of GPT-4.1 suggests that full fine-tuning or task-specific adaptation may be necessary to achieve optimal performance in DiscNER.
Performance of large language models in delivering accurate and comprehensible patient information on heart failure and cardiomyopathy
Background: Large language models (LLMs) are increasingly used by patients seeking cardiovascular health information through digital platforms. However, their accuracy and suitability for providing guidance on


