From memorization to generalization: fine-tuning large language models for biomedical term-to-identifier normalization

Introduction

Biomedical data integration requires term-to-identifier normalization, the process of linking natural-language biomedical terms to standardized ontology codes so that extracted concepts become computable and interoperable. Although large language models perform well on clinical text summarization and concept extraction, they remain markedly less accurate at mapping ontology terms to their corresponding identifiers.

Methods

We examined the roles of memorization and generalization in term-to-code mapping across the Human Phenotype Ontology (HPO), the Gene Ontology (GO), and the HGNC gene naming system, including mappings between gene names, lexicalized gene symbols, and arbitrary gene identifiers. Performance was assessed across multiple base models and after task-specific fine-tuning.

Results

Accuracy scaled with model size, with GPT-4o outperforming Llama 3.1 70B and Llama 3.1 8B. Fine-tuning improved forward mappings from term to identifier, with larger gains for GO than for HPO and minimal improvement for gene name-to-HGNC identifier mappings. Generalization to withheld mappings occurred primarily for HGNC gene name-to-gene symbol tasks, whereas fine-tuning on HPO and GO identifiers produced little generalization. Embedding analyses revealed strong semantic alignment between gene names and HGNC gene symbols but no comparable alignment between concept names and identifiers in GO, HPO, or HGNC.

Conclusions

These results suggest that fine-tuning success depends on two interacting factors: popularity and lexicalization. Popularity, a proxy for pretraining exposure to term-identifier pairs, predicted baseline accuracy and the magnitude of memorization gains during fine-tuning, whereas long-tail identifiers remained difficult to consolidate. Lexicalization, the extent to which a symbol functions as a meaningful token in embedding space, enabled generalization and explains why generalization emerged for HGNC gene symbols but not for the arbitrary identifiers used in GO and HPO. Together, these findings provide a predictive framework for identifying when fine-tuning can improve factual term normalization, when gains primarily reflect memorization, and when normalization is likely to fail.
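To make the task concrete, here is a minimal sketch of term-to-identifier normalization scored by exact-match accuracy, the kind of forward mapping the abstract describes. The gold HPO term-identifier pairs are real; the `predictions` dictionary is a hypothetical stand-in for model output, and the `exact_match_accuracy` helper is illustrative, not the paper's actual evaluation code.

```python
# Gold mappings: real HPO term-to-identifier pairs used for illustration.
gold = {
    "Seizure": "HP:0001250",
    "Hepatomegaly": "HP:0002240",
    "Atrial septal defect": "HP:0001631",
}

# Hypothetical model predictions (stand-in for LLM output). The last entry
# shows a typical failure mode: a well-formed but incorrect identifier.
predictions = {
    "Seizure": "HP:0001250",
    "Hepatomegaly": "HP:0002240",
    "Atrial septal defect": "HP:0001629",
}

def exact_match_accuracy(gold, predictions):
    """Fraction of terms whose predicted identifier exactly matches the gold code."""
    correct = sum(predictions.get(term) == code for term, code in gold.items())
    return correct / len(gold)

print(f"accuracy = {exact_match_accuracy(gold, predictions):.2f}")  # 2 of 3 correct
```

Exact-match scoring is deliberately strict: because ontology codes are arbitrary strings rather than lexicalized tokens, a near-miss identifier is simply wrong, which is one reason baseline accuracy on these mappings tracks pretraining exposure rather than semantics.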

