From memorization to generalization: fine-tuning large language models for biomedical term-to-identifier normalization

Introduction

Biomedical data integration requires term-to-identifier normalization, the process of linking natural-language biomedical terms to standardized ontology codes so that extracted concepts become computable and interoperable. Although large language models perform well on clinical text summarization and concept extraction, they remain markedly less accurate at mapping ontology terms to their corresponding identifiers.

Methods

We examined the roles of memorization and generalization in term-to-code mapping across the Human Phenotype Ontology (HPO), the Gene Ontology (GO), and the HGNC gene naming system, including mappings between gene names, lexicalized gene symbols, and arbitrary gene identifiers. Performance was assessed across multiple base models and after task-specific fine-tuning.

Results

Accuracy scaled with model size, with GPT-4o outperforming Llama 3.1 70B and Llama 3.1 8B. Fine-tuning improved forward mappings from term to identifier, with larger gains for GO than for HPO and minimal improvement for gene name-to-HGNC identifier mappings. Generalization to withheld mappings occurred primarily for HGNC gene name-to-gene symbol tasks, whereas fine-tuning on HPO and GO identifiers produced little generalization. Embedding analyses revealed strong semantic alignment between gene names and HGNC gene symbols but no comparable alignment between concept names and identifiers in GO, HPO, or HGNC.

Conclusions

These results suggest that fine-tuning success depends on two interacting factors: popularity and lexicalization. Popularity, a proxy for pretraining exposure to term-identifier pairs, predicted baseline accuracy and the magnitude of memorization gains during fine-tuning, whereas long-tail identifiers remained difficult to consolidate. Lexicalization, the extent to which a symbol functions as a meaningful token in embedding space, enabled generalization and explains why generalization emerged for HGNC gene symbols but not for the arbitrary identifiers used in GO and HPO. Together, these findings provide a predictive framework for identifying when fine-tuning can improve factual term normalization, when gains primarily reflect memorization, and when normalization is likely to fail.
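To make the task concrete, here is a minimal sketch of term-to-identifier normalization scored by exact-match accuracy, the kind of forward mapping the abstract describes. The gold HPO term-identifier pairs are real; the `predictions` dictionary is a hypothetical stand-in for model output, and the `exact_match_accuracy` helper is illustrative, not the paper's actual evaluation code.

```python
# Gold mappings: real HPO term-to-identifier pairs used for illustration.
gold = {
    "Seizure": "HP:0001250",
    "Hepatomegaly": "HP:0002240",
    "Atrial septal defect": "HP:0001631",
}

# Hypothetical model predictions (stand-in for LLM output). The last entry
# shows a typical failure mode: a well-formed but incorrect identifier.
predictions = {
    "Seizure": "HP:0001250",
    "Hepatomegaly": "HP:0002240",
    "Atrial septal defect": "HP:0001629",
}

def exact_match_accuracy(gold, predictions):
    """Fraction of terms whose predicted identifier exactly matches the gold code."""
    correct = sum(predictions.get(term) == code for term, code in gold.items())
    return correct / len(gold)

print(f"accuracy = {exact_match_accuracy(gold, predictions):.2f}")  # 2 of 3 correct
```

Exact-match scoring is deliberately strict: because ontology codes are arbitrary strings rather than lexicalized tokens, a near-miss identifier is simply wrong, which is one reason baseline accuracy on these mappings tracks pretraining exposure rather than semantics.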

