On the generalization of language models from in-context learning and finetuning: a controlled study

November 12, 2025

arXiv:2505.00661v3 Announce Type: replace-cross
Abstract: Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from fine-tuning. For example, they can fail to generalize to simple reversals of relations they are trained on, or fail to make simple logical deductions based on trained information. These failures to generalize factual information from fine-tuning can significantly hinder the reasoning capabilities of these models. On the other hand, language models’ in-context learning (ICL) shows different inductive biases and deductive reasoning capabilities. Here, we explore these differences in generalization and deductive reasoning between in-context- and fine-tuning-based learning. To do so, we constructed several novel datasets to evaluate and improve models’ abilities to make generalizations over factual information from novel data. These datasets are designed to create clean tests of generalization by isolating the knowledge in the dataset from that in pretraining. We expose pretrained large models to controlled subsets of the information in these datasets, either through ICL or fine-tuning, and evaluate their performance on test sets that require various types of generalization. We find overall that in data-matched settings, ICL can generalize several types of inferences more flexibly than fine-tuning (though we also find some qualifications of prior findings, such as cases when fine-tuning can generalize to reversals embedded in a larger structure of knowledge). We build on these findings to propose a method to enable improved generalization from fine-tuning: adding in-context reasoning traces to fine-tuning data. We show that this method improves generalization across various splits of our datasets and other benchmarks. Our results have implications for understanding the generalization afforded by different modes of learning in language models, and for practically improving their performance.
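To make the setup in the abstract concrete, here is a minimal sketch of a reversal-style split and of augmenting fine-tuning data with inferences. It is an illustration under stated assumptions, not the authors' code: the `Fact` class, the function names, the nonsense entities, and the rule-based `toy_infer` are all hypothetical. In the setting the abstract describes, the inferences would come from a language model reasoning with the training documents in its context; here a lookup table stands in for that model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Fact:
    """A relation between two made-up entities (hypothetical schema)."""
    subject: str
    relation: str
    inverse_relation: str
    obj: str


def forward_statement(fact: Fact) -> str:
    return f"{fact.subject} {fact.relation} {fact.obj}."


def reversed_statement(fact: Fact) -> str:
    return f"{fact.obj} {fact.inverse_relation} {fact.subject}."


def build_splits(facts: List[Fact]) -> Tuple[List[str], List[str]]:
    """Train on forward statements; hold out the reversals as the test set."""
    train = [forward_statement(f) for f in facts]
    test = [reversed_statement(f) for f in facts]
    return train, test


def augment_with_inferences(
    train_docs: List[str], infer: Callable[[str], List[str]]
) -> List[str]:
    """Return the training docs followed by inferences drawn from each one.

    `infer` is an assumed stand-in for a model prompted with the document
    in context; any callable mapping a document to a list of strings works.
    """
    augmented = list(train_docs)
    for doc in train_docs:
        augmented.extend(infer(doc))
    return augmented


# Nonsense names keep the facts separate from anything seen in pretraining.
facts = [Fact("femp", "is more dangerous than", "is less dangerous than", "glon")]
train, test = build_splits(facts)

# Toy stand-in for in-context inference: look up the reversal directly.
lookup = {forward_statement(f): reversed_statement(f) for f in facts}


def toy_infer(doc: str) -> List[str]:
    return [lookup[doc]] if doc in lookup else []


augmented_train = augment_with_inferences(train, toy_infer)
```

With the toy stand-in, the augmented set contains each forward statement plus its reversal, so a model fine-tuned on it would see the held-out direction explicitly. Whether model-generated inferences achieve the same effect is the empirical question the abstract reports on.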

