Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies

arXiv:2512.01970v3
Abstract: Does Reinforcement Learning (RL) merely amplify existing skills, or does it synthesize novel ones? We investigate this question through the lens of Complementary Reasoning: the critical practical capability of integrating internal knowledge with external context, a prerequisite for reliable Continual Learning and Retrieval-Augmented Generation. To avoid pre-training contamination, we construct a controlled semantic synthetic dataset of biographies and decompose this capability into two atomic skills: Parametric Reasoning (retrieving facts encoded in model weights) and Contextual Reasoning (processing novel in-context information). We present two findings. First, models supervised directly on the composite task reach high accuracy on seen facts and reasoning paths (90%) but collapse on novel facts and reasoning paths (18%), indicating that Supervised Fine-Tuning (SFT) relies on rote memorization rather than genuine skill integration. Second, RL bridges this generalization gap, acting as a skill synthesizer rather than a mere amplifier, but only under a strict prerequisite: it synthesizes new composite strategies only when the base model has first mastered the independent atomic skills via SFT. These results suggest that decoupled atomic training followed by RL offers a scalable path to complex novel reasoning.
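To make the decomposition in the abstract concrete, the following is a minimal sketch of how the three task types (parametric, contextual, composite) might be laid out on toy biography data. It is not the authors' dataset or code: the entity names, the two relations (birth city, company headquarters), the prompt wording, and every function name are assumptions chosen purely for illustration.

```python
import random

# Hypothetical toy entities; none of these names come from the paper.
PEOPLE = ["Avel Torin", "Mira Kesh", "Dano Pruitt", "Sela Wren"]
CITIES = ["Norvale", "Easthaven", "Calder Bay", "Upper Lorne"]
EMPLOYERS = ["Halden Labs", "Quill Freight", "Osprey Works", "Tarn Media"]


def build_world(seed=0):
    """Create two disjoint fact stores for a toy synthetic world."""
    rng = random.Random(seed)
    cities = CITIES[:]
    employers = EMPLOYERS[:]
    rng.shuffle(cities)
    rng.shuffle(employers)
    # Assumed "parametric" facts: person -> birth city. In this sketch they
    # stand in for knowledge a model would hold in its weights.
    parametric = dict(zip(PEOPLE, cities))
    # Assumed "contextual" facts: city -> company. These are only ever
    # supplied inside the prompt.
    contextual = dict(zip(CITIES, employers))
    return parametric, contextual


def render_context(contextual):
    """Render every contextual fact, so the answer is not the only candidate."""
    return "\n".join(
        f"{company} is headquartered in {city}."
        for city, company in contextual.items()
    )


def parametric_example(person, parametric):
    # Atomic skill 1: answer from memory, with no supporting context.
    return {"prompt": f"Where was {person} born?", "answer": parametric[person]}


def contextual_example(city, contextual):
    # Atomic skill 2: answer purely from the supplied context.
    prompt = f"{render_context(contextual)}\nWhich company is headquartered in {city}?"
    return {"prompt": prompt, "answer": contextual[city]}


def composite_example(person, parametric, contextual):
    # Composite task: hop 1 (person -> city) is absent from the prompt,
    # hop 2 (city -> company) appears only in the context.
    city = parametric[person]
    prompt = (
        f"{render_context(contextual)}\n"
        f"Which company is headquartered in the city where {person} was born?"
    )
    return {"prompt": prompt, "answer": contextual[city]}


if __name__ == "__main__":
    parametric, contextual = build_world(seed=0)
    print(composite_example(PEOPLE[0], parametric, contextual)["prompt"])
```

In this layout the composite prompt never states where the person was born, so a correct answer requires combining a remembered fact with a supplied one. Under the same assumptions, the abstract's seen versus novel comparison would correspond to holding out some people or relation pairings from the composite training examples.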
