Compositional Steering of Large Language Models with Steering Tokens

arXiv:2601.05062v2 Announce Type: replace-cross
Abstract: Deploying LLMs in real-world applications requires controllable output that satisfies multiple desiderata at the same time. While existing work extensively addresses LLM steering for a single behavior, compositional steering, i.e., steering LLMs simultaneously towards multiple behaviors, remains an underexplored problem. In this work, we propose compositional steering tokens for multi-behavior steering. We first embed individual behaviors, expressed as natural language instructions, into dedicated tokens via self-distillation. Contrary to most prior work, which operates in the activation space, our steering tokens live in the space of input tokens, enabling more effective zero-shot composition. We then train a dedicated composition token on pairs of behaviors and show that it successfully captures the notion of composition: it generalizes well to unseen compositions, including those with unseen behaviors as well as those with an unseen number of behaviors. Our experiments across different LLM architectures show that steering tokens lead to superior multi-behavior steering of verifiable constraints (e.g., length, format, structure, language) compared to competing approaches (instructions, activation steering, and LoRA merging). Moreover, we show that steering tokens complement natural language instructions, with their combination resulting in further gains.
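
To make the core idea more concrete, the sketch below shows how a trainable steering-token embedding can be prepended to the input embeddings of a frozen causal LM. This is a minimal illustration under assumptions the abstract does not spell out: the choice of GPT-2, the variable names, and the omission of the self-distillation training loop and the composition token are all illustrative, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): prepend a learned "steering token"
# embedding to a frozen causal LM's input embeddings. The base model and
# names below are assumptions chosen only for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM would do; GPT-2 keeps the example small
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.requires_grad_(False)  # base model stays frozen; only the token embedding would be trained

hidden = lm.get_input_embeddings().embedding_dim
# One trainable embedding per behavior; a composition token could be trained the same way
# on pairs of behaviors, as the abstract describes.
steer_token = torch.nn.Parameter(torch.randn(1, 1, hidden) * 0.02)

def forward_with_steering(prompt: str) -> torch.Tensor:
    """Run the LM with the steering-token embedding prepended to the prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    embeds = lm.get_input_embeddings()(ids)            # shape (1, T, H)
    embeds = torch.cat([steer_token, embeds], dim=1)   # prepend the steering token
    return lm(inputs_embeds=embeds).logits

logits = forward_with_steering("Summarise the article in French.")
print(logits.shape)  # (1, T + 1, vocab_size)
```

In this framing, steering lives entirely in the input-token space, so composing behaviors amounts to concatenating (or otherwise combining) token embeddings at the input rather than intervening on hidden activations; the actual training objective (self-distillation from natural-language instructions) is described in the paper.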

