arXiv:2601.05062v2 Announce Type: replace-cross
Abstract: Deploying LLMs in real-world applications requires controllable output that satisfies multiple desiderata at the same time. While existing work extensively addresses LLM steering for a single behavior, compositional steering — i.e., steering LLMs simultaneously towards multiple behaviors — remains an underexplored problem. In this work, we propose compositional steering tokens for multi-behavior steering. We first embed individual behaviors, expressed as natural language instructions, into dedicated tokens via self-distillation. Contrary to most prior work, which operates in the activation space, our behavior-steering tokens live in the space of input tokens, enabling more effective zero-shot composition. We then train a dedicated composition token on pairs of behaviors and show that it successfully captures the notion of composition: it generalizes well to unseen compositions, including those with unseen behaviors as well as those with an unseen number of behaviors. Our experiments across different LLM architectures show that steering tokens lead to superior multi-behavior steering of verifiable constraints (e.g., length, format, structure, language) compared to competing approaches (instructions, activation steering, and LoRA merging). Moreover, we show that steering tokens complement natural language instructions, with their combination resulting in further gains.
Behavior change beyond intervention: an activity-theoretical perspective on human-centered design of personal health technology
Introduction
Modern personal technologies, such as smartphone apps with artificial intelligence (AI) capabilities, have a significant potential for helping people make necessary changes in their behavior

