arXiv:2601.20981v2 Announce Type: replace-cross
Abstract: Evolutionary prompt search is a practical black-box approach for red teaming large language models, however existing methods often collapse onto a small family of high-performing prompts, limiting coverage of distinct failure modes. We present a speciated quality-diversity extension of textitToxSearch that maintains multiple high-toxicity prompt niches in parallel rather than optimizing a single best prompt. textitToxSearch-S introduces unsupervised prompt speciation via a search methodology that maintains capacity-limited species with exemplar leaders, a reserve pool for emerging niches, and species-aware parent selection that trades off within-niche exploitation and cross-niche exploration. Preliminary results show textitToxSearch-S reaching higher peak toxicity ($approx 0.73$ vs. $approx 0.47$) with a heavier tail (top-10 median $0.66$ vs. $0.45$) than the baseline. Speciation also yields broader semantic coverage under a topics-as-species analysis (higher effective topic diversity and larger unique topic coverage). Finally, species formed are well-separated in embedding space (mean separation ratio $approx 1.93$) and exhibit distinct toxicity distributions, indicating that speciation partitions the adversarial space into behaviorally differentiated niches rather than superficial lexical variants.
Behavior change beyond intervention: an activity-theoretical perspective on human-centered design of personal health technology
IntroductionModern personal technologies, such as smartphone apps with artificial intelligence (AI) capabilities, have a significant potential for helping people make necessary changes in their behavior


