Misalignment Bounty: Crowdsourcing AI Agent Misbehavior

arXiv:2510.19738v2 Announce Type: replace
Abstract: Advanced AI systems sometimes act in ways that differ from human intent. To gather clear, reproducible examples, we ran the Misalignment Bounty: a crowdsourced project that collected cases of agents pursuing unintended or unsafe goals. The bounty received 295 submissions, of which nine were awarded.
This report explains the program’s motivation and evaluation criteria, and walks through the nine winning submissions step by step.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844