Soft-Label Governance for Distributional Safety in Multi-Agent Systems

arXiv:2604.19752v1 Announce Type: cross
Abstract: Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (System-Wide Assessment of Risk in Multi-agent systems), a simulation framework that replaces binary good/bad labels with soft probabilistic labels $p = P(v=+1) \in [0,1]$, enabling continuous-valued payoff computation, toxicity measurement, and governance intervention. SWARM implements a modular governance engine with configurable levers (transaction taxes, circuit breakers, reputation decay, and random audits) and quantifies their effects through probabilistic metrics including expected toxicity $\mathbb{E}[1-p \mid \text{accepted}]$ and quality gap $\mathbb{E}[p \mid \text{accepted}] - \mathbb{E}[p \mid \text{rejected}]$. Across seven scenarios with five-seed replication, strict governance reduces welfare by over 40% without improving safety. In parallel, aggressively internalizing system externalities collapses total welfare from a baseline of $+262$ down to $-67$, while toxicity remains invariant. Circuit breakers require careful calibration; overly restrictive thresholds severely diminish system value, whereas an optimal threshold balances moderate welfare with minimized toxicity. Companion experiments show soft metrics detect proxy gaming by self-optimizing agents passing conventional binary evaluations. This basic governance layer applies to live LLM-backed agents (Concordia entities, Claude, GPT-4o Mini) without modification. Results show distributional safety requires continuous risk metrics and governance lever calibration involves quantifiable safety-welfare tradeoffs. Source code and project resources are publicly available at https://www.swarm-ai.org/.
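
As a rough illustration of the two soft-label metrics named in the abstract, the sketch below estimates expected toxicity and quality gap as sample means over a batch of interactions. It is not taken from the SWARM codebase: the function names `expected_toxicity` and `quality_gap`, the list-based inputs, and the example values are all assumptions made for illustration.

```python
from statistics import mean


def expected_toxicity(p_values, accepted):
    """Sample estimate of E[1 - p | accepted].

    p_values: soft labels p = P(v=+1), each in [0, 1] (hypothetical input format).
    accepted: booleans, True where the interaction was accepted.
    """
    acc = [p for p, a in zip(p_values, accepted) if a]
    if not acc:
        raise ValueError("no accepted interactions to average over")
    return mean(1.0 - p for p in acc)


def quality_gap(p_values, accepted):
    """Sample estimate of E[p | accepted] - E[p | rejected]."""
    acc = [p for p, a in zip(p_values, accepted) if a]
    rej = [p for p, a in zip(p_values, accepted) if not a]
    if not acc or not rej:
        raise ValueError("need at least one accepted and one rejected interaction")
    return mean(acc) - mean(rej)


# Made-up example data: five interactions, three accepted and two rejected.
p_values = [0.9, 0.8, 0.6, 0.3, 0.2]
accepted = [True, True, True, False, False]

toxicity = expected_toxicity(p_values, accepted)  # mean of 0.1, 0.2, 0.4
gap = quality_gap(p_values, accepted)             # mean(0.9, 0.8, 0.6) - mean(0.3, 0.2)
print(f"expected toxicity: {toxicity:.3f}, quality gap: {gap:.3f}")
```

Under this reading, a positive quality gap means accepted interactions carry higher soft labels on average than rejected ones, and a binary evaluation that thresholds $p$ would discard the differences between, say, $p = 0.6$ and $p = 0.9$ that both metrics retain.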

