arXiv:2602.16966v2 Announce Type: replace-cross
Abstract: Scalable methods for networked multi-agent reinforcement learning let each agent plan using only a small neighborhood of the agent graph. This works only when the system is value-local, meaning a perturbation at one agent affects the long-run value at another agent weakly when the two are far apart. In the average-reward setting, the standard way to certify locality is the Dobrushin row-sum bound on a single matrix $C^\pi$ that captures how each agent’s next state depends on each other agent’s current state. To make this matrix easy to work with, prior work bounds it by a supremum over joint actions. The resulting bound is independent of the policy, but it is loose whenever the policy never picks the worst-case action. We split $C^\pi$ into pieces that separately track environment sensitivity and policy sensitivity, $C^\pi \preceq E^{\mathrm s}+E^{\mathrm a}\Pi(\pi)$, where $E^{\mathrm s}$ measures how the next state moves with the current state, $E^{\mathrm a}$ measures how it moves with the current action, and $\Pi(\pi)$ measures how reactive the policy is to changes in state. The spectral radius of $H^\pi := E^{\mathrm s}+E^{\mathrm a}\Pi(\pi)$ then controls the decay of the average-reward Poisson solution, and the spectral certificate $\rho(H^\pi)<1$ is strictly weaker than the row-sum condition $\|H^\pi\|_\infty<1$ on the same matrix and applies in regimes where policy-independent action-supremum bounds used in prior Dobrushin-style work cannot. For temperature-$\tau$ softmax policies we get $\Pi(\pi)\le L/(2\tau)$, so the softmax temperature directly controls locality. We use this decay result to give a deterministic oracle guarantee for a block-coordinate KL-proximal policy-improvement template whose truncation bias decays exponentially in the message-passing radius $\kappa$.
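The gap between the spectral certificate and the row-sum condition can be seen on a toy example. The sketch below is illustrative only: the two-agent matrices `E_s` and `E_a`, the constant `L`, and the choice of a diagonal `Pi` whose entries equal the softmax bound $L/(2\tau)$ are all assumptions made for this example, not values from the paper. It builds $H = E^{\mathrm s}+E^{\mathrm a}\Pi$ and upper-bounds $\rho(H)$ via the standard inequality $\rho(H)\le\|H^k\|_\infty^{1/k}$.

```python
# Toy illustration; all numbers are hypothetical, not taken from the paper.

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def inf_norm(A):
    """Induced infinity norm: the maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def gelfand_bound(H, k):
    """Return ||H^k||_inf^(1/k), an upper bound on the spectral radius of H."""
    P = H
    for _ in range(k - 1):
        P = matmul(P, H)
    return inf_norm(P) ** (1.0 / k)

# Assumed softmax parameters: policy reactivity is capped at L / (2 * tau).
L, tau = 1.0, 0.5
reactivity = L / (2 * tau)

E_s = [[0.1, 0.4], [0.1, 0.1]]             # assumed state sensitivity
E_a = [[0.0, 0.8], [0.0, 0.0]]             # assumed action sensitivity
Pi = [[reactivity, 0.0], [0.0, reactivity]]  # assumed policy reactivity matrix

EaPi = matmul(E_a, Pi)
H = [[E_s[i][j] + EaPi[i][j] for j in range(2)] for i in range(2)]

row_sum = inf_norm(H)          # row-sum (Dobrushin-style) quantity
bound = gelfand_bound(H, 8)    # upper bound on rho(H)

print("row-sum norm exceeds 1:", row_sum > 1.0)
print("spectral radius bound below 1:", bound < 1.0)
```

In this toy instance the row-sum condition fails while the spectral bound still certifies $\rho(H)<1$, which is the kind of regime the abstract refers to. Raising `tau` shrinks `reactivity` and hence the policy-dependent term of `H`.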



