arXiv:2602.16966v2 Announce Type: replace-cross
Abstract: Scalable methods for networked multi-agent reinforcement learning let each agent plan using only a small neighborhood of the agent graph. This works only when the system is value-local, meaning that a perturbation at one agent only weakly affects the long-run value at another agent when the two are far apart. In the average-reward setting, the standard way to certify locality is the Dobrushin row-sum bound on a single matrix $C^\pi$ that captures how each agent’s next state depends on every other agent’s current state. To make this matrix easy to work with, prior work bounds it by a supremum over joint actions. The resulting bound is independent of the policy, but it is loose whenever the policy never picks the worst-case action. We split $C^\pi$ into pieces that separately track environment sensitivity and policy sensitivity, $C^\pi \preceq E^{\mathrm s} + E^{\mathrm a}\Pi(\pi)$, where $E^{\mathrm s}$ measures how the next state moves with the current state, $E^{\mathrm a}$ measures how it moves with the current action, and $\Pi(\pi)$ measures how reactive the policy is to changes in state. The spectral radius of $H^\pi := E^{\mathrm s} + E^{\mathrm a}\Pi(\pi)$ then controls the decay of the average-reward Poisson solution. The spectral certificate $\rho(H^\pi) < 1$ is strictly weaker than the row-sum condition $\|H^\pi\|_\infty < 1$ on the same matrix, and it applies in regimes where the policy-independent action-supremum bounds used in prior Dobrushin-style work cannot. For temperature-$\tau$ softmax policies we obtain $\Pi(\pi) \le L/(2\tau)$, so the softmax temperature directly controls locality. We use this decay result to give a deterministic oracle guarantee for a block-coordinate KL-proximal policy-improvement template whose truncation bias decays exponentially in the message-passing radius $\kappa$.
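As a rough illustration of why the spectral certificate $\rho(H^\pi) < 1$ is weaker than the row-sum condition $\|H^\pi\|_\infty < 1$, and of how the softmax temperature enters through $\Pi(\pi)$, the sketch below builds $H^\pi = E^{\mathrm s} + E^{\mathrm a}\Pi(\pi)$ for a hypothetical three-agent path graph and checks both conditions at several temperatures. Every matrix entry, the neighbor mask, the constant `L`, and the choice to fill each allowed entry of $\Pi(\pi)$ with the cap $L/(2\tau)$ are illustrative assumptions, not values or constructions taken from the paper; the helper names `spectral_radius`, `row_sum_norm`, and `build_H` are likewise hypothetical.

```python
import numpy as np


def spectral_radius(H: np.ndarray) -> float:
    """Largest absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(H))))


def row_sum_norm(H: np.ndarray) -> float:
    """Induced infinity-norm: the maximum absolute row sum (Dobrushin-style)."""
    return float(np.max(np.sum(np.abs(H), axis=1)))


def build_H(E_s: np.ndarray, E_a: np.ndarray, Pi: np.ndarray) -> np.ndarray:
    """Combined sensitivity matrix H = E_s + E_a @ Pi (hypothetical helper)."""
    return E_s + E_a @ Pi


# Hypothetical 3-agent path 0 - 1 - 2. Agent 1 is a "hub": its state strongly
# moves its neighbors' next states, while nothing moves agent 1 much.
E_s = np.array([[0.1, 0.6, 0.0],
                [0.0, 0.1, 0.0],
                [0.0, 0.6, 0.1]])
# Assumption: each agent's action moves only its own next state.
E_a = 0.4 * np.eye(3)
# Assumption: each agent's policy reads its own state and its neighbors' states.
mask = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 1]], dtype=float)

L = 1.0  # hypothetical constant in the softmax bound Pi <= L / (2 * tau)

for tau in (2.0, 1.0, 0.25):
    # Illustrative worst case: every allowed entry of Pi sits at the cap.
    Pi = (L / (2 * tau)) * mask
    H = build_H(E_s, E_a, Pi)
    rho, inf_norm = spectral_radius(H), row_sum_norm(H)
    print(f"tau={tau}: rho(H)={rho:.3f}, ||H||_inf={inf_norm:.3f}, "
          f"spectral ok={rho < 1}, row-sum ok={inf_norm < 1}")
```

With these made-up numbers, both checks pass at $\tau = 2$, and both fail at $\tau = 0.25$, where the policy is very reactive. At $\tau = 1$ the spectral check passes ($\rho \approx 0.87$) while the row-sum check fails ($\|H\|_\infty \approx 1.1$), because the hub structure makes some row sums large without making the spectral radius large. Since $H$ is entrywise nonnegative and shrinks entrywise as $\tau$ grows, its spectral radius is nonincreasing in $\tau$ in this toy setup.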
Crisis support teams’ technological openness and learning attitudes toward the AI-based virtual patient system crisis support VR
Background: Against the backdrop of escalating global humanitarian crises, innovative didactic simulations are becoming increasingly important. A promising alternative to traditional classroom-based didactics for learning psychological