arXiv:2603.06813v1 Announce Type: new
Abstract: Reusable decision structure can survive across episodes in reinforcement learning, but whether it does depends on how the agent–world boundary is drawn. In stationary, finite-horizon MDPs, an invariant core can be constructed: the (not necessarily contiguous) subsequences of state–action pairs shared by all successful trajectories, optionally under a simple abstraction. Under mild goal-conditioned assumptions, this core provably exists, and it captures prototypes that transfer across episodes. When the same task is embedded in a decentralized Markov game and the peer agent is folded into the world, each peer-policy update induces a new MDP; the per-episode invariant core can shrink or vanish even under small changes to the induced world dynamics, sometimes leaving only the individual-task core, or nothing at all. This policy-induced non-stationarity can be quantified with a variation budget over the induced kernels and rewards, linking boundary drift to loss of invariants. On this view, a continual RL problem in decentralized MARL arises from instability of the agent–world boundary rather than from exogenous task switches, which suggests future work on preserving, predicting, or otherwise managing boundary drift.
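As a concrete illustration of the core construction, here is a minimal Python sketch under assumptions not stated in the abstract: trajectories are finite lists of hashable (state, action) pairs, and the shared subsequence is approximated by folding pairwise longest-common-subsequence (LCS) computations. The fold yields a common subsequence of all successful trajectories, though not necessarily a maximal one, so this is a heuristic stand-in for the paper's exact construction; the `abstraction` parameter is a hypothetical hook for the "simple abstraction" the abstract mentions.

```python
# Heuristic sketch of an invariant core: a (state, action) subsequence
# shared by all successful episodes. Assumes trajectories are lists of
# hashable (state, action) pairs; folding pairwise LCS gives *a* common
# subsequence of all trajectories, not necessarily the longest one.
from functools import reduce

def lcs(a, b):
    """One longest common (not necessarily contiguous) subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack to recover one LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def invariant_core(successful_trajectories, abstraction=lambda sa: sa):
    """Common (state, action) subsequence across all successful episodes,
    optionally after mapping each pair through a simple abstraction."""
    abstracted = [[abstraction(sa) for sa in t] for t in successful_trajectories]
    return reduce(lcs, abstracted)

# Example: two successful episodes that agree on the first and last steps.
t1 = [("s0", "a0"), ("s1", "a1"), ("s2", "a2")]
t2 = [("s0", "a0"), ("sX", "aY"), ("s2", "a2")]
assert invariant_core([t1, t2]) == [("s0", "a0"), ("s2", "a2")]
```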
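The "variation budget over the induced kernels and rewards" can be made precise with the standard non-stationary-RL definition below; the symbols $P_t$, $r_t$, and $B_T$ are assumed notation, since the abstract does not fix them. Here $P_t$ and $r_t$ are the world kernel and reward induced by the peer's policy at episode $t$:

```latex
% Variation budget over episodes t = 1..T
% (standard non-stationary-RL form; notation assumed):
B_T \;=\; \sum_{t=1}^{T-1} \sup_{s,a}
      \bigl\lVert P_t(\cdot \mid s,a) - P_{t+1}(\cdot \mid s,a) \bigr\rVert_1
    \;+\; \sum_{t=1}^{T-1} \sup_{s,a}
      \bigl\lvert r_t(s,a) - r_{t+1}(s,a) \bigr\rvert
```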
Intellectual Stewardship: Re-adapting Human Minds for Creative Knowledge Work in the Age of AI
arXiv:2603.18117v1 Announce Type: cross
Abstract: Background: Amid the opportunities and risks introduced by generative AI, learning research needs to envision how human minds and responsibilities


