arXiv:2604.14243v1 Announce Type: cross
Abstract: Real-world decision-making systems operate in environments where state transitions depend not only on the agent’s actions but also on exogenous factors outside its control, such as competing agents, environmental disturbances, or strategic adversaries. Formally, $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$, where $a_h$ is the agent’s action, $\bar{a}_h$ is the adversary’s (external) action, and $\omega_h$ is additive noise. Ignoring such factors can yield policies that are optimal in isolation but fail catastrophically in deployment, particularly when safety constraints must be satisfied.
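To make the transition model $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$ concrete, here is a minimal Python sketch. The scalar linear map `f`, the Gaussian noise, and the helper names `step` and `rollout` are hypothetical choices for illustration only. The abstract does not specify the form of $f$ or the distribution of $\omega_h$.

```python
import random

def f(s, a, a_bar):
    # Hypothetical dynamics (an assumption, not from the paper): a scalar
    # linear system in which the adversary's action a_bar opposes the agent's.
    return 0.9 * s + 0.5 * a - 0.3 * a_bar

def step(s, a, a_bar, noise_std, rng):
    # One transition s_{h+1} = f(s_h, a_h, a_bar_h) + omega_h, with omega_h
    # drawn from N(0, noise_std^2) purely for illustration.
    omega = rng.gauss(0.0, noise_std) if noise_std > 0.0 else 0.0
    return f(s, a, a_bar) + omega

def rollout(s0, agent_policy, adversary_policy, horizon, noise_std=0.1, seed=0):
    # Both policies observe the current state and act simultaneously;
    # the next state depends on both actions plus noise.
    rng = random.Random(seed)
    states = [s0]
    s = s0
    for _ in range(horizon):
        a = agent_policy(s)
        a_bar = adversary_policy(s)
        s = step(s, a, a_bar, noise_std, rng)
        states.append(s)
    return states

# The agent tries to drive the state to 0; compare a passive and a pushing adversary.
passive = rollout(1.0, lambda s: -s, lambda s: 0.0, horizon=3, noise_std=0.0)
pushing = rollout(1.0, lambda s: -s, lambda s: -s, horizon=3, noise_std=0.0)
print(passive[-1], pushing[-1])  # roughly 0.064 vs 0.343
```

In this toy system the same agent policy contracts the state by a factor of 0.4 per step against a passive adversary but only 0.7 per step once the adversary pushes back. This illustrates how a policy that looks good in isolation can degrade when an external actor co-determines the dynamics.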
Standard constrained MDP (CMDP) formulations assume that the agent is the sole driver of state evolution, an assumption that breaks down in safety-critical settings. Existing robust RL approaches address this through distributional robustness over transition kernels, but they do not explicitly model the strategic interaction between the agent and the exogenous factor, and they rely on strong assumptions about how far the true dynamics may diverge from a known nominal model.
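For reference, a generic finite-horizon constrained MDP, in which the agent alone drives the transitions, can be written as below. The reward $r$, cost $c$, budget $d$, and horizon $H$ are standard notation we assume here, not notation taken from the paper. The second line is one illustrative worst-case variant in which an adversary policy $\bar{\pi}$ also acts; it is an assumed form, not necessarily the paper's objective.

```latex
% Standard CMDP: transitions depend only on the agent's action a_h.
\max_{\pi}\; J_r(\pi) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{h=0}^{H-1} r(s_h,a_h)\Big]
\quad\text{s.t.}\quad
J_c(\pi) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{h=0}^{H-1} c(s_h,a_h)\Big] \le d,
\qquad s_{h+1}\sim P(\cdot\mid s_h,a_h).

% Illustrative worst-case variant (assumed form): the adversary also acts,
% s_{h+1} = f(s_h,a_h,\bar a_h)+\omega_h with \bar a_h \sim \bar\pi(\cdot\mid s_h).
\max_{\pi}\;\min_{\bar\pi}\; J_r(\pi,\bar\pi)
\quad\text{s.t.}\quad
\max_{\bar\pi}\; J_c(\pi,\bar\pi) \le d.
```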
We model the exogenous factor as an adversarial policy $\bar{\pi}$ that co-determines state transitions, and ask how an agent can remain both optimal and safe against such an adversary. To the best of our knowledge, this is the first work to study safety-constrained RL under explicit adversarial dynamics. We propose Robust Hallucinated Constrained Upper-Confidence RL (RHC-UCRL), a model-based algorithm that maintains optimism over both agent and adversary policies while explicitly separating epistemic from aleatoric uncertainty. RHC-UCRL achieves sub-linear bounds on both regret and constraint violation.
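The abstract does not say how RHC-UCRL implements optimism. The word "Hallucinated" suggests (we assume this rather than confirm it) a reparameterization like the hallucinated control of H-UCRL. In that scheme, an auxiliary variable $\eta \in [-1, 1]$ lets the planner pick any dynamics inside a confidence band $\mu \pm \beta\sigma$, where $\sigma$ is epistemic uncertainty that shrinks with data, while the additive noise $\omega$ remains irreducible aleatoric uncertainty. The sketch below shows only the agent side in one dimension. The functions `mu` and `sigma`, the parameter `beta`, and the grid search over `eta` are hypothetical stand-ins, and it does not show how the adversary's policy enters the optimism.

```python
import random

def mu(s, a, a_bar):
    # Hypothetical learned mean of the dynamics model (assumption).
    return 0.9 * s + 0.5 * a - 0.3 * a_bar

def sigma(s, a, a_bar):
    # Hypothetical epistemic standard deviation. In practice it would shrink
    # as data accumulates; here it simply grows with action magnitude.
    return 0.05 + 0.1 * (abs(a) + abs(a_bar))

def hallucinated_next_state(s, a, a_bar, eta, beta, noise_std=0.0, rng=None):
    # eta in [-1, 1] selects one plausible model inside the band
    # mu +/- beta * sigma (epistemic); omega is added separately (aleatoric).
    eta = max(-1.0, min(1.0, eta))
    s_next = mu(s, a, a_bar) + beta * sigma(s, a, a_bar) * eta
    if noise_std > 0.0:
        rng = rng if rng is not None else random.Random(0)
        s_next += rng.gauss(0.0, noise_std)
    return s_next

def optimistic_eta(s, a, a_bar, beta, reward, grid=21):
    # Agent-side optimism: among plausible models (a grid over eta), pick the
    # one whose predicted next state yields the highest one-step reward.
    candidates = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    return max(candidates,
               key=lambda e: reward(hallucinated_next_state(s, a, a_bar, e, beta)))

# The agent wants the state near 0, so the optimistic model is the band's lower edge.
eta_star = optimistic_eta(1.0, -1.0, 0.0, beta=2.0, reward=lambda x: -x * x)
print(eta_star, hallucinated_next_state(1.0, -1.0, 0.0, eta_star, beta=2.0))  # -1.0 and roughly 0.1
```

In this sketch the planner optimizes over the epistemic term $\beta\sigma\eta$, which represents what the model does not yet know, whereas the aleatoric term $\omega$ is only ever sampled and never chosen. That is one concrete way to keep the two uncertainty sources apart.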
Adaptation to free-living drives loss of beneficial endosymbiosis through metabolic trade-offs
Symbioses are widespread (1) and underpin the function of diverse ecosystems (2-6), but their evolutionary stability is challenging to explain (7,8). Fitness trade-offs between contrasting



