arXiv:2604.14243v2 Announce Type: replace-cross
Abstract: Real-world decision-making systems operate in environments where state transitions depend not only on the agent's actions, but also on exogenous factors outside its control (competing agents, environmental disturbances, or strategic adversaries). Formally, $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$, where $\bar{a}_h$ is the adversary/external action, $a_h$ is the agent's action, and $\omega_h$ is additive noise. Ignoring such factors can yield policies that are optimal in isolation but fail catastrophically in deployment, particularly when safety constraints must be satisfied.
Standard Constrained MDP formulations assume the agent is the sole driver of state evolution, an assumption that breaks down in safety-critical settings. Existing robust RL approaches address this via distributional robustness over transition kernels, but they do not explicitly model the strategic interaction between agent and exogenous factor, and they rely on strong assumptions about divergence from a known nominal model.
We model the exogenous factor as an adversarial policy $\bar{\pi}$ that co-determines state transitions, and ask how an agent can remain both optimal and safe against such an adversary. To the best of our knowledge, this is the first work to study safety-constrained RL under explicit adversarial dynamics. We propose Robust Hallucinated Constrained Upper-Confidence RL (RHC-UCRL), a model-based algorithm that maintains optimism over both agent and adversary policies, explicitly separating epistemic from aleatoric uncertainty. RHC-UCRL achieves sub-linear regret and constraint-violation guarantees.
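The transition model in the abstract can be made concrete with a minimal simulation sketch. This is not code from the paper: the dynamics function, the adversary policy, and all coefficients below are illustrative assumptions chosen only to show how $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$ couples the agent's action, an exogenous adversarial action, and additive noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(s, a, a_bar):
    # Hypothetical linear dynamics: the adversary's action a_bar
    # co-determines the next state alongside the agent's action a.
    return 0.9 * s + 1.0 * a - 0.5 * a_bar

def adversary_policy(s):
    # Hypothetical adversary: pushes against the current state's sign.
    return float(np.sign(s))

def step(s, a, noise_std=0.1):
    # One transition s_{h+1} = f(s_h, a_h, abar_h) + omega_h,
    # with omega_h drawn as zero-mean Gaussian noise.
    a_bar = adversary_policy(s)
    omega = rng.normal(0.0, noise_std)
    return f(s, a, a_bar) + omega

# Roll out a short trajectory under a fixed agent action.
s = 1.0
for h in range(5):
    s = step(s, a=0.2)
```

A policy evaluated without the `a_bar` term would see different (more benign) dynamics than it faces at deployment, which is the failure mode the abstract highlights.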
Behavior change beyond intervention: an activity-theoretical perspective on human-centered design of personal health technology
Introduction
Modern personal technologies, such as smartphone apps with artificial intelligence (AI) capabilities, have a significant potential for helping people make necessary changes in their behavior.


