arXiv:2604.14243v1 Announce Type: cross
Abstract: Real-world decision-making systems operate in environments where state transitions depend not only on the agent's actions, but also on \textbf{exogenous factors} outside its control (competing agents, environmental disturbances, or strategic adversaries): formally, $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$, where $\bar{a}_h$ is the adversary's (external) action, $a_h$ is the agent's action, and $\omega_h$ is additive noise. Ignoring such factors can yield policies that are optimal in isolation but \textbf{fail catastrophically} in deployment, particularly when safety constraints must be satisfied.
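The transition model above can be simulated directly. The sketch below is a minimal illustration of $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$; the linear choice of $f$, the noise scale, and all numeric values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def step(s, a, a_bar, f, noise_std=0.1, rng=None):
    """One transition: s_{h+1} = f(s_h, a_h, a_bar_h) + omega_h,
    with omega_h drawn as zero-mean Gaussian (aleatoric) noise."""
    rng = rng if rng is not None else np.random.default_rng()
    omega = rng.normal(0.0, noise_std, size=np.shape(s))
    return f(s, a, a_bar) + omega

# Toy linear dynamics (an assumption for illustration): the adversary's
# action a_bar pushes the state opposite to the agent's action a.
f = lambda s, a, a_bar: 0.9 * s + a - 0.5 * a_bar

s = np.array([1.0])
s_next = step(s, a=0.2, a_bar=0.1, f=f, rng=np.random.default_rng(0))
```

Seeding the generator makes the aleatoric draw reproducible, which is convenient when comparing agent policies against a fixed adversary.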
Standard Constrained MDP formulations assume the agent is the sole driver of state evolution, an assumption that breaks down in safety-critical settings. Existing robust RL approaches address this via distributional robustness over transition kernels, but they do not explicitly model the \textbf{strategic interaction} between the agent and the exogenous factor, and they rely on strong assumptions about divergence from a known nominal model.
We model the exogenous factor as an \textbf{adversarial policy} $\bar{\pi}$ that co-determines state transitions, and ask how an agent can remain both optimal and safe against such an adversary. \emph{To the best of our knowledge, this is the first work to study safety-constrained RL under explicit adversarial dynamics.} We propose \textbf{Robust Hallucinated Constrained Upper-Confidence RL} (\texttt{RHC-UCRL}), a model-based algorithm that maintains optimism over both agent and adversary policies, explicitly separating epistemic from aleatoric uncertainty. \texttt{RHC-UCRL} achieves sublinear regret and constraint-violation guarantees.
