arXiv:2605.19768v1 Announce Type: new
Abstract: We study reinforcement learning for episodic Markov Decision Processes (MDPs) whose transitions are modelled by a multinomial logistic (MNL) model. Existing algorithms for MNL mixture MDPs yield a regret of $smashtildeO(dH^2sqrtT)$ (Li et al., 2024), where $d$ is the feature dimension, $H$ the episode length, and $T$ the number of episodes. Inspired by the logistic bandit literature (Abeille et al., 2021; Faury et al., 2022; Boudart et al., 2026), we introduce a problem-dependent constant $barsigma_T leq 1/2$, measuring the normalised average variance of the optimal downstream value function along the learner’s trajectory. We propose an algorithm achieving a regret of $smashtildeO(dH^2barsigma_TsqrtT)$, which recovers the existing bound in the worst case and improves upon it for structured MDPs. For instance, for KL-constrained robust MDPs, $barsigma_T = O(H^-1)$, reducing the horizon dependence by a factor $H$. We further establish a matching $smashOmega(dH^2barsigma_TsqrtT)$ lower bound, proving minimax optimality (up to logarithmic factors) and fully characterising the regret complexity of MNL mixture MDPs for the first time.
Feasibility testing of a home-based exercise intervention in children with cerebral palsy who are ambulant—a study protocol of the HOME-EX study
Children gain increased health and well-being by participating in physical activity. Children with cerebral palsy who are ambulatory (CP-A) are known to be less physically