arXiv:2605.19768v1 Announce Type: new
Abstract: We study reinforcement learning for episodic Markov Decision Processes (MDPs) whose transitions are modelled by a multinomial logistic (MNL) model. Existing algorithms for MNL mixture MDPs yield a regret of $smashtildeO(dH^2sqrtT)$ (Li et al., 2024), where $d$ is the feature dimension, $H$ the episode length, and $T$ the number of episodes. Inspired by the logistic bandit literature (Abeille et al., 2021; Faury et al., 2022; Boudart et al., 2026), we introduce a problem-dependent constant $barsigma_T leq 1/2$, measuring the normalised average variance of the optimal downstream value function along the learner’s trajectory. We propose an algorithm achieving a regret of $smashtildeO(dH^2barsigma_TsqrtT)$, which recovers the existing bound in the worst case and improves upon it for structured MDPs. For instance, for KL-constrained robust MDPs, $barsigma_T = O(H^-1)$, reducing the horizon dependence by a factor $H$. We further establish a matching $smashOmega(dH^2barsigma_TsqrtT)$ lower bound, proving minimax optimality (up to logarithmic factors) and fully characterising the regret complexity of MNL mixture MDPs for the first time.
Explainable AI in kidney stone detection and segmentation: a mini review
Kidney stones are one of the most common renal disorders that can produce severe complications if not diagnosed and treated early. Recently, advances in AI