arXiv:2604.17457v4 Announce Type: replace-cross
Abstract: Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal{X}_1 = Q^* + \operatorname{span}(\mathbf{1})\), which is contained in the POSS. For every \(\varepsilon > 0\), the distance to \(\mathcal{X}_1\) satisfies an exponential bound with rate \((\bar\rho + \varepsilon)^k\), where \(\bar\rho\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal{X}_1\). When \(\bar\rho < \gamma\), this transverse convergence is faster than the classical contraction rate. The analysis separates fast policy identification from the subsequent convergence to \(Q^*\), which may still be governed by the all-ones mode. We also give spectral and graph-theoretic conditions under which the strict inequality \(\bar\rho < \gamma\) holds or fails.
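The gap between policy identification and convergence to \(Q^*\) can be seen on a toy problem. The sketch below is not taken from the paper: the two-state deterministic MDP, the names `NEXT`, `REWARD`, `bellman`, `greedy`, `errors`, and `run`, and the choice of the sup norm are all assumptions made for illustration. It runs Q-VI from \(Q_0 = 0\) and records, at each sweep, the greedy policy, the sup-norm error to \(Q^*\), and the sup-norm distance to the line \(Q^* + \operatorname{span}(\mathbf{1})\), which for an error vector \(e\) equals \((\max e - \min e)/2\).

```python
# Hypothetical toy MDP (not from the paper): 2 states, 2 actions, deterministic moves.
GAMMA = 0.9
NEXT = [[0, 1], [1, 0]]            # NEXT[s][a] is the successor state
REWARD = [[1.0, 0.0], [2.0, 0.0]]  # REWARD[s][a] is the immediate reward
STATES, ACTIONS = range(2), range(2)


def bellman(q):
    """One synchronous Q-value iteration sweep."""
    v = [max(row) for row in q]
    return [[REWARD[s][a] + GAMMA * v[NEXT[s][a]] for a in ACTIONS] for s in STATES]


def greedy(q):
    """Greedy policy with ties broken toward the lowest action index."""
    return tuple(row.index(max(row)) for row in q)


def errors(q, q_star):
    """Sup-norm distance to Q* and to the line Q* + span(1) (assumed norm)."""
    e = [q[s][a] - q_star[s][a] for s in STATES for a in ACTIONS]
    dist_to_q_star = max(abs(x) for x in e)
    dist_to_line = (max(e) - min(e)) / 2.0
    return dist_to_q_star, dist_to_line


def run(num_iters=60):
    # Approximate Q* by iterating long enough that the contraction error is negligible.
    q_star = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(2000):
        q_star = bellman(q_star)
    pi_star = greedy(q_star)

    q = [[0.0, 0.0], [0.0, 0.0]]
    history = []  # entries: (k, greedy policy, dist to Q*, dist to Q* + span(1))
    for k in range(1, num_iters + 1):
        q = bellman(q)
        history.append((k, greedy(q)) + errors(q, q_star))

    # First sweep from which the greedy policy matches pi_star for the rest of the run.
    first_k = next(
        k for k, _, _, _ in history
        if all(pi == pi_star for j, pi, _, _ in history if j >= k)
    )
    return pi_star, first_k, history


if __name__ == "__main__":
    pi_star, first_k, history = run()
    print("optimal policy:", pi_star, "identified from sweep", first_k)
    for k, pi, err, line_dist in history[:8]:
        print(k, pi, round(err, 4), round(line_dist, 4))
```

On this instance the greedy policy settles after a few sweeps while the error to \(Q^*\) is still large and the distance to \(Q^* + \operatorname{span}(\mathbf{1})\) is already small, which is the qualitative behavior the abstract describes. The sketch does not compute the joint spectral radius \(\bar\rho\) and makes no claim about the paper's rates.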
Digital health tools and point solutions—pitfalls in population health program measurement
Digital health tools are generally poorly regulated and often lack strong research evidence, posing challenges for purchasers of point solutions such as employer groups and


