Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

arXiv:2604.17457v4 Announce Type: replace-cross
Abstract: Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1 = Q^* + \operatorname{span}(\mathbf 1)\), which is contained in the POSS. For every \(\varepsilon > 0\), the distance to \(\mathcal X_1\) satisfies an exponential bound with rate \((\bar\rho + \varepsilon)^k\), where \(\bar\rho\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal X_1\). When \(\bar\rho < \gamma\), this transverse convergence is faster than the classical contraction rate. The analysis separates fast policy identification from the subsequent convergence to \(Q^*\), which may still be governed by the all-ones mode. We also give spectral and graph-theoretic conditions under which the strict inequality \(\bar\rho < \gamma\) holds or fails.
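
To make the split between policy identification and convergence to \(Q^*\) concrete, here is a minimal NumPy sketch on a small randomly generated MDP. The MDP, its size, the seed, and the helper names `bellman_step`, `greedy_is_optimal`, and `distance_to_X1` are hypothetical and not taken from the paper, and distances are measured in the Euclidean norm, which may differ from the norm used in the analysis. The sketch runs synchronous Q-VI, records the first iteration after which the tie-broken greedy policy stays optimal, and compares the distance to \(Q^*\) with the distance to \(\mathcal X_1 = Q^* + \operatorname{span}(\mathbf 1)\).

```python
import numpy as np

def bellman_step(Q, P, R, gamma):
    """One synchronous Q-VI step: (TQ)(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    return R + gamma * P @ Q.max(axis=1)  # (S, A, S) @ (S,) -> (S, A)

def greedy_is_optimal(Q, Q_star, tol=1e-9):
    """True if the greedy policy of Q (np.argmax breaks ties toward the lowest action index)
    picks, in every state, an action that attains max_a Q*(s, a)."""
    greedy = Q.argmax(axis=1)
    chosen = Q_star[np.arange(Q.shape[0]), greedy]
    return bool(np.all(chosen >= Q_star.max(axis=1) - tol))

def distance_to_X1(Q, Q_star):
    """Euclidean distance from Q to X_1 = Q* + span(1); the best shift c is the mean error."""
    E = (Q - Q_star).ravel()
    return float(np.linalg.norm(E - E.mean()))

# Hypothetical random MDP (assumption): 5 states, 3 actions, gamma = 0.9.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # each P[s, a] is a distribution over next states
R = rng.random((S, A))

# Approximate Q* by iterating until gamma^k is far below float precision.
Q_star = np.zeros((S, A))
for _ in range(1000):
    Q_star = bellman_step(Q_star, P, R, gamma)

# Run Q-VI from Q_0 = 0, recording (k, greedy optimal?, dist to Q*, dist to X_1).
Q = np.zeros((S, A))
history = []
for k in range(300):
    history.append((k, greedy_is_optimal(Q, Q_star),
                    float(np.linalg.norm(Q - Q_star)), distance_to_X1(Q, Q_star)))
    Q = bellman_step(Q, P, R, gamma)

# First iteration after which the greedy policy remains optimal for the rest of the run.
k_id = max((k + 1 for k, ok, _, _ in history if not ok), default=0)
print("greedy policy optimal from iteration", k_id)
print("dist to Q*:", history[k_id][2], " dist to X_1:", history[k_id][3])

# X_1 is invariant: one step maps Q* + c*1 to Q* + gamma*c*1, so this mode decays at rate gamma.
c = 2.5
print(np.allclose(bellman_step(Q_star + c, P, R, gamma), Q_star + gamma * c))
```

Because the distance to \(\mathcal X_1\) discards the all-ones component of the error, it is never larger than the distance to \(Q^*\). The final check shows why that component is set aside: along \(\mathbf 1\), Q-VI contracts at exactly \(\gamma\) while the greedy policy stays unchanged.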

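The abstract does not spell out the projected switching family, so the following NumPy sketch assumes a common switching-system model of Q-VI: one matrix \(\gamma P \Pi_\pi\) per deterministic policy \(\pi\), where \(P\) stacks the transition rows \(P(\cdot \mid s,a)\) and \(\Pi_\pi\) selects \(Q(s,\pi(s))\). It compresses each matrix onto the orthogonal complement of \(\mathbf 1\) and brackets the joint spectral radius with the standard brute-force bounds over all products of length \(k\): \(\max \rho(\text{prod})^{1/k} \le \text{JSR} \le \max \lVert\text{prod}\rVert_2^{1/k}\). The names `switching_family`, `project_transverse`, and `jsr_bounds` are illustrative, not the paper's, and the enumeration grows exponentially in the number of states and in \(k\), so it only suits toy MDPs.

```python
import itertools
from functools import reduce

import numpy as np

def switching_family(P, gamma):
    """Assumed switching family {gamma * P_mat @ Pi_pi : pi deterministic}.

    P has shape (S, A, S). P_mat (SA x S) stacks the rows P(. | s, a) in the same
    order as Q.ravel(); Pi_pi (S x SA) selects Q(s, pi(s)) from the flattened Q.
    """
    S, A, _ = P.shape
    P_mat = P.reshape(S * A, S)
    family = []
    for pi in itertools.product(range(A), repeat=S):
        Pi = np.zeros((S, S * A))
        Pi[np.arange(S), np.arange(S) * A + np.array(pi)] = 1.0
        family.append(gamma * P_mat @ Pi)
    return family

def project_transverse(family):
    """Compress each matrix onto the orthogonal complement of the all-ones vector."""
    n = family[0].shape[0]
    M = np.eye(n) - np.ones((n, n)) / n  # orthogonal projector onto 1-perp
    return [M @ F @ M for F in family]

def jsr_bounds(family, k):
    """Brute-force JSR bounds from all length-k products:
    max rho(prod)**(1/k) <= JSR <= max ||prod||_2**(1/k)."""
    lower = upper = 0.0
    for word in itertools.product(family, repeat=k):
        prod = reduce(np.matmul, word)
        lower = max(lower, np.max(np.abs(np.linalg.eigvals(prod))) ** (1.0 / k))
        upper = max(upper, np.linalg.norm(prod, 2) ** (1.0 / k))
    return lower, upper

# Hypothetical toy MDP (assumption): 3 states, 2 actions, gamma = 0.9.
rng = np.random.default_rng(1)
S, A, gamma = 3, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)

family = switching_family(P, gamma)
print("unprojected:", jsr_bounds(family, 3))  # lower bound is gamma, up to rounding
print("transverse :", jsr_bounds(project_transverse(family), 3))
```

Each \(\gamma P \Pi_\pi\) maps \(\mathbf 1\) to \(\gamma \mathbf 1\) and has infinity-norm \(\gamma\), so under this model the unprojected family has joint spectral radius exactly \(\gamma\). Removing the all-ones direction is what can expose a smaller \(\bar\rho\); an upper bound below \(\gamma\) for the transverse family would certify \(\bar\rho < \gamma\) for the assumed family.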

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844