© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Conditions for Convergence
•Will the learned estimate of Q converge to the true Q (for the optimal policy)?
•Yes, under certain conditions:
1.deterministic MDP
2.immediate reward values are bounded
•for some positive constant c, |r(s,a)|<c for all s,a
3.actions are chosen so that the agent visits every state-action pair infinitely often
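The three conditions can be illustrated with a minimal sketch, not taken from the slides. It assumes a hypothetical 4-state chain world (the names `step`, `q_learning`, and `exact_q`, the reward of 10, and the discount 0.9 are all illustrative choices). Transitions are deterministic, rewards are bounded, and uniformly random action selection keeps revisiting every state-action pair, so the learned table should match the Q values computed by repeated full sweeps.

```python
import random

GAMMA = 0.9            # discount factor (assumed for illustration)
STATES = [0, 1, 2, 3]  # hypothetical chain world; state 3 is the goal
GOAL = 3
ACTIONS = [-1, +1]     # move left or right

def step(s, a):
    """Deterministic transition (condition 1) with bounded reward (condition 2)."""
    if s == GOAL:
        return s, 0.0                      # goal is absorbing
    s2 = min(max(s + a, 0), GOAL)          # walls at both ends of the chain
    r = 10.0 if s2 == GOAL else 0.0        # |r| <= 10 for all s, a
    return s2, r

def q_learning(episodes=500, seed=0):
    """Tabular Q-learning with the deterministic update rule."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(STATES[:-1])        # random non-goal start state
        while s != GOAL:
            a = rng.choice(ACTIONS)        # random exploration (condition 3)
            s2, r = step(s, a)
            q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            s = s2
    return q

def exact_q(sweeps=200):
    """Reference Q values from repeated full sweeps over all pairs."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        new_q = {}
        for (s, a) in q:
            s2, r = step(s, a)
            new_q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q = new_q
    return q

if __name__ == "__main__":
    learned, reference = q_learning(), exact_q()
    for key in sorted(learned):
        print(key, round(learned[key], 4), round(reference[key], 4))
```

A finite run can only approximate "infinitely often"; in this small world a few hundred random episodes are far more than the update needs to settle.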