© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Conditions for Convergence
•If:
–rewards are bounded (as before)
–the training rule is:
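$Q_n(s,a) \leftarrow (1-\alpha_n)\,Q_{n-1}(s,a) + \alpha_n \left( r + \gamma \max_{a'} Q_{n-1}(s',a') \right)$
–$0 \le \gamma < 1$
–$\sum_{i=1}^{\infty} \alpha_{n(i,s,a)} = \infty$, where $n(i,s,a)$ is the iteration corresponding to the $i$th time action $a$ is applied in state $s$
–$\sum_{i=1}^{\infty} \left(\alpha_{n(i,s,a)}\right)^2 < \infty$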
•Then Q will converge correctly as $n \to \infty$ with probability 1.
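The conditions above can be illustrated with a minimal sketch of tabular Q-learning. Everything named here is an assumption made for illustration and is not part of the slides: the two-state environment `two_state_step`, the function `q_learning`, the uniform random choice of state-action pairs (so that every pair is tried infinitely often in the limit), and the step size 1/k on the k-th visit to a pair. That step size is one schedule meeting both sum conditions, since the harmonic series diverges while the sum of 1/k^2 is finite.

```python
import random
from collections import defaultdict


def two_state_step(s, a):
    """Hypothetical toy environment: action 0 stays, action 1 switches state.

    The reward is 1 for landing in state 1 and 0 otherwise, so rewards are bounded.
    """
    s_next = s if a == 0 else 1 - s
    r = 1.0 if s_next == 1 else 0.0
    return r, s_next


def q_learning(step, states, actions, gamma=0.5, iterations=20000, seed=0):
    """Tabular Q-learning with a per-pair step size of 1 / (visit count)."""
    rng = random.Random(seed)
    q = defaultdict(float)     # Q estimates, initialised to 0
    visits = defaultdict(int)  # how many times each (s, a) has been updated
    for _ in range(iterations):
        # Assumption: sample pairs uniformly so none is starved of updates.
        s = rng.choice(states)
        a = rng.choice(actions)
        r, s_next = step(s, a)
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]  # sum diverges, sum of squares converges
        best_next = max(q[(s_next, b)] for b in actions)
        # The training rule: blend the old estimate with the new sample.
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q


q = q_learning(two_state_step, states=[0, 1], actions=[0, 1])
for pair in sorted(q):
    print(pair, round(q[pair], 3))
```

For this toy problem with a discount of 0.5, the fixed point can be worked out by hand: the value is 2 for any action that lands in state 1 and 1 for any action that lands in state 0, so the printed estimates should approach those numbers. A constant step size would violate the second sum condition, although in a deterministic environment like this one it may still work in practice.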