© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Update Rule for Nondeterministic Case
•Q(s,a) = E[r(s,a)] + γ Σ_{s'} P(s'|s,a) max_{a'} Q(s',a')
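•Our previous update rule (for the deterministic case) fails to converge here.
–Suppose we start with the correct Q function.
–Each sampled reward and next state still overwrites Q(s,a), so nondeterminism keeps changing Q forever.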
•Need to slow change to Q over time:
•Let α_n = 1/(1 + visits_n(s,a)), where visits_n(s,a) includes the current visit
•Q_n(s,a) ← (1 − α_n)(old estimate) + α_n(new estimate)
•Q_n(s,a) ← (1 − α_n) Q_{n−1}(s,a) + α_n (r + γ max_{a'} Q_{n−1}(s',a'))
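The decaying-rate update above can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the names q_update, Q, visits, and actions, the dictionary-based table, and the default discount gamma=0.9 are all assumptions made for the example.

```python
from collections import defaultdict

def q_update(Q, visits, s, a, r, s_next, actions, gamma=0.9):
    """Apply one decaying-rate Q update for an observed (s, a, r, s_next).

    Q and visits are dicts keyed by (state, action); gamma=0.9 is an
    assumed discount factor, not a value taken from the slides.
    """
    visits[(s, a)] += 1                     # count includes the current visit
    alpha = 1.0 / (1 + visits[(s, a)])      # alpha_n = 1/(1 + visits_n(s,a))
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    new_estimate = r + gamma * best_next    # sampled one-step estimate
    # Blend the old estimate with the new one, weighting the new one by alpha.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * new_estimate
    return Q[(s, a)]

Q = defaultdict(float)       # all estimates start at 0
visits = defaultdict(int)    # visit counts start at 0
actions = ["left", "right"]  # hypothetical action set

# Two visits to the same pair with different sampled rewards.
first = q_update(Q, visits, "s0", "right", 1.0, "s1", actions)
second = q_update(Q, visits, "s0", "right", 0.0, "s1", actions)
```

On the first visit α is 1/2, on the second 1/3, so each later sample moves the estimate less than the one before. That shrinking step is what lets the estimate settle despite random rewards and transitions.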