CS 104: Introduction to Computer Science

Conditions for Convergence

•If:

–rewards are bounded (as before)

–the training rule is:
$Q_n(s,a) \leftarrow (1-\alpha_n)\,Q_{n-1}(s,a) + \alpha_n\left[r + \gamma \max_{a'} Q_{n-1}(s',a')\right]$

–$0 \le \gamma < 1$

–$\sum_{i=1}^{\infty} \alpha_{n(i,s,a)} = \infty$, where $n(i,s,a)$ is the iteration corresponding to the $i$th time action $a$ is applied in state $s$

–$\sum_{i=1}^{\infty} \left(\alpha_{n(i,s,a)}\right)^2 < \infty$

•Then $Q_n(s,a)$ will converge to the true $Q(s,a)$ for every state $s$ and action $a$ as $n \to \infty$, with probability 1.
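A minimal Python sketch of the training rule under these conditions. It is an illustration, not a proof. It assumes a hypothetical two-state, two-action MDP with stochastic transitions and bounded rewards, $\gamma = 0.5$, uniformly random action selection so that every $(s,a)$ pair keeps being tried, and the step size $\alpha = 1/i$ on the $i$th visit to $(s,a)$. That schedule meets both step-size conditions: $\sum 1/i$ is the divergent harmonic series, while $\sum 1/i^2 = \pi^2/6$ is finite. A constant $\alpha$, by contrast, violates the second condition. The names `MDP`, `q_value_iteration`, and `q_learning` are illustrative. The learner never sees the model; $Q$ is computed from it separately only to measure the learner's error.

```python
import random

# Hypothetical 2-state, 2-action nondeterministic MDP (not from the slide).
# MDP[s][a] is a list of (probability, next_state, reward) outcomes;
# all rewards are bounded, as the convergence conditions require.
MDP = [
    [[(0.8, 0, 1.0), (0.2, 1, 0.0)],    # state 0, action 0
     [(0.5, 1, 2.0), (0.5, 0, -1.0)]],  # state 0, action 1
    [[(1.0, 1, 0.5)],                    # state 1, action 0
     [(0.3, 0, 3.0), (0.7, 1, -0.5)]],  # state 1, action 1
]
GAMMA = 0.5  # discount factor, satisfies 0 <= gamma < 1


def q_value_iteration(mdp, gamma, iters=200):
    """Reference Q computed from the known model by repeated Bellman backups."""
    q = [[0.0] * len(mdp[s]) for s in range(len(mdp))]
    for _ in range(iters):
        q = [[sum(p * (r + gamma * max(q[s2])) for p, s2, r in mdp[s][a])
              for a in range(len(mdp[s]))]
             for s in range(len(mdp))]
    return q


def q_learning(mdp, gamma, steps, seed=0):
    """Model-free Q-learning with alpha = 1 / (number of visits to (s, a))."""
    rng = random.Random(seed)
    q = [[0.0] * len(mdp[s]) for s in range(len(mdp))]
    visits = [[0] * len(mdp[s]) for s in range(len(mdp))]
    s = 0
    for _ in range(steps):
        # Uniform random exploration, so every (s, a) keeps being tried.
        a = rng.randrange(len(mdp[s]))
        outcomes = mdp[s][a]
        # Sample one (probability, next_state, reward) outcome by its probability.
        _, s_next, r = rng.choices(outcomes, weights=[o[0] for o in outcomes])[0]
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]  # the i-th visit to (s, a) uses alpha = 1/i
        target = r + gamma * max(q[s_next])
        q[s][a] = (1 - alpha) * q[s][a] + alpha * target
        s = s_next
    return q


if __name__ == "__main__":
    q_true = q_value_iteration(MDP, GAMMA)
    q_hat = q_learning(MDP, GAMMA, steps=200_000)
    err = max(abs(q_hat[s][a] - q_true[s][a])
              for s in range(len(MDP)) for a in range(len(MDP[s])))
    print("Q from model =", q_true)
    print("max |Q_n - Q| =", err)
```

With $\alpha = 1/i$, each update makes $Q_n(s,a)$ the plain average of all targets sampled so far for $(s,a)$; increasing `steps` should shrink the printed error.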
