CS 104: Introduction to Computer Science

Conditions for Convergence

• Will the learned estimate of Q converge to the true value of Q (for the optimal policy)?

• Yes, under certain conditions:

    • the MDP is deterministic

    • immediate reward values are bounded: for some positive constant c, |r(s,a)| < c for all s, a

    • actions are chosen such that every state-action pair is visited infinitely often
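
The conditions above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and does not come from the slides: the 4-state chain, the reward of 1 for reaching the last state, the discount factor GAMMA = 0.9, and the function names. The MDP is deterministic, rewards satisfy |r(s,a)| <= 1, and random restarts with random actions visit every state-action pair repeatedly. Under those assumptions the learned table should match the values found by repeatedly applying the Bellman backup.

```python
import random

# Hypothetical deterministic chain MDP (not from the slides):
# states 0..3, state 3 is absorbing; actions 0 = left, 1 = right.
N_STATES = 4
ACTIONS = (0, 1)
GAMMA = 0.9  # assumed discount factor, 0 <= GAMMA < 1


def step(s, a):
    """Deterministic transition with bounded reward, |r(s, a)| <= 1."""
    if s == N_STATES - 1:  # absorbing state: stay put, no reward
        return s, 0.0
    s_next = max(s - 1, 0) if a == 0 else s + 1
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r


def q_learning(n_episodes=500, max_len=10, seed=0):
    """Q-learning with the deterministic update Q(s,a) <- r + GAMMA * max_b Q(s',b)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_episodes):
        s = rng.randrange(N_STATES)      # random restart
        for _ in range(max_len):
            a = rng.choice(ACTIONS)      # random action: every pair keeps being visited
            s_next, r = step(s, a)
            q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
            s = s_next
    return q


def true_q(n_sweeps=200):
    """Reference Q values from repeated synchronous Bellman backups."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_sweeps):
        new_q = {}
        for (s, a) in q:
            s_next, r = step(s, a)
            new_q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
        q = new_q
    return q


if __name__ == "__main__":
    learned, reference = q_learning(), true_q()
    gap = max(abs(learned[k] - reference[k]) for k in learned)
    print(f"largest gap between learned and reference Q: {gap:.2e}")
```

If the action choice were changed so that some pair is never tried (for example, always moving right), the entries for the untried pairs would stay at their initial value of 0, which is why the third condition matters.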