CS 104: Introduction to Computer Science

Conditions for Convergence

•Will the learned estimate of Q converge to the true value of Q (for the optimal policy)?

•Yes, under certain conditions:

1.deterministic MDP

2.immediate reward values are bounded

•for some positive constant c, |r(s,a)|<c for all s,a

3.actions are chosen so that every state-action pair is visited infinitely often

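The three conditions can be checked on a toy problem. The sketch below is illustrative only and is not taken from the lecture: the 3-state chain MDP, the reward of 1, the discount factor GAMMA = 0.9, and the names step, true_q, and q_learning are all assumptions. Transitions are deterministic (condition 1), rewards satisfy |r(s,a)| <= 1 (condition 2), and actions are picked uniformly at random, which is one simple way to keep visiting every state-action pair (condition 3).

```python
import random

# Hypothetical 3-state chain MDP (an assumption, not from the lecture).
# States: 0, 1, 2. Actions: 0 = left, 1 = right.
N_STATES, N_ACTIONS = 3, 2
GAMMA = 0.9  # assumed discount factor, with 0 <= GAMMA < 1


def step(s, a):
    """Deterministic transition with a bounded reward: |r(s, a)| <= 1."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    r = 1.0 if (s == N_STATES - 1 and a == 1) else 0.0
    return s_next, r


def true_q(sweeps=1000):
    """Reference Q values from repeated Bellman backups over all pairs."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                s_next, r = step(s, a)
                q[s][a] = r + GAMMA * max(q[s_next])
    return q


def q_learning(n_steps=20000, seed=0):
    """Q-learning with the deterministic update rule and random exploration."""
    rng = random.Random(seed)
    q_hat = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(N_ACTIONS)  # uniform random action choice
        s_next, r = step(s, a)
        # Deterministic-MDP update: overwrite the estimate, no learning rate.
        q_hat[s][a] = r + GAMMA * max(q_hat[s_next])
        s = s_next
    return q_hat


if __name__ == "__main__":
    q_star = true_q()
    q_hat = q_learning()
    gap = max(
        abs(q_hat[s][a] - q_star[s][a])
        for s in range(N_STATES)
        for a in range(N_ACTIONS)
    )
    print("largest gap between learned and reference Q:", gap)
```

Random action selection is used here only because it makes condition 3 easy to satisfy on a small problem; a purely greedy choice could stop visiting some pairs, and then the estimate for those pairs would never be updated.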