© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
How to Learn Q
•Q(s,a) = r(s,a) + γ max_a' [Q(δ(s,a), a')]
•We don't know r and δ, but this forms the basis for an iterative update based on an observed action transition and reward
•Having taken action a in state s, and finding oneself in state s' with immediate reward r:
Q(s,a) ← r + γ max_a' [Q(s',a')]
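
The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: it assumes a deterministic world, a table of Q values stored in a dictionary keyed by (state, action), and unseen entries starting at 0. The names q_update, GAMMA, and the states "A" and "B" are hypothetical, and the discount factor 0.9 is an arbitrary choice.

```python
from collections import defaultdict

GAMMA = 0.9  # assumed discount factor (the gamma in the update rule)


def q_update(Q, s, a, r, s_next, actions, gamma=GAMMA):
    """Apply Q(s,a) <- r + gamma * max over a' of Q(s',a') to the table Q.

    Q       -- dict mapping (state, action) to the current estimate
    s, a    -- the state we were in and the action we took
    r       -- the immediate reward we observed
    s_next  -- the state we found ourselves in (s')
    actions -- the actions available in s_next
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = r + gamma * best_next
    return Q[(s, a)]


# Hypothetical two-state example: estimates start at 0 by assumption.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("B", "right")] = 100.0  # pretend this value was learned earlier

# We took "right" in state "A", received reward 0, and landed in "B".
new_value = q_update(Q, "A", "right", 0.0, "B", actions)
```

Note that the update never consults r(s,a) or δ(s,a) directly: it only uses the reward and next state that were actually observed, which is why the rule works without a model of the world.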