© 2000 Todd Neller. A.I.M.A. text figures © 1995 Prentice Hall. Used by permission.
How to Learn Q
• How can we estimate training values for Q, given only a sequence of immediate rewards spread out over time?
• Iterative approximation method
• Note: V*(s) = max_a' [Q(s,a')]
• Given: Q(s,a) = r(s,a) + γ V*(δ(s,a))
• Therefore: Q(s,a) = r(s,a) + γ max_a' [Q(δ(s,a),a')]
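
The recursive definition of Q can be turned into an iterative approximation by repeatedly applying it as an update rule. The Python sketch below is an illustration only: the four-state chain world, the reward of 100 for entering the goal, the value γ = 0.9, the fixed number of sweeps, and the names delta, r, learn_q, and v_star are all assumptions made for this example, not part of the slide. It also assumes a deterministic world in which δ and r can be evaluated directly.

```python
GAMMA = 0.9                  # discount factor γ (assumed value)
STATES = [0, 1, 2, 3]        # hypothetical chain world; state 3 is an absorbing goal
ACTIONS = ["left", "right"]
GOAL = 3


def delta(s, a):
    """Deterministic transition function δ(s, a) for the toy world."""
    if s == GOAL:
        return GOAL                      # the goal state is absorbing
    return max(s - 1, 0) if a == "left" else s + 1


def r(s, a):
    """Immediate reward r(s, a): 100 for entering the goal, otherwise 0."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def learn_q(sweeps=50):
    """Approximate Q by repeatedly applying
    Q(s,a) = r(s,a) + γ max_a' Q(δ(s,a), a') to every (s, a) pair."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}   # initial estimates
    for _ in range(sweeps):
        for s in STATES:
            for a in ACTIONS:
                s_next = delta(s, a)
                best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
                q[(s, a)] = r(s, a) + GAMMA * best_next
    return q


def v_star(q, s):
    """V*(s) = max_a' Q(s, a'), read off from the learned table."""
    return max(q[(s, a)] for a in ACTIONS)


if __name__ == "__main__":
    q = learn_q()
    for s in STATES:
        print(s, {a: round(q[(s, a)], 2) for a in ACTIONS})
```

In this toy world the estimates settle after a few sweeps: moving right from state 2 is worth about 100, from state 1 about 90, and from state 0 about 81, each step back from the goal being discounted by γ once more.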