© 2000 Todd Neller. A.I.M.A. text figures © 1995 Prentice Hall. Used by permission.
Q-Learning Algorithm
• For each s, a initialize Q(s,a) ← 0.
• Observe current state s.
• Do forever:
  – Select an action a and execute it
  – Receive immediate reward r
  – Observe new state s'
  – Update the table entry for Q(s,a):
      Q(s,a) ← r + γ max_a' Q(s',a')
  – s ← s'
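
The loop above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. Everything about the world is an assumption made for the example: the 4-cell corridor, the `execute` function, the action names, the discount factor of 0.9, and the uniformly random action selection (the slide does not say how actions are chosen). A fixed number of steps stands in for "do forever".

```python
import random

GAMMA = 0.9                    # discount factor (assumed value)
STATES = [0, 1, 2, 3]          # hypothetical 4-cell corridor
ACTIONS = ["left", "right"]    # hypothetical action set


def execute(s, a):
    """Hypothetical deterministic world: return (reward, new state)."""
    if s == 3:                 # from the last cell, any action leads back to cell 0
        return 0.0, 0
    s_next = max(s - 1, 0) if a == "left" else s + 1
    reward = 1.0 if s_next == 3 else 0.0   # reward only for entering cell 3
    return reward, s_next


def q_learning(steps=20000, seed=0):
    rng = random.Random(seed)
    # For each s, a initialize Q(s,a) to 0.
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = 0                                  # observe current state
    for _ in range(steps):                 # stands in for "do forever"
        a = rng.choice(ACTIONS)            # select an action (random: an assumption)
        r, s_next = execute(s, a)          # execute it, receive r, observe s'
        # Update the table entry: immediate reward plus discounted best next value.
        Q[(s, a)] = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
        s = s_next                         # s becomes s'
    return Q


def greedy_action(Q, s):
    """Action with the highest table entry in state s."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])


if __name__ == "__main__":
    Q = q_learning()
    for s in STATES:
        print(s, greedy_action(Q, s), {a: round(Q[(s, a)], 3) for a in ACTIONS})
```

In this assumed corridor the learned table should favor "right" in cells 0 to 2, since that is the only way to reach the rewarded cell. Note that the update overwrites the old entry entirely, as on the slide, which suits a deterministic world like this one.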