Q Learning Algorithm
• For each $s, a$ initialize $Q(s,a) \leftarrow 0$.
• Observe the current state $s$.
• Do forever:
  – Select an action $a$ and execute it.
  – Receive the immediate reward $r$.
  – Observe the new state $s'$.
  – Update the table entry for $Q(s,a)$:
    $$Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$$
    where $\gamma$ ($0 \le \gamma < 1$) is the discount factor.
  – $s \leftarrow s'$
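The loop above can be sketched as a small tabular program. This is a minimal sketch under several assumptions that are not on the slide: a hypothetical deterministic corridor with states 0 to 4 (state 4 is the goal), actions `-1` (left) and `+1` (right), a reward of 100 for entering the goal and 0 otherwise, uniform random action selection, "do forever" capped at `n_steps`, and a restart from a random non-goal state whenever the goal is reached. The names `step`, `q_learning`, `N_STATES`, `GOAL`, and `ACTIONS` are illustrative only.

```python
import random

# Hypothetical deterministic environment (an assumption, not from the slide):
# a corridor of states 0..4, where state 4 is the goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left, move right


def step(s: int, a: int) -> tuple[int, float]:
    """Deterministic transition: return (next_state, immediate_reward)."""
    s_next = min(max(s + a, 0), N_STATES - 1)  # walls at both ends
    reward = 100.0 if s_next == GOAL else 0.0
    return s_next, reward


def q_learning(gamma: float = 0.9, n_steps: int = 10_000, seed: int = 0):
    rng = random.Random(seed)
    # For each s, a initialize Q(s, a) <- 0.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    # Observe the current state s (assumed: a random non-goal start).
    s = rng.randrange(GOAL)
    # "Do forever", capped at n_steps so the sketch terminates.
    for _ in range(n_steps):
        # Select an action a and execute it (assumed: uniform random choice).
        a = rng.choice(ACTIONS)
        # Receive the immediate reward r and observe the new state s'.
        s_next, r = step(s, a)
        # Update the table entry: Q(s, a) <- r + gamma * max_a' Q(s', a').
        q[(s, a)] = r + gamma * max(q[(s_next, a2)] for a2 in ACTIONS)
        # s <- s'
        s = s_next
        # Assumption: restart from a random non-goal state once the goal is reached.
        if s == GOAL:
            s = rng.randrange(GOAL)
    return q


if __name__ == "__main__":
    table = q_learning()
    for state in range(GOAL):
        print(state, {a: round(table[(state, a)], 2) for a in ACTIONS})
```

With $\gamma = 0.9$, the entries should approach $Q(3,+1) = 100$, $Q(2,+1) = 90$, $Q(1,+1) = 81$, and $Q(0,+1) = 72.9$, each one a factor of $\gamma$ smaller than its neighbor closer to the goal. Because this update rule has no learning rate, it fits a deterministic environment like this one; stochastic settings usually blend the old estimate with the new target instead of overwriting it.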