Grid World Q-Learning
Example
For each s, a: initialize Q(s,a) ← 0.
Observe current state s.
Do forever:
    Select an action a and execute it
    Receive immediate reward r
    Observe new state s'
    Update the table entry for Q(s,a):
        Q(s,a) ← r + γ max_a' [Q(s',a')]
    s ← s'
[Figure: grid world with six states numbered 1 to 6; state 3 is the goal (G).]
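
The update loop above can be sketched in Python on the six-state grid. This is a minimal illustration, not the slide's own implementation, and several details are assumptions the slide does not state: the grid is laid out as two rows of three cells (1 2 3 on top, 4 5 6 below), moves are deterministic, entering the goal gives a reward of 100 and every other move gives 0, γ = 0.9, actions are picked uniformly at random, and "do forever" is replaced by a fixed number of episodes that end at the goal. The names step and q_learning are hypothetical.

```python
import random

# Assumed layout (not given in the source): 2 rows x 3 columns
#   1 2 3(G)
#   4 5 6
GOAL = 3
GAMMA = 0.9           # assumed discount factor
GOAL_REWARD = 100.0   # assumed reward for entering the goal; other moves give 0
ACTIONS = ("up", "down", "left", "right")


def step(s, a):
    """Deterministic move: return (reward, next_state).

    A move that would leave the grid keeps the agent in place.
    """
    row, col = divmod(s - 1, 3)
    if a == "up":
        row = max(row - 1, 0)
    elif a == "down":
        row = min(row + 1, 1)
    elif a == "left":
        col = max(col - 1, 0)
    else:  # "right"
        col = min(col + 1, 2)
    s_next = row * 3 + col + 1
    reward = GOAL_REWARD if s_next == GOAL else 0.0
    return reward, s_next


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    # For each s, a: initialize Q(s,a) <- 0
    Q = {(s, a): 0.0 for s in range(1, 7) for a in ACTIONS}
    for _ in range(episodes):
        # Observe current state s (random non-goal start, an assumption)
        s = rng.choice([1, 2, 4, 5, 6])
        while s != GOAL:
            a = rng.choice(ACTIONS)      # select an action and execute it
            r, s_next = step(s, a)       # receive reward r, observe new state s'
            # Q(s,a) <- r + gamma * max_a' Q(s',a')
            Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
            s = s_next                   # s <- s'
    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s in range(1, 7):
        print(s, {a: round(Q[(s, a)], 1) for a in ACTIONS})
```

Under these assumptions the table settles so that a move into the goal is worth 100, and the best action in each other state is worth 100 discounted by γ once per extra step to the goal (for example 90 one step away from a state adjacent to the goal, 81 two steps away).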