© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Grid World Q-Learning Example
•For each s,a initialize Q(s,a) ← 0.
•Observe current state s.
•Do forever:
–Select an action a and execute it
–Receive immediate reward r
–Observe new state s'
–Update the table entry for Q(s,a):
Q(s,a) ← r + γ max_a' Q(s',a')
–s ← s'
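The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slide's actual grid: the corridor layout, the left/right actions, the discount factor 0.9, the random action choice, and the restart after reaching the goal are all hypothetical. Only the goal state (3), the reward of 100 for entering it, and the update rule come from the slide.

```python
import random

GAMMA = 0.9          # assumed discount factor; the slide gives no value
GOAL = 3             # goal state, marked (G) on the slide
STATES = range(1, 7)
ACTIONS = (-1, +1)   # hypothetical actions: move left / move right

def step(s, a):
    """Hypothetical deterministic world: states 1..6 in a row.

    Returns (immediate reward, new state). Entering the goal pays 100,
    every other move pays 0. Moving into an end wall leaves s unchanged.
    """
    s_next = min(max(s + a, 1), 6)
    r = 100 if (s_next == GOAL and s != GOAL) else 0
    return r, s_next

def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    # For each s,a initialize Q(s,a) to 0.
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        # Assumption: restart in a random non-goal state once the goal is
        # reached, standing in for the slide's "do forever".
        s = rng.choice([x for x in STATES if x != GOAL])
        while s != GOAL:
            a = rng.choice(ACTIONS)        # select an action and execute it
            r, s_next = step(s, a)         # receive r, observe s'
            # Deterministic update: Q(s,a) <- r + gamma * max_a' Q(s',a')
            Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
            s = s_next                     # s <- s'
    return Q

if __name__ == "__main__":
    Q = q_learning()
    for s in STATES:
        print(s, {a: round(Q[(s, a)], 1) for a in ACTIONS})
```

In this assumed corridor, moves that enter the goal settle at 100, and moves one step farther away settle at 0.9 × 100 = 90, which shows how the immediate reward is propagated backward through the table.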
[Figure: grid world with states 1 to 6, where state 3 is the goal (G), annotated with immediate reward values r(s,a) of 100 or 0.]