© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Simple Grid World Example
•Wish to find optimal policy π* = the π maximizing V^π(s), ∀s
•2×3 grid world
–actions move N,S,E,W
–goal state G in upper right corner
–reward +100 for actions entering G, 0 otherwise
–actions cannot exit G (absorbing state)
–let discount factor γ = 0.9
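The grid world described above is small enough to solve directly. The following is a minimal sketch, not the slide's own method: it uses value iteration (a technique the slide does not name) to compute the optimal state values and then reads off a greedy policy. Assumptions not stated in the source: moves are deterministic, actions that would leave the grid are simply unavailable, and the absorbing goal state has value 0. The names `successors`, `value_iteration`, and `greedy_policy` are hypothetical.

```python
GAMMA = 0.9            # discount factor from the slide
ROWS, COLS = 2, 3      # 2x3 grid; row 0 is the top row
GOAL = (0, 2)          # goal state G in the upper right corner
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def successors(state):
    """Yield (action, next_state, reward) for each legal action in `state`.

    Assumption: off-grid moves are not available, and G has no outgoing
    actions (absorbing state).
    """
    if state == GOAL:
        return
    row, col = state
    for action, (d_row, d_col) in MOVES.items():
        nxt = (row + d_row, col + d_col)
        if 0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS:
            # Reward +100 for actions entering G, 0 otherwise.
            yield action, nxt, (100 if nxt == GOAL else 0)


def value_iteration(tol=1e-9):
    """Sweep V(s) <- max_a [r(s,a) + GAMMA * V(s')] until changes fall below tol."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            backups = [rew + GAMMA * values[nxt] for _, nxt, rew in successors(s)]
            new_value = max(backups, default=0.0)  # G has no actions, so V(G) = 0
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-goal state, an action maximizing r(s,a) + GAMMA * V(s')."""
    policy = {}
    for s in values:
        options = list(successors(s))
        if options:
            best = max(options, key=lambda o: o[2] + GAMMA * values[o[1]])
            policy[s] = best[0]
    return policy


if __name__ == "__main__":
    V = value_iteration()
    for state, action in sorted(greedy_policy(V).items()):
        print(state, action, round(V[state], 2))
```

Under these assumptions the two states adjacent to G get value 100, the states two steps away get 0.9 × 100 = 90, and the lower left corner gets 0.9 × 90 = 81. Where two actions tie (for example in the lower left corner), the sketch keeps whichever comes first in `MOVES`; either choice is optimal.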
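[Figure: immediate reward values r(s,a) on the 2×3 grid, with G in the upper right corner. The two actions entering G are labeled 100; all other actions are labeled 0.]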