Simple Grid World Example
Wish to find optimal policy
π* = π maximizing V^π(s), ∀s
2×3 grid world
actions move N,S,E,W
goal state G in upper right corner
reward +100 for actions entering G, 0 otherwise
actions cannot exit G (absorbing state)
let discount factor γ = 0.9
[Figure: 2×3 grid with goal state G in the upper-right cell]
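The setup above can be sketched in code. This is a minimal illustration, not a definitive implementation. It uses value iteration, which the slide does not name, as one possible way to compute the optimal values. Assumptions not stated on the slide: states are (row, column) pairs with row 0 at the top, moves that would leave the grid are simply unavailable, and the names `step`, `value_iteration`, and `greedy_policy` are hypothetical.

```python
GAMMA = 0.9                      # discount factor from the slide
ROWS, COLS = 2, 3                # 2x3 grid
GOAL = (0, 2)                    # assumed indexing: row 0 is the top row
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def step(state, action):
    """Return (next_state, reward), or None if the action is unavailable."""
    if state == GOAL:
        return None              # absorbing state: no action exits G
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return None              # assumption: off-grid moves do not exist
    return nxt, (100 if nxt == GOAL else 0)


def value_iteration(tol=1e-9):
    """Repeat V(s) <- max_a [r(s,a) + GAMMA * V(s')] until values settle."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s in V:
            outcomes = [step(s, a) for a in ACTIONS]
            backups = [rew + GAMMA * V[nxt]
                       for nxt, rew in filter(None, outcomes)]
            new_v = max(backups, default=0.0)   # G has no actions, so V(G) = 0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


def greedy_policy(V):
    """Pick, in each non-goal state, an action maximizing r + GAMMA * V(s')."""
    policy = {}
    for s in V:
        best = None
        for a in ACTIONS:
            out = step(s, a)
            if out is None:
                continue
            nxt, rew = out
            q = rew + GAMMA * V[nxt]
            if best is None or q > best[0]:
                best = (q, a)
        if best is not None:
            policy[s] = best[1]
    return policy


if __name__ == "__main__":
    V = value_iteration()
    for r in range(ROWS):
        print([round(V[(r, c)], 1) for c in range(COLS)])
    print(greedy_policy(V))
```

Under these assumptions the values converge to 90, 100, 0 in the top row and 81, 90, 100 in the bottom row. Several states have more than one optimal action (for example, N and E tie in the bottom-middle cell), so the optimal policy is not unique; the sketch just returns the first maximizer it finds.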