CS 104: Introduction to Computer Science

Simple Grid World Example

•Wish to find the optimal policy π*: the policy π maximizing V^π(s) for all states s

•2×3 grid world

–actions move N,S,E,W

–goal state G in upper right corner

–reward +100 for actions entering G, 0 otherwise

–actions cannot exit G (absorbing state)

–let discount factor γ = 0.9
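
The grid world described above is small enough to solve directly. The following is a minimal sketch, not the course's own code, that uses value iteration to compute optimal state values and a greedy policy. It assumes that moves are deterministic, that row 0 is the top row, and that moves off the grid are simply unavailable; the slide states none of these. The names `step`, `value_iteration`, and `greedy_policy` are hypothetical helpers introduced for illustration.

```python
GAMMA = 0.9                      # discount factor from the slide
ROWS, COLS = 2, 3                # 2x3 grid; row 0 is the top row (assumed layout)
GOAL = (0, 2)                    # goal state G in the upper right corner
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def step(state, action):
    """Return (next_state, reward), or None if the action is unavailable."""
    if state == GOAL:
        return None              # absorbing state: no action exits G
    dr, dc = MOVES[action]
    nxt = (state[0] + dr, state[1] + dc)
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return None              # assumption: moves off the grid are not allowed
    return nxt, (100.0 if nxt == GOAL else 0.0)


def value_iteration(tol=1e-9):
    """Sweep the Bellman optimality update until values change by less than tol."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s in V:
            outcomes = [step(s, a) for a in MOVES]
            returns = [rew + GAMMA * V[nxt]
                       for nxt, rew in (o for o in outcomes if o is not None)]
            new_v = max(returns, default=0.0)   # G has no actions, so V(G) = 0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


def greedy_policy(V):
    """Pick, in each non-goal state, an action with the highest one-step return."""
    policy = {}
    for s in V:
        best = None
        for a in MOVES:
            out = step(s, a)
            if out is None:
                continue
            nxt, rew = out
            q = rew + GAMMA * V[nxt]
            if best is None or q > best[0]:
                best = (q, a)
        if best is not None:
            policy[s] = best[1]
    return policy


if __name__ == "__main__":
    V = value_iteration()
    pi = greedy_policy(V)
    for r in range(ROWS):
        print([round(V[(r, c)], 1) for c in range(COLS)],
              [pi.get((r, c), "G") for c in range(COLS)])
```

Under these assumptions, the values work out to 90, 100, 0 across the top row and 81, 90, 100 across the bottom row: each step away from G multiplies the value by the discount factor 0.9. Where two actions tie (for example in the bottom middle cell), the sketch keeps the first one it finds, so the optimal policy it reports is one of several.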

[Figure: immediate reward values r(s,a) for the grid world; 100 on actions entering G]