CS 104: Introduction to Computer Science

Simple Grid World Example

• Wish to find optimal policy
    π* = π maximizing V^π(s), ∀s

• 2×3 grid world
    – actions move N, S, E, W
    – goal state G in upper right corner
    – reward +100 for actions entering G, 0 otherwise
    – actions cannot exit G (absorbing state)
    – let discount factor γ = 0.9
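
The grid world described above is small enough to solve directly. The following is a minimal sketch, not necessarily the method used in the course: it applies value iteration to compute the optimal values and then reads off a greedy policy. It assumes deterministic moves, rows numbered from the top (so G is cell (0, 2)), and that a move off the grid leaves the agent where it is; none of these details are stated on the slide. The names step, value_iteration and greedy_policy are illustrative.

```python
GAMMA = 0.9                      # discount factor from the slide
ROWS, COLS = 2, 3                # 2x3 grid, row 0 is the top row (assumption)
GOAL = (0, 2)                    # goal state G in the upper right corner
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def step(state, action):
    """Return (next_state, reward) for one deterministic move."""
    if state == GOAL:
        return state, 0.0        # absorbing state: no action leaves G
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS):
        r, c = state             # assumption: off-grid moves leave the agent in place
    nxt = (r, c)
    reward = 100.0 if nxt == GOAL else 0.0   # +100 only for entering G
    return nxt, reward


def value_iteration(tol=1e-9):
    """Sweep the Bellman optimality update until values stop changing."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s in V:
            if s == GOAL:
                continue         # value of the absorbing goal stays 0
            outcomes = (step(s, a) for a in ACTIONS)
            best = max(rew + GAMMA * V[nxt] for nxt, rew in outcomes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """Pick, in each non-goal state, an action maximizing r + gamma * V(s')."""
    policy = {}
    for s in V:
        if s == GOAL:
            continue

        def q(a, s=s):
            nxt, rew = step(s, a)
            return rew + GAMMA * V[nxt]

        policy[s] = max(ACTIONS, key=q)
    return policy


if __name__ == "__main__":
    V = value_iteration()
    policy = greedy_policy(V)
    for r in range(ROWS):
        print("  ".join(f"{V[(r, c)]:6.1f}" for c in range(COLS)))
    print(policy)
```

Under these assumptions the values are 90, 100, 0 across the top row and 81, 90, 100 across the bottom row. Several states have more than one optimal action (for example, N and E are equally good from the bottom-middle cell), so the optimal policy is not unique; the sketch simply returns the first maximizing action it finds.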