© 2000 Todd Neller. A.I.M.A. text figures © 1995 Prentice Hall. Used by permission.
Simple Grid World Example
• Wish to find optimal policy π* = π maximizing V^π(s) ∀s
• 2 × 3 grid world
  – actions move N, S, E, W
  – goal state G in upper right corner
  – reward +100 for actions entering G, 0 otherwise
  – actions cannot exit G (absorbing state)
  – let discount factor γ = 0.9
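As a worked illustration of how the discount factor enters, the block below assumes the usual discounted cumulative reward definition of V^π, which the slide does not restate. Under that assumption, a start in the lower left cell needs three actions to reach G, so only the third action earns the +100 reward.

```latex
% Assumed definition: discounted cumulative reward when following policy \pi from s_t
V^{\pi}(s_t) = r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots
             = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}

% Lower left cell, shortest route to G (rewards 0, 0, 100), with \gamma = 0.9
V(s_{\text{lower left}}) = 0 + 0.9 \cdot 0 + 0.9^{2} \cdot 100 = 81
```

By the same arithmetic, a cell adjacent to G is worth 100 and a cell two actions away is worth 0.9 · 100 = 90.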
[Figure: 2 × 3 grid with goal state G; immediate reward values r(s,a): 100 for the two actions entering G, 0 for all other actions]
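The grid world above is small enough to solve directly. The following is a minimal sketch, not the slide's own method: it uses value iteration (a technique swapped in here for illustration) to compute the optimal values and a greedy policy. All names (`GOAL`, `step`, `value_iteration`, `greedy_policy`) and the (row, column) coordinate convention are hypothetical, and the value of G is assumed to be 0 because no actions leave it.

```python
# Hypothetical encoding of the slide's 2 x 3 grid world; names are illustrative.
GAMMA = 0.9                      # discount factor from the slide
ROWS, COLS = 2, 3
GOAL = (0, 2)                    # upper right corner, as (row, col) with row 0 on top
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def step(state, action):
    """Return (next_state, reward), or None if the action is unavailable."""
    if state == GOAL:
        return None              # absorbing state: no action exits G
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < ROWS and 0 <= nc < COLS):
        return None              # assumption: moves off the grid are not offered
    nxt = (nr, nc)
    return nxt, (100 if nxt == GOAL else 0)   # r(s,a): +100 only when entering G


def value_iteration(tol=1e-9):
    """Sweep the Bellman optimality update until values stop changing."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s in V:
            outcomes = [o for o in (step(s, a) for a in ACTIONS) if o is not None]
            if not outcomes:
                continue         # G has no actions, so its value stays 0
            best = max(rew + GAMMA * V[nxt] for nxt, rew in outcomes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """Pick, in each non-goal state, an action maximizing r(s,a) + gamma * V(s')."""
    policy = {}
    for s in V:
        scored = {}
        for a in ACTIONS:
            outcome = step(s, a)
            if outcome is not None:
                nxt, rew = outcome
                scored[a] = rew + GAMMA * V[nxt]
        if scored:
            policy[s] = max(scored, key=scored.get)   # ties go to the first action listed
    return policy


V_star = value_iteration()
pi_star = greedy_policy(V_star)
for r in range(ROWS):
    print([round(V_star[(r, c)], 1) for c in range(COLS)])
```

Under these assumptions the two cells adjacent to G come out at 100, the cells two actions away at 90, and the lower left cell at 81. Several states have more than one optimal action, so the greedy policy returned here is one optimal policy among several.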