• Wish to find optimal policy
  $\pi^* = \arg\max_{\pi} V^{\pi}(s), \ \forall s$
• 2×3 grid world
– actions: move N, S, E, W
– goal state G in upper right corner
– reward: +100 for actions entering G, 0 otherwise
– actions cannot exit G (absorbing state)
– let discount factor $\gamma = 0.9$
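The grid world above is small enough to solve directly. The following is a minimal sketch, not necessarily the method used in the lecture. It uses value iteration to estimate the optimal value of each cell and then reads off a greedy policy. It assumes deterministic moves, that moves leading off the grid are simply unavailable, and that the absorbing state G has value 0. The slide does not specify off-grid behavior. All names (`step`, `value_iteration`, `greedy_policy`) are hypothetical.

```python
GAMMA = 0.9            # discount factor from the slide
ROWS, COLS = 2, 3      # 2x3 grid; row 0 is the top row
GOAL = (0, 2)          # G in the upper right corner
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def step(state, action):
    """Return (next_state, reward), or None if the move is unavailable.

    Assumption: off-grid moves are unavailable, and no action leaves G.
    """
    if state == GOAL:
        return None  # absorbing state
    row, col = state
    d_row, d_col = MOVES[action]
    nxt = (row + d_row, col + d_col)
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return None
    reward = 100.0 if nxt == GOAL else 0.0
    return nxt, reward


def value_iteration(tol=1e-9):
    """Repeat V(s) <- max_a [r(s, a) + GAMMA * V(s')] until values settle."""
    values = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s in values:
            backups = []
            for a in MOVES:
                outcome = step(s, a)
                if outcome is not None:
                    nxt, reward = outcome
                    backups.append(reward + GAMMA * values[nxt])
            new_value = max(backups, default=0.0)  # G has no actions
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-goal state, an action with the highest backup."""
    policy = {}
    for s in values:
        best_q, best_a = None, None
        for a in MOVES:
            outcome = step(s, a)
            if outcome is None:
                continue
            nxt, reward = outcome
            q = reward + GAMMA * values[nxt]
            if best_q is None or q > best_q:  # ties keep the first action
                best_q, best_a = q, a
        if best_a is not None:
            policy[s] = best_a
    return policy


V = value_iteration()
policy = greedy_policy(V)
for r in range(ROWS):
    print([round(V[(r, c)], 1) for c in range(COLS)])
```

Under these assumptions the cells adjacent to G get value 100, and each further step away multiplies the value by 0.9. Several states have more than one optimal action (for example, the bottom middle cell can move N or E), so the optimal policy is not unique.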