• Discounted cumulative reward:
    – $V^\pi(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$
• Finite horizon reward:
    – $V^\pi(s_t) = r_t + r_{t+1} + r_{t+2} + \dots + r_{t+h} = \sum_{i=0}^{h} r_{t+i}$
• Average reward:
    – $V^\pi(s_t) = \lim_{h \to \infty} \frac{r_t + r_{t+1} + r_{t+2} + \dots + r_{t+h}}{h} = \lim_{h \to \infty} \frac{1}{h} \sum_{i=0}^{h} r_{t+i}$
• Assume discounted cumulative reward chosen as goal.
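The three reward criteria can be compared on a short, made-up reward sequence. The sketch below is only illustrative: the function names, the sample rewards, and the choice of a discount factor $\gamma$ in $[0, 1)$ are assumptions, not part of the slides. Because a program can only see finitely many rewards, the discounted sum is truncated to the rewards supplied, and the average reward is evaluated at a finite $h$ rather than in the limit.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**i * r_{t+i} over the supplied (truncated) rewards."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma
    # to everything that comes later in the sequence.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def finite_horizon_return(rewards: Sequence[float], h: int) -> float:
    """r_t + r_{t+1} + ... + r_{t+h}, i.e. the first h + 1 rewards."""
    return float(sum(rewards[: h + 1]))


def average_reward(rewards: Sequence[float], h: int) -> float:
    """Finite-h value of (sum_{i=0}^{h} r_{t+i}) / h; the slides take h to infinity."""
    return sum(rewards[: h + 1]) / h


# Hypothetical rewards r_t, r_{t+1}, r_{t+2}, r_{t+3}
rewards = [1.0, 0.0, 2.0, 1.0]

print(discounted_return(rewards, gamma=0.5))  # 1 + 0 + 0.25*2 + 0.125*1
print(finite_horizon_return(rewards, h=2))    # 1 + 0 + 2
print(average_reward(rewards, h=3))           # (1 + 0 + 2 + 1) / 3
```

With $\gamma < 1$ the later rewards contribute less and less, which is what keeps the infinite discounted sum finite when rewards are bounded.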