•Discounted cumulative reward:
–$V^\pi(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$
•Finite horizon reward:
–$V^\pi(s_t) = r_t + r_{t+1} + r_{t+2} + \dots + r_{t+h} = \sum_{i=0}^{h} r_{t+i}$
•Average reward:
–$V^\pi(s_t) = \lim_{h \to \infty} \frac{1}{h} \sum_{i=0}^{h} r_{t+i}$
•Assume discounted cumulative reward chosen as goal.
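The reward definitions above can be compared numerically on a short reward sequence. The following is a minimal sketch, not part of the original slides: the function names, the sample rewards, and the discount factor 0.5 are illustrative assumptions, and because the input is finite, the infinite sum and the limit are only approximated by truncation.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**i * r_{t+i}, truncated to the rewards supplied."""
    total = 0.0
    # Work backwards through the sequence: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def finite_horizon_return(rewards: Sequence[float], h: int) -> float:
    """Undiscounted sum of r_t through r_{t+h} (h + 1 terms)."""
    return float(sum(rewards[: h + 1]))


def average_reward(rewards: Sequence[float], h: int) -> float:
    """Finite-h estimate of the average reward: the sum above divided by h.

    The definition takes the limit h -> infinity; a fixed h only
    approximates it. Requires h > 0.
    """
    return finite_horizon_return(rewards, h) / h


# Hypothetical reward sequence r_t, r_{t+1}, r_{t+2}, r_{t+3}.
rewards = [1.0, 0.0, 2.0, 1.0]

print(discounted_return(rewards, gamma=0.5))  # 1 + 0 + 0.25*2 + 0.125*1 = 1.625
print(finite_horizon_return(rewards, h=2))    # 1 + 0 + 2 = 3.0
print(average_reward(rewards, h=2))           # 3.0 / 2 = 1.5
```

With a discount factor below 1, later rewards contribute geometrically less, which is what keeps the infinite discounted sum finite when rewards are bounded.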