Reward Functions to Optimize
Discounted cumulative reward:
V^π(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + …
         = Σ_{i=0}^{∞} γ^i r_{t+i}
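As an illustration (not from the source), a minimal Python sketch of the discounted return; the reward sequence and the discount factor γ = 0.9 are assumptions for the example:

```python
# Discounted cumulative reward: V = sum over i of gamma^i * r_{t+i}.
# The reward sequence and gamma below are illustrative assumptions.

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**i * rewards[i] over a (finite) reward sequence."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

# Rewards 1, 0, 2 received at steps t, t+1, t+2:
print(discounted_return([1.0, 0.0, 2.0]))  # 1 + 0.9*0 + 0.81*2 = 2.62
```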
Finite horizon reward:
V^π(s_t) = r_t + r_{t+1} + r_{t+2} + … + r_{t+h}
         = Σ_{i=0}^{h} r_{t+i}
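A corresponding sketch for the finite horizon criterion, again with an assumed reward sequence; the sum is simply truncated after h steps and left undiscounted:

```python
# Finite horizon reward: V = sum_{i=0}^{h} r_{t+i}, undiscounted.
# The horizon h and the reward sequence are illustrative assumptions.

def finite_horizon_return(rewards, h):
    """Undiscounted sum of rewards for steps t .. t+h."""
    return sum(rewards[: h + 1])

print(finite_horizon_return([1.0, 0.0, 2.0, 5.0], h=2))  # 1 + 0 + 2 = 3.0
```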
Average reward:
V^π(s_t) = lim_{h→∞} (r_t + r_{t+1} + r_{t+2} + … + r_{t+h}) / h
         = lim_{h→∞} (Σ_{i=0}^{h} r_{t+i}) / h
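Finally, a sketch of the average reward criterion, approximating the limit with a long but finite run; the synthetic reward stream is an assumption made for the example:

```python
# Average reward: V = lim_{h->inf} (1/h) * sum_{i=0}^{h} r_{t+i}.
# Approximated here by averaging over a long finite run.

def average_reward(rewards):
    """Empirical per-step average; approximates the limit for large h."""
    return sum(rewards) / len(rewards)

# Synthetic stream: +1 every 10th step, 0 otherwise.
rewards = [1.0 if i % 10 == 0 else 0.0 for i in range(1000)]
print(average_reward(rewards))  # ~0.1
```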
Assume the discounted cumulative reward is chosen as the goal.