© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Reward Functions to Optimize
•Discounted cumulative reward:
–Vπ(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + …
         = Σ_{i=0}^{∞} γ^i r_{t+i}
•Finite horizon reward:
–Vπ(s_t) = r_t + r_{t+1} + r_{t+2} + … + r_{t+h}
         = Σ_{i=0}^{h} r_{t+i}
•Average reward:
–Vπ(s_t) = lim_{h→∞} (r_t + r_{t+1} + … + r_{t+h}) / h
         = lim_{h→∞} (1/h) Σ_{i=0}^{h} r_{t+i}
•Assume the discounted cumulative reward is chosen as the goal.
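The three criteria above can be sketched in code for a finite reward sequence (a minimal illustration, not from the slides; the function names, example rewards, and γ value are assumptions, and the infinite and limiting sums are simply truncated at the end of the sequence):

```python
def discounted_reward(rewards, gamma):
    """Truncated discounted cumulative reward: sum of gamma^i * r_{t+i}."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

def finite_horizon_reward(rewards, h):
    """Finite horizon reward: sum of r_{t+i} for i = 0..h."""
    return sum(rewards[: h + 1])

def average_reward(rewards):
    """Average reward approximated over the available horizon
    (the true criterion is the limit as the horizon grows)."""
    return sum(rewards) / len(rewards)

# Illustrative reward sequence r_t, r_{t+1}, r_{t+2} and gamma = 0.5:
rewards = [1, 2, 3]
print(discounted_reward(rewards, 0.5))  # 1 + 0.5*2 + 0.25*3 = 2.75
print(finite_horizon_reward(rewards, 1))  # 1 + 2 = 3
print(average_reward(rewards))  # (1 + 2 + 3) / 3 = 2.0
```

With γ < 1 the discounted sum converges even for an infinite reward stream, which is one reason the discounted criterion is the usual choice.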