CS 104: Introduction to Computer Science

Reward Functions to Optimize

•Discounted cumulative reward:

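–$V^{\pi}(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots$
$= \sum_{i=0}^{\infty} \gamma^i r_{t+i}$, where $0 \le \gamma < 1$ is the discount factor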

•Finite horizon reward:

–$V^{\pi}(s_t) = r_t + r_{t+1} + r_{t+2} + \dots + r_{t+h}$
$= \sum_{i=0}^{h} r_{t+i}$

•Average reward:

–$V^{\pi}(s_t) = \lim_{h \to \infty} \frac{r_t + r_{t+1} + r_{t+2} + \dots + r_{t+h}}{h}$
$= \lim_{h \to \infty} \frac{1}{h} \sum_{i=0}^{h} r_{t+i}$

•From here on, assume the discounted cumulative reward is the goal to optimize.
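
The three reward criteria above can be compared on a concrete reward sequence. The following is a minimal Python sketch, not part of the original slides: the function names and the sample rewards are made up for illustration, and the infinite sums and the limit are approximated by a finite list of observed rewards.

```python
def discounted_return(rewards, gamma):
    """Discounted cumulative reward: sum of gamma**i * r_{t+i}.

    `rewards` is a finite list [r_t, r_{t+1}, ...], so this truncates
    the infinite sum at the end of the list.
    """
    return sum(gamma ** i * r for i, r in enumerate(rewards))


def finite_horizon_return(rewards, h):
    """Finite horizon reward: r_t + r_{t+1} + ... + r_{t+h} (h + 1 terms)."""
    return sum(rewards[: h + 1])


def average_reward(rewards):
    """Average reward over the observed sequence.

    The definition takes a limit as the horizon grows; here we simply
    average the rewards we have, as a finite-sample stand-in.
    """
    return sum(rewards) / len(rewards)


# Hypothetical reward sequence starting at time t.
rewards = [1.0, 0.0, 2.0, 1.0]

print(discounted_return(rewards, gamma=0.5))   # 1 + 0 + 2*0.25 + 1*0.125 = 1.625
print(finite_horizon_return(rewards, h=2))     # 1 + 0 + 2 = 3.0
print(average_reward(rewards))                 # 4 / 4 = 1.0
```

With gamma below 1, later rewards count for less, which is why the discounted sum stays bounded even over an unbounded horizon when rewards themselves are bounded.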