CS 104: Introduction to Computer Science

Reward Functions to Optimize

• Discounted cumulative reward:

– $V^{\pi}(s_t) = r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \dots = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}$

• Finite horizon reward:

– $V^{\pi}(s_t) = r_t + r_{t+1} + r_{t+2} + \dots + r_{t+h} = \sum_{i=0}^{h} r_{t+i}$

• Average reward:

– $V^{\pi}(s_t) = \lim_{h \to \infty} \frac{1}{h} \left( r_t + r_{t+1} + r_{t+2} + \dots + r_{t+h} \right) = \lim_{h \to \infty} \frac{1}{h} \sum_{i=0}^{h} r_{t+i}$

• Assume the discounted cumulative reward is chosen as the goal.
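
The three criteria above can be compared on a recorded reward sequence. The following is a minimal sketch, not part of the original slides: the function names, the `rewards` list, and the value of `gamma` are all illustrative assumptions. The infinite sums and the limit are approximated by whatever finite sequence is passed in.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**i * r_{t+i} over the given (finite) reward sequence.

    Assumption: the infinite sum is truncated to the rewards supplied.
    """
    total = 0.0
    # Work backwards so each earlier reward adds gamma times the later total.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def finite_horizon_return(rewards: Sequence[float], h: int) -> float:
    """r_t + r_{t+1} + ... + r_{t+h}, i.e. the first h + 1 rewards."""
    return float(sum(rewards[: h + 1]))


def average_reward(rewards: Sequence[float], h: int) -> float:
    """(1 / h) * sum_{i=0}^{h} r_{t+i} for one fixed h.

    Assumption: a single finite h stands in for the limit h -> infinity.
    """
    if h <= 0:
        raise ValueError("h must be positive")
    return sum(rewards[: h + 1]) / h


# Hypothetical reward sequence r_t, r_{t+1}, r_{t+2}, r_{t+3}
rewards = [1.0, 2.0, 3.0, 4.0]

print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*2 + 0.25*3 + 0.125*4 = 3.25
print(finite_horizon_return(rewards, h=2))    # 1 + 2 + 3 = 6.0
print(average_reward(rewards, h=2))           # 6 / 2 = 3.0
```

With a discount factor below 1, later rewards contribute less and less, so the discounted sum stays finite even for an unbounded sequence of bounded rewards. That is one common reason for choosing it as the objective.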