CS 104: Introduction to Computer Science


Nondeterministic Rewards and Actions


•	What if r and δ are nondeterministic (e.g., the roll of a
	die in a game)?

•	V^π(s_t) = ExpectedValue[r_t + γ r_{t+1} + γ² r_{t+2} + …]
	= E[Σ_{i=0}^{∞} γ^i r_{t+i}]
•	Q(s,a) = E[r(s,a) + γ V*(δ(s,a))]
	= E[r(s,a)] + γ E[V*(δ(s,a))]
	= E[r(s,a)] + γ Σ_{s'} P(s'|s,a) V*(s')

•	Therefore:
	Q(s,a) = E[r(s,a)] + γ Σ_{s'} P(s'|s,a) max_{a'} Q(s',a')

•	Note: This equation is a recursive definition of Q, not an update rule.
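
The recursive equation for Q can be checked by hand on a very small example. The Python below is a minimal sketch, not part of the lecture: it assumes a made-up two-state game in which the action "roll" pays the face of a fair die and ends the game with probability 1/2, while "stop" ends it at once. The state and action names, the transition probabilities, the discount factor of 9/10, and the helper bellman_rhs are all hypothetical choices for illustration. The code does not learn Q. It only verifies that a candidate Q table satisfies the equation.

```python
from fractions import Fraction

GAMMA = Fraction(9, 10)  # discount factor (assumed value for this sketch)

# Hypothetical toy game: expected immediate reward E[r(s,a)] per state-action pair.
EXPECTED_REWARD = {
    ("play", "roll"): Fraction(sum(range(1, 7)), 6),  # fair die: E[r] = 7/2
    ("play", "stop"): Fraction(0),
    ("end", "stay"): Fraction(0),
}

# Hypothetical transition model P(s' | s, a).
TRANSITIONS = {
    ("play", "roll"): {"play": Fraction(1, 2), "end": Fraction(1, 2)},
    ("play", "stop"): {"end": Fraction(1)},
    ("end", "stay"): {"end": Fraction(1)},
}


def bellman_rhs(q, s, a):
    """Right-hand side: E[r(s,a)] + gamma * sum_s' P(s'|s,a) * max_a' q(s',a')."""
    expected_next = sum(
        p * max(v for (s2, _), v in q.items() if s2 == s_next)
        for s_next, p in TRANSITIONS[(s, a)].items()
    )
    return EXPECTED_REWARD[(s, a)] + GAMMA * expected_next


# Candidate Q table. Solving Q = 7/2 + (9/10)(1/2) Q by hand gives 70/11.
Q = {
    ("play", "roll"): Fraction(70, 11),
    ("play", "stop"): Fraction(0),
    ("end", "stay"): Fraction(0),
}

# The equation holds when both sides agree for every state-action pair.
consistent = all(bellman_rhs(Q, s, a) == Q[(s, a)] for (s, a) in Q)
print(consistent)
```

Exact fractions are used so the two sides can be compared with == rather than a floating-point tolerance. With the table above the script prints True. Changing any entry of Q, for example setting Q("play", "roll") to 1, makes the two sides disagree.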