•What if $r$ and $\delta$ are nondeterministic (e.g., the roll of a die in a game)?
•$V^{\pi}(s_t) = E[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots]$
$= E\left[\sum_{i=0}^{\infty} \gamma^i r_{t+i}\right]$
•$Q(s,a) = E[r(s,a) + \gamma V^*(\delta(s,a))]$
$= E[r(s,a)] + \gamma E[V^*(\delta(s,a))]$
$= E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a) V^*(s')$
•Therefore:
$Q(s,a) = E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$
•Note: This is not an update rule.
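The recursive equation for Q can be checked numerically on a toy problem. The sketch below assumes that P(s'|s,a) and E[r(s,a)] are known, which is a planning setting and not a learning update from sampled experience. The two-state MDP, its numbers, and the names `P`, `R`, `bellman_rhs`, `solve_q`, and `bellman_residual` are all hypothetical, chosen only for illustration. It substitutes the right-hand side repeatedly until Q stops changing, then measures how closely the result satisfies the equation.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up.
GAMMA = 0.9          # discount factor (gamma)
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a][s2] = P(s2 | s, a): assumed-known transition probabilities.
P = {
    0: {0: [0.8, 0.2], 1: [0.1, 0.9]},
    1: {0: [0.5, 0.5], 1: [0.0, 1.0]},
}

# R[s][a] = E[r(s, a)]: assumed-known expected immediate rewards.
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 2.0, 1: 0.5},
}


def bellman_rhs(Q, s, a):
    """E[r(s,a)] + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    expected_next = sum(
        P[s][a][s2] * max(Q[s2][a2] for a2 in ACTIONS) for s2 in STATES
    )
    return R[s][a] + GAMMA * expected_next


def solve_q(tol=1e-10, max_sweeps=10_000):
    """Substitute the right-hand side repeatedly until Q stops changing."""
    Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(max_sweeps):
        new_Q = {s: {a: bellman_rhs(Q, s, a) for a in ACTIONS} for s in STATES}
        diff = max(abs(new_Q[s][a] - Q[s][a]) for s in STATES for a in ACTIONS)
        Q = new_Q
        if diff < tol:
            break
    return Q


def bellman_residual(Q):
    """Largest gap between Q(s,a) and the right-hand side of the equation."""
    return max(abs(Q[s][a] - bellman_rhs(Q, s, a)) for s in STATES for a in ACTIONS)


if __name__ == "__main__":
    Q = solve_q()
    for s in STATES:
        for a in ACTIONS:
            print(f"Q({s},{a}) = {Q[s][a]:.4f}")
    print("Bellman residual:", bellman_residual(Q))
```

A residual near zero indicates that the computed table satisfies the equation. When P and E[r] are not known, this substitution cannot be carried out directly.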