© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Nondeterministic Rewards and Actions
•What if the reward function $r$ and the state-transition function $\delta$ are nondeterministic (e.g., the roll of a die in a game)?
•$V^{\pi}(s_t) = E[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots]$
$= E\left[\sum_{i=0}^{\infty} \gamma^i r_{t+i}\right]$
•$Q(s,a) = E[r(s,a) + \gamma V^*(\delta(s,a))]$
$= E[r(s,a)] + \gamma E[V^*(\delta(s,a))]$
$= E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a) V^*(s')$
•Therefore:
$Q(s,a) = E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$
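•Note: This is not an update rule. It is a recursive equation that $Q$ satisfies, and evaluating it requires $E[r(s,a)]$ and $P(s' \mid s,a)$, which the learner does not know.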
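A minimal Python sketch of the expected discounted return $V^{\pi}(s_t)$, assuming (hypothetically) that every step's reward is an independent roll of a fair six-sided die and $\gamma = 0.9$. By linearity of expectation, $E[\sum_i \gamma^i r_{t+i}] = \sum_i \gamma^i E[r_{t+i}] = 3.5/(1-\gamma) = 35$, so the Monte Carlo average of sampled returns should land near 35. The function names, the truncation horizon, and the episode count are illustrative choices, not part of the slide.

```python
import random


def discounted_return(rewards, gamma):
    """Return sum_i gamma**i * rewards[i] for a finite reward sequence."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))


def monte_carlo_value(sample_reward, gamma, horizon, episodes, seed=0):
    """Estimate E[sum_{i>=0} gamma**i * r_{t+i}] by averaging sampled returns.

    The infinite sum is truncated after `horizon` steps (an assumption);
    with bounded rewards the truncation error is at most
    r_max * gamma**horizon / (1 - gamma).
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    total = 0.0
    for _ in range(episodes):
        rewards = [sample_reward(rng) for _ in range(horizon)]
        total += discounted_return(rewards, gamma)
    return total / episodes


def die_roll(rng):
    """Hypothetical nondeterministic reward: a fair six-sided die."""
    return rng.randint(1, 6)


gamma = 0.9
exact = 3.5 / (1 - gamma)  # E[r] = 3.5 per step; geometric series gives 35
estimate = monte_carlo_value(die_roll, gamma, horizon=150, episodes=2000)
```

Averaging many sampled returns is only one way to approximate the expectation; here the closed form is available because the rewards are i.i.d., which will not hold in general.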
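Assuming, hypothetically, that $E[r(s,a)]$ and $P(s' \mid s,a)$ were known, the recursive equation for $Q$ could be solved by repeatedly applying its right-hand side to a table of Q values (Q-value iteration), which converges for $\gamma < 1$. The sketch below does this for a made-up two-state MDP; the states A and B, the actions `roll` and `wait`, and the function `q_value_iteration` are all illustrative. Note that it relies on the known model, which a Q-learning agent does not have.

```python
def q_value_iteration(expected_reward, transitions, gamma, tol=1e-10, max_iters=10_000):
    """Iterate Q(s,a) <- E[r(s,a)] + gamma * sum_{s'} P(s'|s,a) * max_{a'} Q(s',a').

    expected_reward: {(s, a): E[r(s, a)]}
    transitions:     {(s, a): {s_next: P(s_next | s, a)}}
    """
    # Group the available actions by state.
    actions = {}
    for (s, a) in expected_reward:
        actions.setdefault(s, []).append(a)

    q = {sa: 0.0 for sa in expected_reward}
    for _ in range(max_iters):
        # V*(s') = max_{a'} Q(s', a') under the current estimate.
        v = {s: max(q[(s, a)] for a in acts) for s, acts in actions.items()}
        new_q = {
            (s, a): expected_reward[(s, a)]
            + gamma * sum(p * v[s2] for s2, p in transitions[(s, a)].items())
            for (s, a) in expected_reward
        }
        change = max(abs(new_q[sa] - q[sa]) for sa in q)
        q = new_q
        if change < tol:  # stop once successive sweeps agree
            break
    return q


# Hypothetical MDP: "roll" pays a fair die (expected reward 3.5).
expected_reward = {
    ("A", "roll"): 3.5,
    ("A", "wait"): 0.0,
    ("B", "roll"): 3.5,
}
transitions = {
    ("A", "roll"): {"A": 0.5, "B": 0.5},
    ("A", "wait"): {"A": 1.0},
    ("B", "roll"): {"B": 1.0},
}
q = q_value_iteration(expected_reward, transitions, gamma=0.9)
```

For this example the fixed point can be checked by hand: $Q(B,\text{roll}) = 3.5/(1-0.9) = 35$, $Q(A,\text{roll}) = 3.5 + 0.9\,(0.5 \cdot 35 + 0.5 \cdot 35) = 35$, and $Q(A,\text{wait}) = 0.9 \cdot 35 = 31.5$.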