• What if $r$ and $\delta$ are nondeterministic (e.g. the roll of a die in a game)?
• $V^{\pi}(s_t) = E[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots]$
  $= E\left[\sum_{i=0}^{\infty} \gamma^i r_{t+i}\right]$
• $Q(s,a) = E[r(s,a) + \gamma V^*(\delta(s,a))]$
  $= E[r(s,a)] + \gamma E[V^*(\delta(s,a))]$
  $= E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a) V^*(s')$
• Therefore:
  $Q(s,a) = E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$
• Note: This is not an update rule.
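The recursive equation for Q is a condition that the true Q must satisfy, not a rule for learning from sampled experience. When the transition probabilities and expected rewards happen to be known, one way to see what it says is to sweep its right-hand side repeatedly until Q stops changing, then check that the result satisfies the equation. The sketch below does this for a made-up model: the two states, two actions, and every number in `P` and `R` are assumptions chosen only for illustration, as are the helper names `bellman_rhs` and `q_value_iteration`.

```python
GAMMA = 0.9  # discount factor (gamma in the equations)

# Hypothetical 2-state, 2-action model; all numbers are illustrative.
# P[s][a][s2] = P(s2 | s, a), and R[s][a] = E[r(s, a)].
P = [
    [[0.8, 0.2], [0.1, 0.9]],  # from state 0: action 0, action 1
    [[0.5, 0.5], [0.0, 1.0]],  # from state 1: action 0, action 1
]
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]


def bellman_rhs(Q, s, a):
    """Right-hand side: E[r(s,a)] + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    expected_next = sum(p * max(Q[s2]) for s2, p in enumerate(P[s][a]))
    return R[s][a] + GAMMA * expected_next


def q_value_iteration(tol=1e-10, max_sweeps=10_000):
    """Sweep the right-hand side over all (s, a) until Q changes by less than tol."""
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_sweeps):
        new_Q = [[bellman_rhs(Q, s, a) for a in range(n_actions)]
                 for s in range(n_states)]
        change = max(abs(new_Q[s][a] - Q[s][a])
                     for s in range(n_states) for a in range(n_actions))
        Q = new_Q
        if change < tol:
            break
    return Q


Q = q_value_iteration()

# Largest violation of the equation Q(s,a) = RHS(s,a) over all state-action pairs.
residual = max(abs(Q[s][a] - bellman_rhs(Q, s, a))
               for s in range(len(P)) for a in range(len(P[0])))

print("Q =", Q)
print("max residual =", residual)
```

This computation needs the full model $P(s' \mid s,a)$, which a learner acting in an unknown environment does not have. That is one reason the equation cannot be used directly as a learning update.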