 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
| • |
Training
using updating sequence in reverse
|
|
|
order
speeds convergence.
|
|
|
| • |
Tradeoff:
Requires more memory
|
|
|
| • |
Suppose
exploration and learning cost great
|
|
|
time/expense.
|
|
|
| • |
Can
retrain on same data repeatedly.
|
|
|
| • |
Ratio
of old/new update sequences a matter of
|
|
|
relative
costs for problem domain.
|
|
|
| • |
Tradeoff:
Requires more memory, less diversity of
|
|
state/action
pairs
|
|