Speeding Convergence (cont.)
Training using updating sequence in reverse
order speeds convergence.
Tradeoff: Requires more memory
Suppose exploration and learning cost great
time/expense.
Can retrain on same data repeatedly.
Ratio of old/new update sequences a matter of
relative costs for problem domain.
Tradeoff: Requires more memory, less diversity of
state/action pairs