•Training using updating sequence in reverse order
speeds convergence.
•Tradeoff: Requires more memory
•Suppose exploration and learning cost great time/expense.
•Can retrain on same data repeatedly.
•Ratio of old/new update sequences a matter of relative costs for problem domain.
•Tradeoff: Requires more memory, less diversity of state/action pairs