Speeding Convergence
Updating sequence: start at a random state
and act until the agent reaches the absorbing goal state
For the first updating sequence in our grid
world example, how many weights get
updated?
What could we do if we kept the whole
sequence in memory?
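One way to explore both questions is a small experiment. The sketch below is not the grid world from the lecture. It assumes a hypothetical 5-state corridor, a table initialised to zero (its entries stand in for the weights), a reward of 100 only on entering the goal, and the deterministic one-step update Q(s,a) <- r + gamma * max Q(s',a'). All names in it (`step`, `run_episode`, `q_update`, `GAMMA`) are illustrative. It runs one updating sequence, then applies the updates once in the order they occurred and once in reverse order from the stored sequence.

```python
import random

GAMMA = 0.9          # assumed discount factor
N = 5                # hypothetical corridor: states 0..4
GOAL = N - 1         # state 4 is the absorbing goal
ACTIONS = (-1, +1)   # move left or right

def step(s, a):
    """Deterministic move; walls clamp the state. Reward only on entering the goal."""
    s2 = min(max(s + a, 0), GOAL)
    r = 100.0 if s2 == GOAL else 0.0
    return s2, r

def run_episode(start, rng):
    """Act randomly from `start` until the goal; return the list of transitions."""
    s, traj = start, []
    while s != GOAL:
        a = rng.choice(ACTIONS)
        s2, r = step(s, a)
        traj.append((s, a, s2, r))
        s = s2
    return traj

def q_update(Q, s, a, s2, r):
    """One-step deterministic update: Q(s,a) <- r + gamma * max_a' Q(s',a')."""
    best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] = r + GAMMA * best_next

def new_table():
    return {(s, a): 0.0 for s in range(N) for a in ACTIONS}

rng = random.Random(0)
traj = run_episode(0, rng)

# Updates applied in the order the transitions occurred.
Q_fwd = new_table()
for t in traj:
    q_update(Q_fwd, *t)

# Same transitions, replayed from memory in reverse order.
Q_bwd = new_table()
for t in reversed(traj):
    q_update(Q_bwd, *t)

nonzero_fwd = sum(v != 0 for v in Q_fwd.values())
nonzero_bwd = sum(v != 0 for v in Q_bwd.values())
print("entries changed, forward order:", nonzero_fwd)
print("entries changed, reverse order:", nonzero_bwd)
```

Under these assumptions, the in-order pass changes only the entry for the transition that enters the goal, because every earlier update sees zero reward and zero successor values. The reverse pass gives a nonzero value to every state-action pair visited in the sequence, which suggests why keeping the sequence in memory can speed convergence.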