© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Further Reading: Temporal Difference Learning
•A generalization of Q-learning that handles nondeterministic actions and rewards
•Basic idea:  Given a sequence of states, actions, and rewards, you can write a more general update rule that blends value estimates from lookaheads of different depths.
•Tesauro (1995) trained TD-GAMMON on 1.5 million self-generated games, reaching a level of play nearly equal to that of the top-ranked players in international backgammon tournaments.
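As an illustration of blending lookaheads of different depths, here is a minimal sketch of the TD(λ) update with accumulating eligibility traces, applied to a toy random-walk chain (the environment, parameter values, and function name are assumptions for this example, not from the text above):

```python
import random

def td_lambda(episodes=2000, n_states=5, alpha=0.1, lam=0.8, gamma=1.0, seed=0):
    """TD(lambda) with accumulating eligibility traces on a random-walk chain.

    States 0..n_states-1; each episode starts in the middle and a random
    step moves left or right.  Walking off the right end yields reward 1,
    the left end reward 0, so the true value of state i is (i+1)/(n_states+1).
    """
    rng = random.Random(seed)
    V = [0.0] * n_states              # value estimates, one per state
    for _ in range(episodes):
        e = [0.0] * n_states          # eligibility traces
        s = n_states // 2             # start in the middle
        while True:
            s2 = s + rng.choice([-1, 1])
            done = s2 < 0 or s2 >= n_states
            reward = 1.0 if s2 >= n_states else 0.0
            target = reward if done else reward + gamma * V[s2]
            delta = target - V[s]     # one-step TD error
            e[s] += 1.0               # mark the visited state as eligible
            for i in range(n_states):
                # credit the TD error to all recently visited states;
                # decaying traces blend lookaheads of different depths
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam
            if done:
                break
            s = s2
    return V
```

With λ = 0 this reduces to one-step TD (the Q-learning-style shallow backup); with λ = 1 it approaches a full Monte-Carlo return, so λ controls the blend between the two extremes.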