CS 104: Introduction to Computer Science


•	A generalization of Q-learning with
	nondeterminism

•	Basic idea: Instead of looking only one step
	ahead, use a sequence of actions and rewards to
	write a more general learning rule that blends
	estimates from lookaheads of different depths.

•	Tesauro (1995) trained TD-GAMMON on 1.5
	million self-generated games, reaching a level of
	play nearly equal to that of the top-ranked players
	in international backgammon tournaments.
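
The "basic idea" bullet above, blending estimates from lookahead of different depths, can be sketched roughly as follows. This is a minimal illustration, not the rule from the lecture: the geometric weighting (1 - lam) * lam**(n - 1), the discount factor gamma, and the names n_step_estimate and lambda_blend are all assumptions introduced here. The n-step estimate sums the first n discounted rewards and then bootstraps from the current value estimate of the state reached after n steps.

```python
def n_step_estimate(rewards, next_values, n, gamma):
    """Estimate from an n-step lookahead (hypothetical helper).

    rewards[k]     : reward observed k steps from now
    next_values[k] : current estimate max_a Q(s, a) for the state
                     reached after k + 1 steps
    """
    discounted = sum(gamma ** k * rewards[k] for k in range(n))
    return discounted + gamma ** n * next_values[n - 1]


def lambda_blend(rewards, next_values, gamma, lam):
    """Blend the 1..N step estimates (assumed geometric weighting).

    Depth n gets weight (1 - lam) * lam**(n - 1); the deepest
    estimate takes the leftover weight lam**(N - 1), so the
    weights sum to 1 for a finite sequence.
    """
    N = len(rewards)
    total = 0.0
    for n in range(1, N + 1):
        if n == N:
            weight = lam ** (N - 1)
        else:
            weight = (1 - lam) * lam ** (n - 1)
        total += weight * n_step_estimate(rewards, next_values, n, gamma)
    return total


# Made-up three-step sequence: reward arrives only at the last step.
rewards = [0.0, 0.0, 1.0]
next_values = [0.5, 0.5, 0.0]
one_step_only = lambda_blend(rewards, next_values, gamma=0.9, lam=0.0)
deepest_only = lambda_blend(rewards, next_values, gamma=0.9, lam=1.0)
blended = lambda_blend(rewards, next_values, gamma=0.9, lam=0.5)
```

With lam = 0 the blend reduces to the ordinary one-step estimate, and with lam = 1 it relies entirely on the deepest lookahead; values in between mix the two.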