© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Differences from Function Approximation
•Delayed reward:
–-30K → -30K → -30K → -30K → +??K → …
•Temporal credit assignment problem:
–e.g. the agent played an excellent game except for one move, and lost.  Which move was to blame?
•Exploration:
–agent can generate its own training examples autonomously if it (1) has a model of the world, or (2) can continuously explore its world.
–exploration (seeing new states) vs. exploitation (doing what looks best so far)
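A minimal sketch of two of the ideas above, under stated assumptions. The slide names no particular technique, so both functions are illustrative choices: `discounted_returns` uses a discount factor `gamma` (an assumption, not from the slide) to spread a delayed reward back over the earlier steps, and `epsilon_greedy` is one common way to trade exploration against exploitation. All names and parameter values here are hypothetical.

```python
import random


def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t.

    A late reward is propagated backwards, so earlier steps receive
    some (discounted) credit for it. gamma is an assumed parameter.
    """
    returns = []
    g = 0.0
    # Walk the episode backwards so each step sees the return that follows it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action index from a list of estimated action values.

    With probability epsilon, explore (uniformly random action);
    otherwise exploit (the action that looks best so far).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `discounted_returns([0, 0, 1], gamma=0.5)` assigns the first step a quarter of the final reward, and `epsilon_greedy(q, 0.0)` always exploits while `epsilon_greedy(q, 1.0)` always explores.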