CS 104: Introduction to Computer Science

Differences from Function Approximation

•Delayed reward:

-30K → -30K → -30K → -30K → +??K → …

•Temporal credit assignment problem:

–e.g., the agent played an excellent game except for one bad move, and lost. Which move was to blame?

•Exploration:

–the agent can generate its own training examples autonomously if it (1) has a model of the world, or (2) can continuously explore its world.

–exploration (seeing new states) vs. exploitation (doing what looks best so far)
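
One common way to balance exploration against exploitation is an epsilon-greedy rule: with a small probability the agent tries a random action, and otherwise it takes the action that looks best so far. The sketch below is illustrative only and is not taken from the course material. The multi-armed bandit setting, the function names `epsilon_greedy_action` and `run_bandit`, and the Gaussian reward noise are all assumptions chosen to keep the example small.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploitation: pick the action with the highest current estimate,
    # breaking ties at random.
    best = max(q_values)
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Hypothetical bandit loop: estimate each arm's value from its own samples."""
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)  # running estimate of each arm's mean reward
    counts = [0] * len(true_means)      # number of times each arm was pulled
    for _ in range(steps):
        action = epsilon_greedy_action(q_values, epsilon, rng)
        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental average: move the estimate toward the new sample.
        q_values[action] += (reward - q_values[action]) / counts[action]
    return q_values, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.0, 1.0, 0.5])
    print("estimated values:", estimates)
    print("pull counts:", pulls)
```

Setting `epsilon` to 0 gives a purely exploiting agent that can lock onto an early lucky action, while setting it to 1 gives a purely exploring agent that never uses what it has learned.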