CS 104: Introduction to Computer Science

• Delayed reward: -30K → -30K → -30K → -30K → +??K → …

• Temporal credit assignment problem:

– e.g. played excellent game except for one move and lost. Which move?

• Exploration:

– agent can generate its own training examples autonomously if it (1) has a model of the world, or (2) can continuously explore its world.

– exploration (seeing new states) vs. exploitation (doing what looks best so far)
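
One common way to balance exploration and exploitation is an epsilon-greedy rule: with a small probability the agent tries a random action, and otherwise it takes the action that looks best so far. The sketch below is illustrative only and is not taken from the slides. The toy setting (a few actions with noisy rewards), the names `epsilon_greedy` and `run_bandit`, and the default values of `epsilon`, `steps`, and `seed` are all assumptions.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Return an action index (hypothetical helper, not from the slides).

    With probability epsilon: explore (pick any action uniformly at random).
    Otherwise: exploit (pick the action with the highest estimate so far).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Simulate an agent that generates its own training examples by acting.

    true_means is an assumed toy world: one hidden average reward per action.
    Returns the agent's reward estimates and how often it tried each action.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)  # noisy reward
        counts[action] += 1
        # Incremental running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.2, 0.5, 1.0])
    print("estimates:", [round(e, 2) for e in estimates])
    print("times tried:", counts)
```

Setting `epsilon=0` gives pure exploitation, so the agent may lock onto an action that only looked good early on. Setting `epsilon=1` gives pure exploration, so the agent never uses what it has learned.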