CS 104: Introduction to Computer Science

Experimentation Strategies

• One extreme: Always choose the action that looks best so far.
– What can potentially happen?
– Early bias towards positive reward experience
– Bias against exploration for even better reward

• Another extreme: Always choose actions randomly with equal probability
– Ignores what it has learned → behavior remains random

• Want behavior between greedy and random extremes

• Simulated annealing ideas applicable here: Start with random behavior to gather information, gradually become greedy to improve performance.
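
The three strategies above can be compared in a small simulation. The sketch below is illustrative only and is not taken from the slides: it assumes a simple multi-armed setting with noisy rewards, uses an epsilon-greedy rule as one common way to blend greedy and random behavior, and picks a hypothetical decay schedule `1 / (1 + 0.05 * t)` to stand in for the "start random, gradually become greedy" idea. All function names (`choose_action`, `run`, `greedy`, `uniform_random`, `annealed`) are made up for this example.

```python
import random


def choose_action(estimates, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-looking one."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    # Greedy choice; ties go to the lowest index.
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run(true_means, steps, epsilon_at, seed=0):
    """Simulate `steps` choices; return (average reward, times each action was chosen)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running average reward per action
    counts = [0] * len(true_means)
    total = 0.0
    for t in range(steps):
        a = choose_action(estimates, epsilon_at(t), rng)
        reward = rng.gauss(true_means[a], 1.0)  # assumed noisy reward model
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean
        total += reward
    return total / steps, counts


def greedy(t):
    """One extreme: never explore."""
    return 0.0


def uniform_random(t):
    """Other extreme: always explore, ignoring what was learned."""
    return 1.0


def annealed(t):
    """Hypothetical schedule: fully random at t = 0, close to greedy later."""
    return 1.0 / (1.0 + 0.05 * t)


if __name__ == "__main__":
    arms = [0.2, 0.5, 1.0]  # assumed true mean rewards, unknown to the learner
    for name, schedule in [("greedy", greedy),
                           ("random", uniform_random),
                           ("annealed", annealed)]:
        avg, counts = run(arms, 2000, schedule)
        print(name, round(avg, 3), counts)
```

Comparing the printed choice counts shows how each schedule spreads its choices across the actions. Results depend on the seed and the assumed reward model, so treat any single run as an illustration rather than evidence.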