CS 104: Introduction to Computer Science

Experimentation Strategies

•One extreme: Always choose action that looks best so far.

–What can potentially happen?

–Early bias toward whichever actions happened to yield positive reward first

–Bias against exploring other actions that might yield even better reward

•Another extreme: Always choose actions randomly with equal probability

–Ignores what it has learned → behavior remains random

•Want behavior between greedy and random extremes

•Simulated annealing ideas applicable here: Start with random behavior to gather information, gradually become greedy to improve performance.
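
The annealing idea above can be sketched in code. This is a minimal illustration, not a method given on the slide: it assumes a multi-armed bandit setting with noisy rewards, Boltzmann (softmax) action selection, and a geometric cooling schedule. The names `boltzmann_probs`, `choose_action`, `run_bandit` and the parameters `t_start` and `t_end` are all hypothetical. A high temperature makes choices nearly uniform (the random extreme); a temperature near zero makes them nearly greedy.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Return softmax probabilities over estimated action values.

    High temperature -> nearly uniform (random extreme).
    Low temperature  -> nearly all mass on the best estimate (greedy extreme).
    """
    best = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - best) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def run_bandit(true_means, steps=2000, t_start=5.0, t_end=0.01, seed=0):
    """Assumed setup: each action pays a Gaussian reward around its true mean."""
    rng = random.Random(seed)
    n = len(true_means)
    q = [0.0] * n        # running estimate of each action's reward
    counts = [0] * n     # how often each action was tried
    # Geometric cooling from t_start down to t_end (an assumed schedule).
    decay = (t_end / t_start) ** (1.0 / max(steps - 1, 1))
    temperature = t_start
    for _ in range(steps):
        a = choose_action(q, temperature, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]  # incremental average
        temperature *= decay
    return q, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.2, 0.5, 1.0])
    print("estimated rewards:", [round(v, 2) for v in estimates])
    print("times each action was chosen:", pulls)
```

Early steps spread choices across all actions to gather information; as the temperature falls, choices concentrate on the action with the highest estimated reward. How fast to cool is a design choice: cooling too quickly reproduces the greedy extreme's early bias, and cooling too slowly wastes steps on random behavior.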
