Experimentation Strategies
One extreme (greedy): Always choose the action that looks best so far.
What can potentially happen?
Early bias towards actions that happened to yield positive reward first
Bias against exploring other actions that might yield even better reward
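The greedy failure mode above can be illustrated with a small sketch. The two-action setup, the fixed payoffs, the zero initial estimates, and the name `run_greedy` are all assumptions made for illustration; rewards are deterministic only to keep the outcome easy to follow.

```python
def run_greedy(rewards, steps):
    """Pure greedy selection with sample-average value estimates.

    rewards: hypothetical fixed payoff per action (an assumption for clarity).
    """
    estimates = [0.0] * len(rewards)  # initial value estimates, assumed zero
    counts = [0] * len(rewards)
    for _ in range(steps):
        # Greedy: pick the highest current estimate (ties go to the lowest index).
        action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]
        counts[action] += 1
        # Incremental sample-average update of the chosen action's estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return counts, estimates


counts, estimates = run_greedy([1.0, 5.0], steps=100)
print(counts)  # [100, 0]
```

Action 0 pays off first, so its estimate rises above the untried action 1 and the better action is never sampled.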
Another extreme: Always choose actions randomly with
equal probability
Ignores what it has learned → behavior remains random
Want behavior between greedy and random extremes
Simulated annealing ideas applicable here: Start with
random behavior to gather information, gradually become
greedy to improve performance.
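One common way to realize this annealing idea is Boltzmann (softmax) action selection with a temperature that is lowered over time. The slides do not name a specific rule, so the following is only a sketch: the function names, the geometric cooling schedule, and its constants (`t_start`, `t_end`, `decay`) are assumptions.

```python
import math
import random


def boltzmann_probs(values, temperature):
    """Softmax over estimated action values.

    High temperature gives near-uniform (random) behavior;
    low temperature gives near-greedy behavior.
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def temperature_at(step, t_start=10.0, t_end=0.05, decay=0.99):
    """Hypothetical geometric cooling schedule with a lower bound."""
    return max(t_end, t_start * decay ** step)


def choose_action(values, step, rng=random):
    """Sample an action index according to the annealed softmax."""
    probs = boltzmann_probs(values, temperature_at(step))
    return rng.choices(range(len(values)), weights=probs, k=1)[0]
```

Early on (`step` near 0) the choice is close to uniform, which gathers information; as `step` grows the temperature falls and the choice concentrates on the action with the highest estimated value.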