Experimentation Strategies
(cont.)
Another possibility: probabilistic approach
Choose actions probabilistically, such that there's
always a positive probability of choosing each
action.
One example: $P(a_i \mid s) = \dfrac{k^{Q(s, a_i)}}{\sum_j k^{Q(s, a_j)}}$, where $k > 0$ is a constant
Greater k → greater greedy exploitation
Lesser k → greater random exploration
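
A minimal sketch of this selection rule in Python, assuming the Q values for the current state are given as a plain list indexed by action. The function names `action_probabilities` and `choose_action` are hypothetical, chosen for illustration only. The weights are computed in log space, which is mathematically equal to k^Q but avoids overflow for large Q values.

```python
import math
import random


def action_probabilities(q_values, k):
    """Return P(a_i | s) = k**Q(s, a_i) / sum_j k**Q(s, a_j) for each action."""
    if k <= 0:
        raise ValueError("k must be positive")
    # k**q == exp(q * ln k); subtracting the maximum exponent before
    # exponentiating leaves the ratios unchanged and prevents overflow.
    log_k = math.log(k)
    scaled = [q * log_k for q in q_values]
    largest = max(scaled)
    weights = [math.exp(x - largest) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_values, k, rng=random):
    """Sample an action index according to the probabilities above."""
    probs = action_probabilities(q_values, k)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With k = 1 every action is equally likely (pure exploration); as k grows above 1, probability mass shifts toward the action with the highest Q value, yet every action keeps a nonzero probability.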