•Another possibility: a probabilistic approach
•Choose actions probabilistically, such that there is always a positive probability of choosing each action.
•One example: $P(a_i \mid s) = \frac{k^{Q(s, a_i)}}{\sum_j k^{Q(s, a_j)}}$
•Greater k → greater greedy exploitation
•Lesser k → greater random exploration
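The selection rule above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming the Q-values for the current state are available as a plain list and that k is a positive constant; the function names `action_probabilities` and `choose_action` are hypothetical and do not come from the slides.

```python
import random


def action_probabilities(q_values, k):
    """Return P(a_i | s) = k**Q(s, a_i) / sum_j k**Q(s, a_j) for one state."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Shift by the largest Q-value before exponentiating. The ratio is
    # unchanged, and this helps avoid overflow when k > 1.
    q_max = max(q_values)
    weights = [k ** (q - q_max) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_values, k, rng=random):
    """Sample an action index according to the probabilities above."""
    probs = action_probabilities(q_values, k)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With Q-values 1, 2, 3 and k = 2, the probabilities come out as 1/7, 2/7 and 4/7, so every action keeps a positive chance of being chosen. With k = 1 all actions are equally likely, and as k grows the highest-valued action dominates.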