• Another possibility: probabilistic approach
• Choose actions probabilistically, such that there is always a positive probability of choosing each action.
• One example: $P(a_i \mid s) = \frac{k^{Q(s, a_i)}}{\sum_j k^{Q(s, a_j)}}$
• Larger k → more greedy exploitation
• Smaller k → more random exploration
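The selection rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `action_probabilities` and `choose_action` are hypothetical, the Q-values for a single state are assumed to be given as a plain list, and k is assumed to be a positive constant.

```python
import random


def action_probabilities(q_values, k):
    """Return P(a_i | s) = k**Q(s, a_i) / sum_j k**Q(s, a_j) for one state.

    q_values: list of Q(s, a_i) for every action a_i in state s (assumed input format).
    k: positive constant controlling the exploration/exploitation balance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Every weight is strictly positive, so every action keeps a nonzero probability.
    weights = [k ** q for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_values, k, rng=random):
    """Sample an action index according to the probabilities above."""
    probs = action_probabilities(q_values, k)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # made-up Q-values for illustration only
    for k in (1, 2, 10):
        print(k, action_probabilities(q, k))
    print("sampled action:", choose_action(q, k=2))
```

With k = 1 every weight equals 1, so the choice is uniformly random. As k grows, the probability mass concentrates on the action with the highest Q-value. For very large Q-values or k, the powers may overflow, so a practical version would likely need to rescale the exponents.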