Example: Pig Dice Game
In turn, players roll a single die as many times as desired.
If a player stops before rolling a 1, the player adds the total of the
numbers rolled in sequence to their cumulative score.
If a player rolls a 1, the turn ends and the player scores nothing for that turn.
The goal is to be the first player to reach a score of 100.
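The rules above can be sketched in a few lines of Python. This is only an illustration: the "hold at a fixed turn total" policy and the names `play_turn`, `turns_to_goal`, and `hold_at` are assumptions made for the example, not part of the original material.

```python
import random

def play_turn(hold_at, rng):
    """Play one turn: roll until the turn total reaches hold_at or a 1 appears.

    Returns the points added to the cumulative score (0 if a 1 was rolled).
    hold_at is a hypothetical policy parameter chosen for illustration.
    """
    turn_total = 0
    while turn_total < hold_at:
        roll = rng.randint(1, 6)
        if roll == 1:
            return 0          # rolling a 1 forfeits the whole turn
        turn_total += roll
    return turn_total         # player stops and banks the turn total

def turns_to_goal(hold_at, rng, goal=100):
    """Count the turns a single player needs to reach the goal score."""
    score = 0
    turns = 0
    while score < goal:
        # Never aim past the points still needed to reach the goal.
        score += play_turn(min(hold_at, goal - score), rng)
        turns += 1
    return turns

if __name__ == "__main__":
    rng = random.Random(0)
    print(turns_to_goal(20, rng))
```

Varying `hold_at` gives a quick feel for the risk trade-off that the learning method is meant to resolve.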
$$Q_n(s,a) \leftarrow (1-\alpha_n)\,Q_{n-1}(s,a) + \alpha_n\left(r + \gamma \max_{a'} Q_{n-1}(s',a')\right)$$
$$\alpha_n = \frac{1}{1+\mathrm{visits}_n(s,a)}$$
What are the ramifications of one's choice for $\gamma$?
How can one best speed convergence when learning from observed real game
experience?
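As a rough sketch of the update rule above, the following Python keeps a table of Q values and visit counts. Several details are assumptions for illustration only: the class name `QTable`, the action names, the `terminal` flag (which drops the future term at the end of a game), and the reading of the visit count as the number of earlier updates to (s, a), so that the first update uses a learning rate of 1.

```python
from collections import defaultdict

ACTIONS = ("roll", "hold")  # hypothetical action names for Pig

class QTable:
    def __init__(self, gamma=1.0):
        self.gamma = gamma                # discount factor
        self.q = defaultdict(float)       # Q(s, a), defaults to 0
        self.visits = defaultdict(int)    # earlier updates of (s, a)

    def update(self, s, a, r, s_next, terminal=False):
        """Apply one decaying-learning-rate Q update and return the new Q(s, a)."""
        # Assumption: visits counts updates made before this one.
        alpha = 1.0 / (1 + self.visits[(s, a)])
        # Best estimated value obtainable from the successor state.
        future = 0.0 if terminal else max(self.q[(s_next, b)] for b in ACTIONS)
        target = r + self.gamma * future
        self.q[(s, a)] = (1 - alpha) * self.q[(s, a)] + alpha * target
        self.visits[(s, a)] += 1
        return self.q[(s, a)]

if __name__ == "__main__":
    table = QTable(gamma=0.5)
    print(table.update("s", "roll", 1.0, "t", terminal=True))
    print(table.update("p", "hold", 0.0, "s"))
```

Rerunning the same sequence of updates with different `gamma` values shows how strongly the discount factor scales the value passed back from successor states.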