CS 391 - Special Topic: Machine Learning
Piglet
Piglet is very much like Pig except that it is played with a coin rather
than a die. The object of Piglet is to be the first player to reach 10 points.
Each turn, a player repeatedly flips a coin until either a "tail" is flipped
or the player holds and scores the number of consecutive "heads" flipped.
At any time during a turn, the player faces two choices: flip or hold.
If the player flips and the coin turns up tails, the player scores nothing
for the turn and it becomes the opponent's turn. If it turns up heads, the
turn total increases by one and the player's turn continues. If the player
chooses to hold, the number of consecutively flipped heads (the turn total)
is added to the player's score and it becomes the opponent's turn.
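The turn rules above can be sketched as a short simulation. This is only an illustration under stated assumptions: the names coin_flips and play_turn and the hold_at parameter (a fixed number of heads at which the player holds) are hypothetical and do not come from the handout.

```python
import random


def coin_flips(seed=None):
    """Endless stream of fair coin flips; True means heads."""
    rng = random.Random(seed)
    while True:
        yield rng.random() < 0.5


def play_turn(flips, hold_at):
    """Points scored in one turn by a player who holds after hold_at heads.

    flips is any iterator of booleans (True = heads).
    """
    turn_total = 0
    while turn_total < hold_at:
        if next(flips):
            turn_total += 1   # heads: the turn continues
        else:
            return 0          # tails: the turn ends with no points
    return turn_total         # hold: the turn total is banked
```

Passing an explicit list of flips, for example play_turn(iter([True, True, False]), 2), makes a turn reproducible for checking; coin_flips(seed) supplies random flips instead.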
Further information on the equations for Piglet can be found in the paper
handed out in class.
Tips:
- Make your code generic so that the goal score is a parameter n.
Optimal solutions for n = 3, 6, and 9 are given below; use them to
check your code.
- Enforce a rule that a player must hold when the turn total allows a win.
- You will still be programming with a single agent. The agent will
assume that the environment (portion of the game beyond its control) will
have a player that plays with the same policy. In other words, the
agent assumes self-play. (Question: Will this always lead to learning
optimal play?)
- Extra credit: In the paper, we used a simple trick that exploits
the symmetry of the problem to simplify our equations. We noted that
the probability of a player winning when it becomes the opponent's turn is
one minus the probability that the opponent will win. However, it is
possible to base our equation for the probability of a player's win in state s, V(s),
on an expression involving only the probability of the same player's win in
one or two future states. You may optionally demonstrate this
mathematically and use these equations in your programming for extra credit.
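The first three tips can be sketched in code. The following is a minimal value-iteration sketch under stated assumptions, not the handout's reference solution. It assumes the usual self-play win-probability recurrence (flip: with probability 1/2 the opponent takes over with scores unchanged, with probability 1/2 the turn total grows by one; hold: the opponent takes over after the turn total is banked), treats any state whose score plus turn total reaches the goal as a win, and does not consider holding at a turn total of zero. The names solve_piglet and hold_thresholds and the layout p[i][j][k] (player score i, opponent score j, turn total k) are illustrative choices.

```python
def win_prob(p, goal, i, j, k):
    """Win probability for the player to act; a reachable goal counts as a win."""
    return 1.0 if i + k >= goal else p[i][j][k]


def flip_value(p, goal, i, j, k):
    # Tails (prob 1/2): opponent acts from (j, i, 0).
    # Heads (prob 1/2): same player continues with turn total k + 1.
    return (0.5 * (1.0 - win_prob(p, goal, j, i, 0))
            + 0.5 * win_prob(p, goal, i, j, k + 1))


def hold_value(p, goal, i, j, k):
    # Banking k points: opponent acts from (j, i + k, 0).
    return 1.0 - win_prob(p, goal, j, i + k, 0)


def solve_piglet(goal, tol=1e-12, max_sweeps=10_000):
    """Sweep all states in place until the largest change falls below tol."""
    # Entries are stored only for i + k < goal.
    p = [[[0.0] * (goal - i) for _ in range(goal)] for i in range(goal)]
    for _ in range(max_sweeps):
        delta = 0.0
        for i in range(goal):
            for j in range(goal):
                for k in range(goal - i):
                    new = flip_value(p, goal, i, j, k)
                    if k > 0:  # assumption: holding with nothing banked is skipped
                        new = max(new, hold_value(p, goal, i, j, k))
                    delta = max(delta, abs(new - p[i][j][k]))
                    p[i][j][k] = new
        if delta < tol:
            break
    return p


def hold_thresholds(p, goal):
    """table[i][j] = smallest turn total at which holding is at least as good as flipping."""
    table = []
    for i in range(goal):
        row = []
        for j in range(goal):
            k = 1
            while (i + k < goal
                   and hold_value(p, goal, i, j, k) < flip_value(p, goal, i, j, k)):
                k += 1
            row.append(k)
        table.append(row)
    return table


if __name__ == "__main__":
    for row in hold_thresholds(solve_piglet(3), 3):
        print(*row)
```

The printed rows use the same layout as the tables under "Optimal solutions for goal = n" (row = player score, column = opponent score), so they can be compared against those tables directly.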
Optimal solutions for goal = n
With (0,0) in the upper-left corner, the row is the player's score (0 to n-1),
the column is the opponent's score (0 to n-1), and the entry at that
position is the minimum turn total at which the player holds.
n = 3
1 3 3
2 2 2
1 1 1
n = 6
2 2 2 2 2 6
1 2 2 2 2 5
1 1 2 2 2 4
1 1 1 1 3 3
1 1 1 2 2 2
1 1 1 1 1 1
n = 9
2 2 2 2 2 2 3 3 9
1 2 2 2 2 2 2 3 8
1 1 2 2 2 2 2 3 7
1 1 1 2 2 2 2 2 6
1 1 1 1 2 2 2 2 5
1 1 1 1 1 2 2 2 4
1 1 1 1 1 1 1 3 3
1 1 1 1 1 1 2 2 2
1 1 1 1 1 1 1 1 1
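One way to use these tables when checking a program is to treat each entry as a threshold. The sketch below hard-codes the n = 3 table from above; the names HOLD_AT_N3 and action are hypothetical, and it assumes the player keeps holding at every turn total at or above the listed minimum.

```python
# Rows: player score 0..2; columns: opponent score 0..2 (n = 3 table).
HOLD_AT_N3 = [
    [1, 3, 3],
    [2, 2, 2],
    [1, 1, 1],
]


def action(table, player_score, opponent_score, turn_total):
    """Return "hold" once the turn total reaches the table entry, else "flip"."""
    threshold = table[player_score][opponent_score]
    return "hold" if turn_total >= threshold else "flip"
```

For example, action(HOLD_AT_N3, 0, 1, 2) looks up row 0, column 1, whose entry is 3, so a turn total of 2 is still below the threshold.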