CS 391 - Special Topic: Machine Learning

Piglet

Piglet is very much like Pig, except that it is played with a coin rather than a die. The object of Piglet is to be the first player to reach 10 points. Each turn, a player repeatedly flips a coin until either a "tail" is flipped or the player holds and scores the number of consecutive "heads" flipped. At any time during the turn, the player faces two choices: flip or hold. If the coin turns up tails, the player scores nothing and it becomes the opponent's turn. Otherwise, the player's turn continues. If the player chooses to hold, the number of consecutively flipped heads is added to the player's score and it becomes the opponent's turn.
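The turn structure described above can be sketched as a small simulation. This is a minimal sketch, assuming a hypothetical fixed "hold at h" policy (the tables below show that optimal play actually depends on both scores). The coin is passed in as a callable so that a turn can be replayed with scripted flips, and the forced-hold rule from the tips is applied. The names play_turn and fair_flip are illustrative only.

```python
import random


def fair_flip():
    """Flip a fair coin; True means heads."""
    return random.random() < 0.5


def play_turn(score, goal, hold_at, flip=None):
    """Simulate one Piglet turn and return the points banked (0 on tails).

    score:   the current player's score before the turn
    goal:    points needed to win (10 in the rules above)
    hold_at: hypothetical fixed policy; hold once the turn total reaches it
    flip:    callable returning True for heads; defaults to a fair coin
    """
    if flip is None:
        flip = fair_flip
    turn_total = 0
    while True:
        if not flip():
            return 0  # tails: the whole turn total is lost
        turn_total += 1
        # Forced hold when the turn total allows a win; otherwise follow the policy.
        if score + turn_total >= goal or turn_total >= hold_at:
            return turn_total
```

For example, play_turn(8, 10, 5) can bank at most 2 points, because the player is forced to hold as soon as the score would reach 10.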

Further information on the equations for Piglet can be found in the paper handed out in class.
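For quick reference, one common way to write these equations is shown below. The notation is our own and may differ from the handout's: P_{i,j,k} is the probability that the player about to act wins, given the player's score i, the opponent's score j, the current turn total k, and the goal n. The "1 -" terms use the symmetry that a player's chance of winning on the opponent's turn is one minus the opponent's chance of winning.

```latex
\begin{aligned}
P_{i,j,k} &= \max\bigl(P^{\text{flip}}_{i,j,k},\; P^{\text{hold}}_{i,j,k}\bigr), && i + k < n,\\
P^{\text{flip}}_{i,j,k} &= \tfrac{1}{2}\bigl(1 - P_{j,i,0}\bigr) + \tfrac{1}{2}\,P_{i,j,k+1},\\
P^{\text{hold}}_{i,j,k} &= 1 - P_{j,i+k,0},\\
P_{i,j,k} &= 1, && i + k \ge n.
\end{aligned}
```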

Tips:

• Make your code generic so that the goal can be any number of points n. Optimal solutions for n = 3, 6, and 9 are given below; use them to check your code.
• Enforce a rule that a player must hold when the turn total allows a win.
• You will still be programming with a single agent. The agent assumes that the environment (the portion of the game beyond its control) contains an opponent that plays with the same policy. In other words, the agent assumes self-play. (Question: Will this always lead to learning optimal play?)
• Extra credit: In the paper, we used a simple trick that exploits the symmetry of the problem to simplify our equations: the probability of a player winning when it becomes the opponent's turn is one minus the probability that the opponent will win. However, it is also possible to express a player's probability of winning in state s, V(s), using only that same player's probabilities of winning in one or two future states. For extra credit, you may demonstrate this mathematically and use these equations in your program.
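The first three tips (a generic goal n, a forced hold when the turn total allows a win, and self-play) can be combined in a value-iteration sketch of the Piglet equations. Value iteration is one standard way to solve such equations and may not be the method your course intends. The function names are hypothetical, and two modeling choices are assumptions: holding with a turn total of 0 is not allowed, and updates are synchronous (each sweep reads only the previous sweep's values).

```python
def action_values(p, n, i, j, k):
    """Return (flip, hold) win probabilities for the player to move in (i, j, k)."""
    # Heads adds 1 to the turn total; reaching the goal counts as an
    # immediate (forced) hold and therefore a win.
    heads = 1.0 if i + k + 1 >= n else p[i][j][k + 1]
    # Tails: the turn total is lost and the opponent, who uses the same
    # policy (self-play), moves from its own perspective (j, i, 0).
    flip = 0.5 * (1.0 - p[j][i][0]) + 0.5 * heads
    # Hold: bank k points; the opponent moves from (j, i + k, 0).
    # Assumption: holding with a turn total of 0 is not allowed.
    hold = 1.0 - p[j][i + k][0] if k > 0 else 0.0
    return flip, hold


def solve_piglet(n, tol=1e-12):
    """Synchronous value iteration for Piglet with goal n."""
    # p[i][j][k]: win probability for the player to move with score i,
    # opponent score j, and turn total k. Only k < n - i is stored,
    # because i + k >= n is already a win.
    p = [[[0.0] * (n - i) for _ in range(n)] for i in range(n)]
    while True:
        new_p = [[[max(action_values(p, n, i, j, k)) for k in range(n - i)]
                  for j in range(n)] for i in range(n)]
        delta = max(abs(new_p[i][j][k] - p[i][j][k])
                    for i in range(n) for j in range(n) for k in range(n - i))
        p = new_p
        if delta < tol:
            return p


def hold_thresholds(n, p):
    """Minimum turn total at which the player holds, indexed [player][opponent]."""
    table = []
    for i in range(n):
        row = []
        for j in range(n):
            # Default: keep flipping until the goal is reached (forced hold).
            threshold = n - i
            for k in range(1, n - i):
                flip, hold = action_values(p, n, i, j, k)
                if hold > flip:
                    threshold = k
                    break
            row.append(threshold)
        table.append(row)
    return table


if __name__ == "__main__":
    goal = 6
    for row in hold_thresholds(goal, solve_piglet(goal)):
        print(" ".join(str(t) for t in row))
```

Running it with goal = 6 prints a table in the same layout as the solutions below, so the two can be compared directly.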

Optimal solutions for goal = n

With (0,0) in the upper-left corner, the row is the player's score (0 to n-1), the column is the opponent's score (0 to n-1), and the entry at that position is the minimum turn total at which the player holds.
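The indexing convention above can be captured in a small lookup helper. This is a sketch; parse_table and should_hold are hypothetical names, and the helper assumes a table is pasted as whitespace-separated rows exactly as printed below.

```python
def parse_table(text):
    """Parse a whitespace-separated threshold table into a list of int rows."""
    return [[int(x) for x in line.split()] for line in text.strip().splitlines()]


def should_hold(table, player_score, opponent_score, turn_total):
    """Return True if the table's policy holds at this turn total."""
    # Row = the player's score, column = the opponent's score.
    return turn_total >= table[player_score][opponent_score]


if __name__ == "__main__":
    table_n3 = parse_table("""
        1 3 3
        2 2 2
        1 1 1
    """)
    # Player 0, opponent 1, turn total 2: the table says keep flipping.
    print(should_hold(table_n3, 0, 1, 2))
```

Because a turn total grows one point at a time, holding when turn_total >= threshold is the same as holding exactly at the listed minimum.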

n = 3

1 3 3
2 2 2
1 1 1

n = 6

2 2 2 2 2 6
1 2 2 2 2 5
1 1 2 2 2 4
1 1 1 1 3 3
1 1 1 2 2 2
1 1 1 1 1 1

n = 9

2 2 2 2 2 2 3 3 9
1 2 2 2 2 2 2 3 8
1 1 2 2 2 2 2 3 7
1 1 1 2 2 2 2 2 6
1 1 1 1 2 2 2 2 5
1 1 1 1 1 2 2 2 4
1 1 1 1 1 1 1 3 3
1 1 1 1 1 1 2 2 2
1 1 1 1 1 1 1 1 1
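As a further check on the tables, a Monte Carlo sketch can play the n = 3 policy against itself and estimate how often the first player wins. By our own hand calculation from the Piglet equations, the first player's exact win probability under this policy is 6/11 (about 0.545), so the estimate should land close to that. For other goals, compare the estimate with the value your solver reports for the starting state. The function name and the seeded random.Random argument are illustrative choices, not part of the assignment.

```python
import random

# Optimal hold thresholds for n = 3, copied from the table above;
# rows are the player's score, columns the opponent's score.
TABLE_N3 = [[1, 3, 3],
            [2, 2, 2],
            [1, 1, 1]]


def first_player_win_rate(table, goal, games, rng):
    """Estimate P(first player wins) when both players follow `table`."""
    wins = 0
    for _ in range(games):
        scores = [0, 0]
        current = 0  # index of the player whose turn it is
        while True:
            me, opp = scores[current], scores[1 - current]
            turn_total = 0
            # Flip until tails, the table's threshold, or a winning total.
            while turn_total < table[me][opp] and me + turn_total < goal:
                if rng.random() < 0.5:
                    turn_total += 1
                else:
                    turn_total = 0  # tails: nothing is scored
                    break
            scores[current] += turn_total
            if scores[current] >= goal:
                break
            current = 1 - current
        if current == 0:
            wins += 1
    return wins / games


if __name__ == "__main__":
    print(first_player_win_rate(TABLE_N3, 3, 100_000, random.Random(0)))
```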