CS 391 Selected Topics: Game AI
Homework #2

Due at the beginning of class on Thursday 2/9.

Note: This work is to be done in groups of 2.  Each group will submit one assignment.  Although you may divide the work, both team members should be able to present/describe their partner's work upon request.

0. HW2 Preparation: Download the HW2 starter code here.  Read Sutton and Barto's description of the Value Iteration algorithm in section 4.4.

[Images: game box; game board]

The board consists of 100 numbered squares arranged in a 10-by-10 grid. The winner is the first player to reach square 100. Players start with their playing pieces off the board near square one.  (Imagine starting at square 0.)  Players take turns spinning a spinner with numbers 1 - 6. After spinning, one advances one's piece according to the number of the spin. (Exception: If this would take one beyond square 100, one does not advance at all. Square 100 must be reached by exact count.) Some squares depict the beginning of a chute or ladder. After advancing to such a space, one slides the piece down the chute or climbs the piece up the ladder. If, after advancing and/or climbing, a player's piece is on square 100, the player wins and the game is over. Here is a table describing the chutes and ladders of the 2004 edition:

From Square To Square
1 38
4 14
9 31
16 6
21 42
28 84
36 44
48 26
49 11
51 67
56 53
62 19
64 60
71 91
80 100
87 24
93 73
95 75
98 78
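
As an illustration only, the table above could be turned into a lookup array along the following lines. This is a minimal Python sketch, not the starter code: the names GOAL, JUMPS, and goes_to are assumptions chosen to mirror the goesTo array mentioned below, and the starter code's own definition takes precedence wherever it differs.

```python
# Hypothetical reconstruction of a goesTo-style lookup from the table above.
GOAL = 100

# Each entry maps the square where a chute or ladder begins to where it ends.
JUMPS = {
    1: 38, 4: 14, 9: 31, 16: 6, 21: 42, 28: 84, 36: 44, 48: 26, 49: 11,
    51: 67, 56: 53, 62: 19, 64: 60, 71: 91, 80: 100, 87: 24, 93: 73,
    95: 75, 98: 78,
}

# goes_to[i] is the square a piece rests on after landing on square i.
# Squares without a chute or ladder map to themselves; index 0 is "off the board".
goes_to = [JUMPS.get(i, i) for i in range(GOAL + 1)]
```

With this layout, goes_to[4] is 14 (a ladder), goes_to[16] is 6 (a chute), and goes_to[5] is 5.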

Using Value Iteration, compute the expected number of turns remaining in solitaire Chutes and Ladders from each board position, printing the result according to the format given in the starter code.

Representation:

• States: Your value function V[i] represents the expected future rewards from board position i before the spinner is spun.  Use discount gamma = 1.0 (i.e. no discount) for all exercises.
• Action: There is a single "spin" action.  (Standard Chutes and Ladders is better described as an "activity" than a "game".)  After the spin action at position p, the player gets a spin s of 1 through 6 with equal probability and moves to goesTo[p + s] as defined in the starter code.  (If square p + s begins a chute or ladder, goesTo[p + s] is the square at its other end.  Otherwise, goesTo[p + s] simply maps to p + s.)  The one exception is where p + s would take one past the goal position 100.  In that case, the position remains the same.
• Reward: In later exercises, we will want to minimize the expected number of turns to the goal, so each turn is equally undesirable.  We thus assign each action at a non-goal position a reward of -1.  Thus, with discount 1.0 (no discount), V[i] will represent the negated expected number of turns to the goal.
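
The representation above can be sketched in Python as follows. This is one possible reading under stated assumptions, not the starter code or a required structure: the names goes_to, next_position, and value_iteration are hypothetical, the sweep updates V in place, and the stopping tolerance of 1e-10 is an arbitrary choice. The output format of the starter code is not reproduced here.

```python
GOAL = 100
JUMPS = {
    1: 38, 4: 14, 9: 31, 16: 6, 21: 42, 28: 84, 36: 44, 48: 26, 49: 11,
    51: 67, 56: 53, 62: 19, 64: 60, 71: 91, 80: 100, 87: 24, 93: 73,
    95: 75, 98: 78,
}
goes_to = [JUMPS.get(i, i) for i in range(GOAL + 1)]  # assumed stand-in for goesTo


def next_position(p, s):
    """Resting square after moving s from p; overshooting the goal means no move."""
    target = p + s
    return p if target > GOAL else goes_to[target]


def value_iteration(tol=1e-10):
    """Return V where V[p] is the negated expected number of turns from p."""
    V = [0.0] * (GOAL + 1)  # V[GOAL] stays 0: no turns remain at the goal
    while True:
        delta = 0.0
        for p in range(GOAL):
            # Reward of -1 for the turn, plus the average value over six spins.
            new_v = -1.0 + sum(V[next_position(p, s)] for s in range(1, 7)) / 6.0
            delta = max(delta, abs(new_v - V[p]))
            V[p] = new_v
        if delta < tol:
            return V


V = value_iteration()
print(f"Expected turns per game: {-V[0]:.2f}")
```

A quick sanity check on any implementation: from square 99 only a spin of 1 reaches the goal, so the expected number of turns there should be 6.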

Self-check:

Highlight the following to see if your first line of output is correct: Expected turns per game: 39.60

2. Boosted Chutes and Ladders:  For Boosted Chutes and Ladders, we modify the game to allow the player a simple choice each turn: After the spin of s the player may choose to either move s or (s + 1) positions, and thus transition to either position goesTo[p + s] or position goesTo[p + s + 1].  (Goal movement exceptions still apply.)

Assuming an optimal player that maximizes V and thus minimizes turns, use Value Iteration to compute the expected number of turns remaining in solitaire Boosted Chutes and Ladders from each board position, printing the result in the same format as above.

• States: Your value function V[i] represents the expected future rewards as before.  However, bear in mind that the spin now takes place before the action.  Thus, you may benefit from representing each state as a (position, spin) pair.  If you choose this representation, V[i] then is computed as the average state value for states (i, 1), (i, 2), ... (i, 6).  The representation is your choice.
• Actions: One can either increase the spin by 1 (e.g. a spin of 6 becomes 7) or use the unmodified spin.  The behavior that follows is as before.
• Reward: As before, there is a reward of -1 for each non-terminal state action (i.e. each turn).
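
One way to read the description above is to fold the (position, spin) states into V[p] directly: for each spin, take the better of the two landing squares, then average over the six spins. The Python sketch below does this under the same assumptions as before (hypothetical names goes_to, next_position, and boosted_value_iteration; in-place sweeps; an arbitrary tolerance). It is an illustration of that reading, not the required representation.

```python
GOAL = 100
JUMPS = {
    1: 38, 4: 14, 9: 31, 16: 6, 21: 42, 28: 84, 36: 44, 48: 26, 49: 11,
    51: 67, 56: 53, 62: 19, 64: 60, 71: 91, 80: 100, 87: 24, 93: 73,
    95: 75, 98: 78,
}
goes_to = [JUMPS.get(i, i) for i in range(GOAL + 1)]  # assumed stand-in for goesTo


def next_position(p, steps):
    """Resting square after moving `steps` from p; overshooting means no move."""
    target = p + steps
    return p if target > GOAL else goes_to[target]


def boosted_value_iteration(tol=1e-10):
    """V[p] is the negated expected turns from p under optimal boosting."""
    V = [0.0] * (GOAL + 1)
    while True:
        delta = 0.0
        for p in range(GOAL):
            total = 0.0
            for s in range(1, 7):
                # The spin is known before the choice: move s or s + 1.
                total += max(V[next_position(p, s)], V[next_position(p, s + 1)])
            new_v = -1.0 + total / 6.0
            delta = max(delta, abs(new_v - V[p]))
            V[p] = new_v
        if delta < tol:
            return V


V_boost = boosted_value_iteration()
print(f"Expected turns per game: {-V_boost[0]:.2f}")
```

Because never boosting is always an available policy, the value from any square here should be no worse than in the unboosted game.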

Self-check:

Highlight the following to see if your first line of output is correct: Expected turns per game: 14.50

3. Token Boosted Chutes and Ladders:  We modify standard Chutes and Ladders by giving a player three tokens.  After a spin, the player may spend a token to "boost" the spin as in the previous variant.  In other words, a player is permitted three boosts over the course of the game.

Assuming an optimal player that maximizes V and thus minimizes turns, use Value Iteration to compute the expected number of turns remaining in solitaire Token Boosted Chutes and Ladders from each board position, printing the result in the same format as above, but printing separate boards for each possible number of remaining tokens.

Representation:

• States: States are as in the prior problem, except that your state description must now include the number of remaining tokens.
• Actions: If there is at least one token remaining, the player may choose to "boost" a spin or accept the spin as-is.  Otherwise, there is a single action to accept the spin as-is.  Note that choosing to boost additionally affects the state by decrementing the number of remaining tokens.
• Reward: As before, there is a reward of -1 for each non-terminal state action (i.e. each turn).
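
A minimal Python sketch of the token-augmented state space follows, again under assumptions: V[t][p] is a hypothetical layout indexed by remaining tokens t and position p, and the layers are solved in order of increasing t, which works because a state with t tokens can only lead to states with t or t - 1 tokens. Other orderings and representations are equally valid.

```python
GOAL = 100
JUMPS = {
    1: 38, 4: 14, 9: 31, 16: 6, 21: 42, 28: 84, 36: 44, 48: 26, 49: 11,
    51: 67, 56: 53, 62: 19, 64: 60, 71: 91, 80: 100, 87: 24, 93: 73,
    95: 75, 98: 78,
}
goes_to = [JUMPS.get(i, i) for i in range(GOAL + 1)]  # assumed stand-in for goesTo


def next_position(p, steps):
    """Resting square after moving `steps` from p; overshooting means no move."""
    target = p + steps
    return p if target > GOAL else goes_to[target]


def token_value_iteration(max_tokens=3, tol=1e-10):
    """V[t][p] is the negated expected turns from p with t tokens remaining."""
    V = [[0.0] * (GOAL + 1) for _ in range(max_tokens + 1)]
    for t in range(max_tokens + 1):  # layer t depends only on layers t and t - 1
        while True:
            delta = 0.0
            for p in range(GOAL):
                total = 0.0
                for s in range(1, 7):
                    best = V[t][next_position(p, s)]  # accept the spin as-is
                    if t > 0:
                        # Boosting moves s + 1 and spends one token.
                        best = max(best, V[t - 1][next_position(p, s + 1)])
                    total += best
                new_v = -1.0 + total / 6.0
                delta = max(delta, abs(new_v - V[t][p]))
                V[t][p] = new_v
            if delta < tol:
                break
    return V


V_tok = token_value_iteration()
for t in range(3, -1, -1):
    print(f"Tokens remaining {t}: expected turns from start {-V_tok[t][0]:.2f}")
```

The layer for zero tokens should match the unboosted game, which gives a convenient cross-check against the first exercise.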

Self-check:

Highlight the following to see if your first line of output is correct: Expected turns per game: 18.25

Note that you are not being asked to explicitly compute or print the optimal policy in the last two exercises.  A player could look at these tables and infer the optimal actions by taking the action that would minimize the expected number of turns to the goal.
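
To illustrate how such an inference might work for the Boosted variant, the Python sketch below compares the values of the two possible landing squares for a given position and spin. The function best_action and its arguments are hypothetical, and the toy board and values at the end are made up purely to exercise the function; they are not results for the real game.

```python
def best_action(V, goes_to, p, s, goal=100):
    """Return "boost" or "plain" for spin s at position p, given a value table V."""

    def land(steps):
        # Overshooting the goal leaves the piece where it is.
        target = p + steps
        return p if target > goal else goes_to[target]

    plain_value = V[land(s)]
    boosted_value = V[land(s + 1)]
    # Higher V means fewer expected turns; ties go to the unmodified spin.
    return "boost" if boosted_value > plain_value else "plain"


# Toy example: no chutes or ladders, and V[i] = -(squares left to the goal).
identity_board = list(range(101))
toy_V = [-(100 - i) for i in range(101)]
print(best_action(toy_V, identity_board, 10, 3))  # boost
print(best_action(toy_V, identity_board, 95, 5))  # plain
```

In the second toy call, boosting would overshoot square 100 and leave the piece on 95, so the unmodified spin is preferred.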