CS 391 Selected Topics: Game AI
Homework #4


Due at the beginning of class on Tuesday 3/6. (You have more time for this assignment, and it carries a larger weight.)

Note: This work is to be done in groups of 2.  Each group will submit one assignment.  Although you may divide the work, both team members should be able to present/describe their partner's work upon request. 

0. HW4 Preparation: Save and compile the HW4 starter code that will be emailed through the class mailing list.  Run the OptimalPlayer code once to compute and export an optimal policy data file.

1. Function Approximation for Optimal Fowl Play:  Use two different function approximation techniques to create a low-memory means of computing approximately optimal play for the 2-player game Fowl Play.  You must implement at least one of the techniques yourself.  For the other, you are free to use (and credit) any open-source implementations you find online.

Fowl Play is a simple jeopardy card game.  In the 2-player game, the first player to score 50 points wins.  The deck contains 42 chicken cards and 6 wolf cards.

Problem motivation:  Having computed optimal play for Fowl Play, we would like to create an implementation suitable for a small, memory-limited device (e.g. Lego Mindstorms robot, Arduino microcontroller, or smartphone app).  Memory limits are assumed to preclude the direct use of the computed policy.  Given the policy, we would like to use forms of function approximation to effectively "compress" the policy information while still retaining high-quality play.

Representation: The game state may be described by 5 variables: current player score (i), opponent score (j), turn total (k), wolves drawn since the last shuffle (w), and chickens drawn since the last shuffle (c).  The actions are simply DRAW (1) and HOLD (0).
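
For concreteness, here is one way a player's state-action interface might look in Java.  The names below (FowlPlayPolicy, action) are illustrative assumptions and need not match the starter code's API:

    // Illustrative sketch only; names are assumptions, not the starter code's API.
    public interface FowlPlayPolicy {
        int HOLD = 0;
        int DRAW = 1;

        // i = current player score, j = opponent score, k = turn total,
        // w = wolves drawn since the last shuffle,
        // c = chickens drawn since the last shuffle
        int action(int i, int j, int k, int w, int c);
    }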

So which function(s) should I approximate?  That is your decision.  There are a number of straightforward possibilities:

In the first two value function scenarios, there is the added advantage that a softmax action selection policy can be applied to offer different player skill levels according to the chosen temperature parameter τ (see the sketch below).  You have access to all of these optimal play functions through the supplied OptimalPlayer class.
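
As a minimal sketch of softmax action selection over the two actions, assuming you already have approximate action values qDraw and qHold from your function approximator (all names here are hypothetical):

    import java.util.Random;

    // Minimal softmax action selection sketch (illustrative).
    // qDraw/qHold are approximate action values; tau is the temperature.
    public class SoftmaxSelect {
        private static final Random RNG = new Random();

        public static int chooseAction(double qDraw, double qHold, double tau) {
            // Subtract the max for numerical stability before exponentiating.
            double max = Math.max(qDraw, qHold);
            double eDraw = Math.exp((qDraw - max) / tau);
            double eHold = Math.exp((qHold - max) / tau);
            double pDraw = eDraw / (eDraw + eHold);
            return (RNG.nextDouble() < pDraw) ? 1 : 0;  // 1 = DRAW, 0 = HOLD
        }
    }

As τ approaches 0, selection approaches greedy (strongest) play; larger τ yields more random, weaker play.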

So which function approximation techniques am I permitted to apply?

This too is your decision.  There are again a number of possibilities:

We will discuss a few of these in class, but there are ample online resources for learning about each.

What's supplied in the starter code?

How cool is that!?  I get to choose my learning goals and be creative with an open engineering challenge!

That's right, it's very cool.  Assignments judged superior may lead to joint publication and/or fun prototyping on one of the platforms mentioned.

One caveat: I recommend trying something simple first, and soon.  Remember the KISS principle.  You'll want to get something that outperforms the MaxScorePlayer.  Of course, you will not get anything to outperform the OptimalPlayer.

How close can you get to optimal play while keeping the code that computes the action short?  For example, consider an approach where you use an open-source neural network package (e.g. Neuroph) or write your own neural network.  Once you've trained a small network on one of the functions, the relatively few weights can be hard-coded in 2D arrays, and simple feed-forward computation can produce the intended play action.  Computing the function approximator itself (e.g. training the network) may use all the memory and time that you wish.  The actual implementation of the player using the function approximator should (1) be relatively simple, (2) be fast, (3) not require file I/O, and (4) exceed the performance of the MaxScorePlayer.
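
To make that idea concrete, here is a minimal sketch of such a hard-coded player in Java.  Everything here (the class name, the network size of 5 inputs, 2 hidden units, and 1 output, and the weight values) is a placeholder assumption; a real player would paste in the weights produced by your own offline training:

    // Sketch of a player using a hard-coded feed-forward sigmoid network.
    // All weights below are placeholders standing in for trained values.
    public class ApproxPlayer {
        private static final double[][] W1 = { { 0.1, -0.2,  0.3, -0.4,  0.5 },
                                               {-0.5,  0.4, -0.3,  0.2, -0.1 } }; // hidden weights (placeholder)
        private static final double[] B1 = { 0.0, 0.0 };   // hidden biases (placeholder)
        private static final double[] W2 = { 0.7, -0.7 };  // output weights (placeholder)
        private static final double B2 = 0.0;              // output bias (placeholder)

        private static double sigmoid(double x) {
            return 1.0 / (1.0 + Math.exp(-x));
        }

        // Returns DRAW (1) or HOLD (0) from the 5 state variables,
        // with no file I/O and only a few dozen multiply-adds.
        public static int action(int i, int j, int k, int w, int c) {
            double[] x = { i, j, k, w, c };  // apply the same input scaling used in training
            double out = B2;
            for (int h = 0; h < W1.length; h++) {
                double z = B1[h];
                for (int n = 0; n < x.length; n++) {
                    z += W1[h][n] * x[n];
                }
                out += W2[h] * sigmoid(z);
            }
            return sigmoid(out) >= 0.5 ? 1 : 0;  // threshold the output unit
        }
    }

Note the trade-off: more hidden units generally mean a better fit to the target function but more weights to store, so part of the challenge is finding how small a network can be while still satisfying requirement (4).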