CS 391 Selected Topics: Game AI Homework #3
Note: This work is to be done in groups of 2. Each group will submit one assignment. Although you may divide the work, each team member should be able to present and describe their partner's work upon request.
0. HW3 Preparation: Download the HW3 starter code here.
1. TD(0) SARSA Learning of Piglet Solitaire: Piglet Solitaire is a simple jeopardy coin game. The goal is to reach a given goal score in a given number of turns. For this exercise, the goal score is 6 and the number of turns is 10.
Implementing the PigletSolitairePlayer interface, create a player that, at construction time, simulates play and computes an approximately optimal policy using the SARSA algorithm. Your algorithm should complete its computation within one minute on our lab machines. You will test your implementation using the PigletSolitairePlayerEvaluator class.
Self-check: The output for an optimal player is p[0][0][0] = 0.5418925201520324
Note that your policy will likely not be optimal. Rather, it should approximate optimal play. You are free to choose a policy underlying SARSA that is not epsilon-greedy. This policy may dynamically adjust behavior as training progresses. You may also experiment with varying the learning rate. However, the algorithm should essentially be SARSA. How close to optimal can you get?
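As a rough illustration of the update described above, here is a minimal tabular SARSA sketch with a decaying epsilon-greedy policy. It is not the starter code and does not implement the PigletSolitairePlayer interface. The class name, method names, state indexing q[i][t][k][a], learning rate, and epsilon schedule are all hypothetical. It also assumes the usual Piglet rules: each flip is heads (turn total + 1) or tails (turn total lost, turn ends), holding banks the turn total and ends the turn, and reaching the goal score wins immediately.

```java
import java.util.Random;

// Hypothetical sketch: names and indexing are NOT taken from the starter code.
class SarsaPigletSketch {
    static final int GOAL = 6, TURNS = 10;   // values given in the assignment
    static final int HOLD = 0, FLIP = 1;

    // q[i][t][k][a]: estimated win probability with score i, t turns used,
    // turn total k, when taking action a. Terminal states are not stored.
    final double[][][][] q = new double[GOAL][TURNS][GOAL][2];
    final Random rng;

    SarsaPigletSketch(long seed) { rng = new Random(seed); }

    // Epsilon-greedy action choice; ties go to HOLD.
    int chooseAction(int i, int t, int k, double epsilon) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(2);
        return q[i][t][k][FLIP] > q[i][t][k][HOLD] ? FLIP : HOLD;
    }

    // One simulated game with the on-policy TD(0) update
    // Q(s,a) += alpha * (r + Q(s',a') - Q(s,a)), with no discounting.
    void runEpisode(double alpha, double epsilon) {
        int i = 0, t = 0, k = 0;
        int a = chooseAction(i, t, k, epsilon);
        while (true) {
            int ni = i, nt = t, nk = k;
            double reward = 0.0;
            boolean terminal = false;
            if (a == FLIP && rng.nextBoolean()) {      // heads (assumed rule)
                nk = k + 1;
                if (i + nk >= GOAL) { reward = 1.0; terminal = true; }  // win
            } else {                                   // hold or tails: turn ends
                if (a == HOLD) ni = i + k;             // bank the turn total
                nk = 0;
                nt = t + 1;
                if (nt >= TURNS) terminal = true;      // out of turns: loss
            }
            double target = reward;
            int na = HOLD;
            if (!terminal) {
                na = chooseAction(ni, nt, nk, epsilon); // next action, same policy
                target += q[ni][nt][nk][na];
            }
            q[i][t][k][a] += alpha * (target - q[i][t][k][a]);
            if (terminal) return;
            i = ni; t = nt; k = nk; a = na;
        }
    }

    // Epsilon decays linearly from 0.2 to 0 (an assumed schedule, not a requirement).
    void train(int episodes, double alpha) {
        for (int e = 0; e < episodes; e++) {
            double epsilon = 0.2 * (1.0 - (double) e / episodes);
            runEpisode(alpha, epsilon);
        }
    }

    // Greedy policy read off the learned table.
    boolean shouldFlip(int i, int t, int k) {
        return q[i][t][k][FLIP] > q[i][t][k][HOLD];
    }

    public static void main(String[] args) {
        SarsaPigletSketch learner = new SarsaPigletSketch(0L);
        learner.train(1_000_000, 0.01);
        System.out.println("Estimated value of flipping at the start: "
                + learner.q[0][0][0][FLIP]);
    }
}
```

Because the only reward is 1 for a win, each table entry estimates a win probability under the policy being followed. The exploration rate and learning rate used here are starting points to experiment with, not recommended settings.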
Your submission should include the PigletSolitairePlayerEvaluator class, modified to evaluate your implementation rather than the included hold-at-2 player (which is not a bad policy).
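To get a feel for the hold-at-2 baseline, the following sketch estimates its win rate by Monte Carlo simulation. It is independent of the provided evaluator, and its class and method names are hypothetical. It assumes the usual Piglet rules (heads adds 1 to the turn total, tails loses the turn total and ends the turn, holding banks the turn total), and it assumes "hold-at-2" means holding as soon as the turn total reaches 2.

```java
import java.util.Random;

// Hypothetical sketch: not the provided evaluator, and not its API.
class HoldAtTwoSketch {
    static final int GOAL = 6, TURNS = 10, HOLD_AT = 2;

    // Plays one game with the hold-at-2 policy; returns true on a win.
    static boolean playOnce(Random rng) {
        int score = 0;
        for (int turn = 0; turn < TURNS; turn++) {
            int turnTotal = 0;
            // Keep flipping until the hold threshold or the goal is reached.
            while (turnTotal < HOLD_AT && score + turnTotal < GOAL) {
                if (rng.nextBoolean()) {
                    turnTotal++;          // heads
                } else {
                    turnTotal = 0;        // tails: lose the turn total
                    break;
                }
            }
            score += turnTotal;           // hold (adds 0 after tails)
            if (score >= GOAL) return true;
        }
        return false;
    }

    // Fraction of simulated games won.
    static double estimateWinRate(int games, long seed) {
        Random rng = new Random(seed);
        int wins = 0;
        for (int g = 0; g < games; g++) {
            if (playOnce(rng)) wins++;
        }
        return (double) wins / games;
    }

    public static void main(String[] args) {
        System.out.println("Estimated hold-at-2 win rate: "
                + estimateWinRate(200_000, 1L));
    }
}
```

Under these assumed rules, each turn scores 2 with probability 1/4 and 0 otherwise, so the policy wins when at least 3 of the 10 turns score. A binomial calculation puts that at roughly 0.474, which gives a reference point against the optimal value quoted in the self-check.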