CS 371 - Introduction to Artificial Intelligence
4th Hour Project: OpenAI Gym Q-Learning
Note: You may work on this either in pairs or independently for separate grades. No groups of 3 or more are permitted.
Implement basic Q-learning through the Deeplizard Frozen Lake tutorial:
Tune the learning parameters (other than num_episodes or max_steps_per_episode) so as to get a consistent success rate above 0.7. It is possible to do so just by tuning min_exploration_rate and exploration_decay_rate.
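The success rate depends heavily on how exploration decays across episodes. As a point of reference, here is a minimal sketch of an exponential decay schedule in the style of the tutorial; the specific values below are illustrative assumptions, not recommended settings:

import numpy as np

# Illustrative values only -- tune these to reach a consistent success rate above 0.7.
max_exploration_rate = 1.0
min_exploration_rate = 0.01     # floor: the agent never stops exploring entirely
exploration_decay_rate = 0.001  # larger values shift from exploring to exploiting sooner

def exploration_rate(episode):
    # Exponential decay from max_exploration_rate toward min_exploration_rate.
    return min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)

# With these values, exploration is about 0.61 at episode 500 and 0.14 at episode 2000.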
Install it by running "pip install -e ." from the root directory of this unzipped new environment. See the linked documentation in the previous bullet for more details.
# Approach n OpenAI Gym Environment

The dice game "Approach n" is played with 2 players and a single standard 6-sided die (d6). The goal is to approach a total of n without exceeding it. The first player rolls a die until they either (1) "hold" (i.e. end their turn) with a roll sum less than or equal to n, or (2) exceed n and lose. If the first player holds at exactly n, the first player wins immediately. If the first player holds with less than n, the second player must roll until the second player's roll sum (1) exceeds the first player's roll sum without exceeding n and wins, or (2) exceeds n and loses. Note that the first player is the only player with a choice of play policy. For n >= 10, the game is nearly fair, i.e. it can be won approximately half of the time with optimal decisions.

- Non-terminal states: Player 1 turn totals less than n (n is a terminal state)
- Actions: 0 (HOLD), 1 (ROLL)
- Rewards: +1 for transition to terminal state with win, -1 for transition to terminal state with loss, 0 otherwise
- Transitions: Each die roll of 1-6 is equiprobable
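Before working with the Gym environment below, it can help to see the rules as plain code. Here is a minimal sketch of a single game in which Player 1 follows a hypothetical fixed policy of holding at the first total of hold_at or more; the function name play_game, the default hold_at=8, and the fixed policy itself are illustrative assumptions, not part of the provided environment:

import random

def play_game(n=10, hold_at=8):
    """Simulate one game of Approach n; return True if Player 1 wins."""
    # Player 1 rolls until reaching a total of at least hold_at (the hold policy).
    p1 = 0
    while p1 < hold_at:
        p1 += random.randint(1, 6)
    if p1 > n:
        return False    # Player 1 exceeded n and loses
    if p1 == n:
        return True     # holding at exactly n is an immediate win
    # Player 2 must roll until exceeding Player 1's total or exceeding n.
    p2 = 0
    while p2 <= p1:
        p2 += random.randint(1, 6)
    return p2 > n       # Player 1 wins only if Player 2 exceeded n

# Averaging play_game(10, 8) over many games should give a win rate near 0.5,
# consistent with the game being nearly fair for n >= 10.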
import gym
import gym_approach_n
import random
import time
import numpy as np
from IPython.display import clear_output
env = gym.make("approach-n-v1")
env.n = n = 10 # You can try out different values of n here.
action_space_size = 2 # 0=HOLD, 1=ROLL
state_space_size = n + 1 # nonterminal Player 1 totals 0 through
# (n - 1); terminal state is n
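The q_table referenced in the policy printout below is the result of a Q-learning training loop like the one in the tutorial. A minimal sketch follows, assuming the classic Gym API (reset() returns a state, step() returns (new_state, reward, done, info)); all parameter values here are illustrative and should be tuned:

q_table = np.zeros((state_space_size, action_space_size))

# Illustrative parameter values -- tune as in the Frozen Lake exercise.
num_episodes = 10000
learning_rate = 0.1
discount_rate = 0.99
max_exploration_rate = 1.0
min_exploration_rate = 0.01
exploration_decay_rate = 0.001
exploration_rate = max_exploration_rate

for episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection: exploit the Q-table or explore at random.
        if random.uniform(0, 1) > exploration_rate:
            action = np.argmax(q_table[state, :])
        else:
            action = random.randrange(action_space_size)
        new_state, reward, done, info = env.step(action)
        # Tabular Q-learning update toward the bootstrapped target.
        q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))
        state = new_state
    # Decay exploration after each episode.
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)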
print('Policy: Hold at {}\n'.format(', '.join([str(i) for i in range(n) if q_table[i,0] > q_table[i,1]])))
When demonstrating your agent's play, you'll benefit from longer delays between text-rendering updates.
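For example, a demonstration loop along the following lines works well (a sketch assuming the same classic Gym API as above; the one-second delay and the number of games shown are arbitrary choices):

# Watch the trained agent play a few games, pausing so the text rendering is readable.
for episode in range(3):
    state = env.reset()
    done = False
    while not done:
        clear_output(wait=True)
        env.render()
        time.sleep(1)    # longer delay so each text-rendered update can be read
        action = np.argmax(q_table[state, :])   # greedy play from the learned Q-table
        state, reward, done, info = env.step(action)
    clear_output(wait=True)
    env.render()
    print('Player 1 wins!' if reward == 1 else 'Player 1 loses.')
    time.sleep(2)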
Optional Challenge: Using the default Q-learning parameters of the tutorial, the agent usually learns the optimal "Hold at 8, 9" policy for n=10. (A total of 10 is an automatic win.) However, it will sometimes learn "Hold at 7, 8, 9" or "Hold at 9". What are ways you can improve the learning so as to more reliably learn the optimal policy?
Demonstrate a form of Q-learning for a third problem of your choice that adds something to your understanding of and experience with Q-learning. Suggestions:
This work will be shared in the 2nd-to-last class of the semester.