CS 371 - Introduction to Artificial Intelligence
4th Hour Project: OpenAI Gym Q-Learning


Due:

Note: You may work on this either in pairs or independently, for separate grades. No groups of 3 or more are permitted.

Additional Resources

Part 1: Deeplizard Frozen Lake

Implement basic Q-learning through the Deeplizard Frozen Lake tutorial:
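
As a reference while you work, here is a compact sketch of the tabular Q-learning loop the tutorial builds up. It assumes the classic gym API (reset returning a state, step returning four values) and the FrozenLake-v0 environment id; adjust both to match your installed gym version. The hyperparameter values shown are typical tutorial-style defaults, not requirements.

import gym
import random
import numpy as np

env = gym.make("FrozenLake-v0")  # "FrozenLake-v1" on newer gym releases
q_table = np.zeros((env.observation_space.n, env.action_space.n))

num_episodes = 10000
max_steps_per_episode = 100
learning_rate = 0.1
discount_rate = 0.99
exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.001

for episode in range(num_episodes):
    state = env.reset()
    for step in range(max_steps_per_episode):
        # Epsilon-greedy action selection
        if random.uniform(0, 1) > exploration_rate:
            action = np.argmax(q_table[state, :])
        else:
            action = env.action_space.sample()

        new_state, reward, done, info = env.step(action)

        # Q-learning update: blend old estimate with the bootstrapped target
        q_table[state, action] = q_table[state, action] * (1 - learning_rate) + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))

        state = new_state
        if done:
            break

    # Decay epsilon exponentially toward its minimum
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)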

Part 2: Approach n

These are the few initial required imports:
import gym
import gym_approach_n
import random
import time
import numpy as np
from IPython.display import clear_output
Initialize the environment and some of your parameters.
env = gym.make("approach-n-v1")

env.n = n = 10 # You can try out different values of n here.
action_space_size = 2 # 0=HOLD, 1=ROLL
state_space_size = n + 1 # nonterminal Player 1 totals 0 through
                         # (n - 1); terminal state is n
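From here, one natural continuation is to create the Q-table and set your learning parameters. The sketch below mirrors the tutorial's setup; the names and values are starting points, not requirements.

q_table = np.zeros((state_space_size, action_space_size))

num_episodes = 10000
learning_rate = 0.1        # alpha
discount_rate = 0.99       # gamma
exploration_rate = 1       # initial epsilon for epsilon-greedy exploration
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.001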
This line is handy for summarizing your policy.
print('Policy: Hold at {}\n'.format(', '.join([str(i) for i in range(n) if q_table[i,0] > q_table[i,1]])))
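For example, once the agent has converged to the optimal policy for n = 10, this prints: Policy: Hold at 8, 9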

When demonstrating your agent's play, you'll benefit from longer delays between text-rendering updates.
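
For example, a demonstration loop along these lines (a sketch assuming the classic four-value gym step API and a text-mode render(), as in the Frozen Lake tutorial; it uses only the imports listed above) keeps each frame on screen long enough to read:

state = env.reset()
done = False
while not done:
    clear_output(wait=True)
    env.render()
    time.sleep(1)  # lengthen this pause for slower, more readable playback
    action = np.argmax(q_table[state, :])  # act greedily from the learned Q-table
    state, reward, done, info = env.step(action)
clear_output(wait=True)
env.render()  # show the final, terminal state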

Optional Challenge: Using the default Q-learning parameters of the tutorial, the agent usually learns the optimal "Hold at 8, 9" policy for n=10. (A total of 10 is an automatic win.) However, it will sometimes learn "Hold at 7, 8, 9" or "Hold at 9". What are ways you can improve the learning so as to more reliably learn the optimal policy?

Part 3: Choose Your Own Adventure

Demonstrate a form of Q-learning for a third problem of your choice that adds something to your understanding of and experience with Q-learning. Suggestions:

This work will be shared in the second-to-last class of the semester.