CS 391 - Special Topic: Machine Learning
Chapter 3


Readings Topics

Discussion Questions

Programming Assignment

HW3

Due Monday 2/10 at the beginning of class.  An improvised, informal presentation of your work may be requested in class.

Apply one of the techniques from Chapter 2 to the Jack's Car Rental problem (Example 4.2, pp. 98-99).  For this, you will need to implement your own environment to that specification.  Also, note that this is an associative problem, so you will need a simple state class.  When you start a trial, pick a starting state uniformly at random from the possible states.  Since this is a continuing problem, you will not reach a terminal (a.k.a. absorbing) state; terminate each trial after a maximum number of steps of your choosing.  DO NOT return your agent to a naive state at the start of each trial.  A sketch of such an environment follows.
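Below is a minimal, self-contained sketch (in Python) of what such an environment and state class might look like.  It does not use the course-provided base classes, and the names here (State, CarRentalEnv, random_start, step, the poisson helper) are illustrative assumptions rather than required interfaces; the constants are taken from the problem description in Example 4.2.

import math
import random

class State:
    """Number of cars at each location at the end of the day."""
    def __init__(self, cars1, cars2):
        self.cars1 = cars1
        self.cars2 = cars2

    # Hashable/comparable so states can be used as dictionary keys.
    def __hash__(self):
        return hash((self.cars1, self.cars2))

    def __eq__(self, other):
        return (self.cars1, self.cars2) == (other.cars1, other.cars2)

def poisson(lam):
    """Sample from a Poisson distribution (Knuth's method)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= L:
            return k - 1

class CarRentalEnv:
    MAX_CARS = 20      # capacity at each location
    MAX_MOVE = 5       # cars that can be moved overnight
    RENT_REWARD = 10   # dollars per car rented
    MOVE_COST = 2      # dollars per car moved

    def random_start(self):
        """Uniformly pick a random possible starting state."""
        return State(random.randint(0, self.MAX_CARS),
                     random.randint(0, self.MAX_CARS))

    def step(self, state, action):
        """Move `action` cars from location 1 to location 2 (negative moves
        the other way), then simulate one day of Poisson rentals and returns.
        Returns (next_state, reward)."""
        moved = max(-state.cars2, min(action, state.cars1))
        moved = max(-self.MAX_MOVE, min(self.MAX_MOVE, moved))
        cars1 = min(state.cars1 - moved, self.MAX_CARS)
        cars2 = min(state.cars2 + moved, self.MAX_CARS)
        reward = -self.MOVE_COST * abs(moved)

        # Rental requests and returns are Poisson; lambdas from Example 4.2.
        req1, req2 = poisson(3), poisson(4)
        ret1, ret2 = poisson(3), poisson(2)
        rented1, rented2 = min(req1, cars1), min(req2, cars2)
        reward += self.RENT_REWARD * (rented1 + rented2)
        cars1 = min(cars1 - rented1 + ret1, self.MAX_CARS)
        cars2 = min(cars2 - rented2 + ret2, self.MAX_CARS)
        return State(cars1, cars2), reward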

Since this is an associative problem, it is recommended that you keep a table of action-value (Q) estimates, one entry for each possible state-action pair.  One update rule you can use, following the incremental implementation pattern of Section 2.5, is:

Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t) + alpha * ((r_{t+1} + gamma * max_a Q_t(s_{t+1},a)) - Q_t(s_t,a_t))

where alpha is the step size and gamma is given in Example 4.2 as 0.9.  (Underscores indicate subscripts, and max_a denotes the maximum over actions a.)  This is an example of what is known as Q-learning (Chapter 6).  You will likely need many trials to gather enough experience to approximate optimal behavior.  At the end of all of your trials, print out the approximate optimal policy your agent has learned and compare it with that shown in Figure 4.4.  A table of numbers is adequate, although a contour plot like that of Figure 4.4 or a 3D plot (x = cars at loc. #1, y = cars at loc. #2, z = cars moved) is preferred.
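A minimal sketch of the resulting tabular Q-learning loop is given below, building on the illustrative environment sketch above (it assumes CarRentalEnv and State are in scope).  The step size, the epsilon for epsilon-greedy exploration, and the numbers of trials and steps per trial are placeholder values you will want to tune, and none of the names are part of the provided base classes.

from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1     # step size, discount, exploration rate
NUM_TRIALS, MAX_STEPS = 10000, 200        # placeholder values; tune as needed
ACTIONS = range(-5, 6)                    # net cars moved from location 1 to 2

env = CarRentalEnv()
Q = defaultdict(float)                    # Q[(state, action)], initialized to 0

def greedy_action(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for trial in range(NUM_TRIALS):
    state = env.random_start()            # uniform random starting state
    for step in range(MAX_STEPS):         # continuing task: cut off after MAX_STEPS
        if random.random() < EPSILON:
            action = random.choice(list(ACTIONS))
        else:
            action = greedy_action(state)
        next_state, reward = env.step(state, action)
        # Q-learning update: bootstrap from the best action in the next state.
        target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state
    # Q is NOT reset between trials, so experience accumulates across trials.

# Print the learned (approximately optimal) policy as a table of numbers:
# rows = cars at location 1, columns = cars at location 2.
for cars1 in range(CarRentalEnv.MAX_CARS + 1):
    row = [greedy_action(State(cars1, cars2))
           for cars2 in range(CarRentalEnv.MAX_CARS + 1)]
    print(" ".join(f"{a:+d}" for a in row))

One design note on this sketch: storing Q in a dictionary keyed by (state, action) pairs avoids allocating a fixed 21 x 21 x 11 array, but it requires the state class to be hashable, which is why __hash__ and __eq__ are defined in the State sketch above.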

Reminder:  Use/extend/implement the reinforcement base classes provided.