CS 391 - Selected Topics: Game Artificial Intelligence
Each reading assignment should be completed before the class on the date
indicated. These readings are subject to change; check here for updates.
If a reading assigned in class does not match the reading assignment here,
the reading assigned in class takes precedence.
1/31: P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time Analysis of the Multiarmed
Bandit Problem. (See also
here.) Sections 1 (skipping the passage "In their classical paper ... mild
assumptions."), 2 (with special attention to the algorithms), 4, and 5.
Topics: regret, upper confidence bounds (UCBs), UCB action selection
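As a warm-up for the UCB action selection topic, here is a minimal Python sketch of the UCB1 rule as we understand it from Section 2 of the paper: play each arm once, then always play the arm with the largest value of average reward plus sqrt(2 ln n / n_j), where n is the number of plays so far and n_j is the number of plays of arm j. The function names, the Bernoulli test arms, and their success probabilities are illustrative assumptions, not taken from the paper or from class.

```python
import math
import random


def ucb1(pull, n_arms, n_rounds):
    """Sketch of UCB1. pull(j) is assumed to return a reward in [0, 1]."""
    counts = [0] * n_arms   # n_j: number of times arm j has been played
    means = [0.0] * n_arms  # running average reward of arm j
    for n in range(n_rounds):  # n = total number of plays made so far
        if n < n_arms:
            arm = n  # initialization: play each arm once
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda j: means[j] + math.sqrt(2.0 * math.log(n) / counts[j]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def make_bernoulli_pull(probs, seed=0):
    """Hypothetical test bandit: arm j pays 1 with probability probs[j]."""
    rng = random.Random(seed)
    return lambda j: 1.0 if rng.random() < probs[j] else 0.0


if __name__ == "__main__":
    counts, means = ucb1(make_bernoulli_pull([0.2, 0.5, 0.8]), 3, 5000)
    print(counts, means)
```

The exploration bonus shrinks as an arm is played more often, so rarely played arms are still tried occasionally while most plays go to the arm that looks best.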
2/2: R.S. Sutton, A. G. Barto.
Reinforcement Learning: An Introduction.
Chapter 3: introduction through 3.3, and 3.6-3.8. Topics: reinforcement learning
problem, agent, environment, state, action, reward, returns, Markov decision
processes (MDPs), state-value function V, action-value function Q, Bellman
equation, optimal policy, optimal value-functions, Bellman optimality
equation, backup diagrams.
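To make the Bellman optimality equation concrete, the sketch below repeatedly applies the backup V(s) <- max_a sum_s' p(s'|s,a) [r + gamma V(s')] to a tiny MDP until the values stop changing. This repeated backup is value iteration, which is not part of this reading; it is used here only to show the equation holding at its fixed point. The two-state MDP, its rewards, and the discount factor 0.9 are made up for illustration.

```python
def bellman_optimality_backup(V, mdp, gamma):
    """One sweep of V(s) <- max_a sum_{s'} p * (r + gamma * V(s'))."""
    return [
        max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions
        )
        for actions in mdp
    ]


def solve(mdp, gamma=0.9, tol=1e-10):
    """Apply the backup until the largest change in any state is below tol."""
    V = [0.0] * len(mdp)
    while True:
        new_V = bellman_optimality_backup(V, mdp, gamma)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V


# Hypothetical MDP: mdp[s][a] is a list of (probability, next_state, reward).
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
TOY_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],
]

if __name__ == "__main__":
    print(solve(TOY_MDP))  # approximately [19.0, 20.0]
```

The result can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and the best choice in state 0 is to move, worth 1 + 0.9 * 20 = 19.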