CS 391 - Special Topic: Machine Learning
Chapter 1
What is reinforcement learning (RL)?
What is supervised learning and how is it different from RL?
What is the exploration-exploitation tradeoff and why is it important?
What are examples of reinforcement learning applications? (See
also the case studies of Ch. 11)
Imagine and describe a potential reinforcement learning application
not mentioned in chapter 1 or 11.
What are the four main parts of an RL system?
What is a policy? What kind of policy does an RL agent seek to
learn?
What is a reward function? What would be the biological analogue?
What is a value function? How does it differ from the reward
function? How are they related?
What is a model? How do planning techniques use a model, in contrast to trial-and-error learning?
Describe the Tic-Tac-Toe example in terms of these parts.
What is a greedy policy? Why can an RL agent learn suboptimal
behavior with a greedy policy?
Write and explain the Tic-Tac-Toe example value function update.
Rewrite the update with the right-hand side having only a single V(s) term.
What is the parameter alpha, what values can alpha have, and how does it
affect learning?
Why is this a temporal-difference learning method?
RL is not limited to Tic-Tac-Toe. In what respects is it more
generally applicable?
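
For checking your answers to the update and greedy-policy questions above, here is a minimal Python sketch. It assumes the standard form of the Tic-Tac-Toe update, V(s) <- V(s) + alpha [V(s') - V(s)], where s' is the state after the move. The function names, the dictionary representation of the value table, and the default value of 0.5 for unseen states are illustrative assumptions, not part of the assignment.

```python
import random


def td_update(v_s, v_next, alpha):
    """Move the estimate V(s) a fraction alpha of the way toward V(s')."""
    return v_s + alpha * (v_next - v_s)


def td_update_single_term(v_s, v_next, alpha):
    """The same update, rearranged so V(s) appears once on the right-hand side."""
    return (1 - alpha) * v_s + alpha * v_next


def choose_move(values, candidates, epsilon, rng=random):
    """Epsilon-greedy choice among candidate next states (hypothetical helper).

    With probability epsilon, pick a random candidate (exploration);
    otherwise pick the candidate with the highest estimated value
    (exploitation). Unseen states default to 0.5, an assumed initial estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda s: values.get(s, 0.5))


if __name__ == "__main__":
    # Both forms of the update give the same number (up to rounding).
    print(td_update(0.5, 1.0, 0.1))
    print(td_update_single_term(0.5, 1.0, 0.1))
    # epsilon = 0 is a purely greedy policy: it never tries the lower-valued move.
    print(choose_move({"a": 0.2, "b": 0.9}, ["a", "b"], epsilon=0.0))
```

With alpha = 0 the estimate never changes, and with alpha = 1 it is replaced outright by V(s'); values in between blend the two. Setting epsilon = 0 shows why a purely greedy agent can settle on suboptimal behavior: a move whose current estimate is too low is never tried again, so its estimate is never corrected.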
Optional:
Give an overview of the history of reinforcement learning, describing the three
main threads and the most important and influential works.
Discuss possible RL project ideas for later this semester.
Programming Assignment
None. (See reading above.)