CS 371: Introduction to Artificial Intelligence
The Problem
Differences from Function Approximation
The Learning Task
Reward Functions to Optimize
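One common reward function to optimize is the discounted cumulative reward r_t + γ r_{t+1} + γ² r_{t+2} + …, with 0 ≤ γ < 1. Average reward per step over a finite horizon is another option. The sketch below assumes a finite list of observed rewards, and its function names are hypothetical:

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """Discounted cumulative reward: sum over i of gamma**i * rewards[i].

    gamma = 1 is accepted here only because the reward list is finite.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards so each step is: total = r + gamma * (return from the next step).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def average_reward(rewards: list[float]) -> float:
    """Average reward per step over a finite horizon h = len(rewards)."""
    return sum(rewards) / len(rewards) if rewards else 0.0


if __name__ == "__main__":
    # Hypothetical episode: two steps with reward 0, then 100 on entering a goal.
    episode = [0.0, 0.0, 100.0]
    print(discounted_return(episode, 0.9))  # about 81.0
    print(average_reward(episode))
```

A reward of 100 received two steps in the future is worth γ² · 100 = 81 now. This is why smaller γ makes the agent prefer nearer rewards.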
Simple Grid World Example
Q Learning
How to Learn Q
How to Learn Q (cont.)
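In the deterministic setting, Q can be learned from experience because it satisfies a recursive definition. The notation here is assumed and not taken from the slides: δ(s, a) is the successor state, r(s, a) is the immediate reward, and Q̂ is the learner's current estimate.

```latex
\begin{aligned}
Q(s,a) &= r(s,a) + \gamma\, V^{*}\big(\delta(s,a)\big), \qquad V^{*}(s) = \max_{a'} Q(s,a') \\
\Rightarrow\quad Q(s,a) &= r(s,a) + \gamma \max_{a'} Q\big(\delta(s,a),\, a'\big) \\
\text{training rule:}\quad \hat{Q}(s,a) &\leftarrow r + \gamma \max_{a'} \hat{Q}(s',a')
\end{aligned}
```

Here s' is the state observed after taking a in s. The agent never needs δ or r in advance. It uses only observed transitions, and it acts by choosing argmax_a Q̂(s, a).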
Q Learning Algorithm
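The core of tabular Q learning is one update applied to each observed transition (s, a, r, s'): Q̂(s, a) ← r + γ max_{a'} Q̂(s', a'). The algorithm repeats this forever while acting in the world. The minimal sketch below assumes a dictionary-backed table in which unseen entries count as 0. The function name, state names, and values are hypothetical:

```python
from collections.abc import Hashable, Iterable


def q_update(Q: dict, s: Hashable, a: Hashable, r: float,
             s_next: Hashable, next_actions: Iterable, gamma: float = 0.9) -> float:
    """Deterministic Q-learning update: Q(s, a) <- r + gamma * max_a' Q(s_next, a').

    Entries missing from Q are treated as 0. If s_next has no actions
    (e.g. an absorbing goal), the future term is 0.
    """
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    Q[(s, a)] = r + gamma * best_next
    return Q[(s, a)]


if __name__ == "__main__":
    # Hypothetical current estimates for the actions available in state "s2".
    Q = {("s2", "left"): 63.0, ("s2", "down"): 81.0, ("s2", "right"): 100.0}
    # Moving right from "s1" into "s2" yields reward 0.
    print(q_update(Q, "s1", "right", 0.0, "s2", ["left", "down", "right"]))  # 90.0
```

Only the best successor estimate (100) matters, so the new value is 0 + 0.9 · 100 = 90.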
Grid World Q Learning Example
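A self-contained run of deterministic Q learning on a small grid world. Every detail of this setup is an illustrative assumption: a 2×3 grid, an absorbing goal in the top-right cell, reward 100 for entering the goal and 0 otherwise, γ = 0.9, moves that would leave the grid are unavailable, and actions are chosen uniformly at random.

```python
import random

ROWS, COLS = 2, 3
GOAL = (0, 2)          # hypothetical absorbing goal: top-right cell
GAMMA = 0.9
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def actions(s):
    """Moves that keep the agent on the grid; the goal is absorbing (no actions)."""
    if s == GOAL:
        return []
    r, c = s
    return [a for a, (dr, dc) in MOVES.items()
            if 0 <= r + dr < ROWS and 0 <= c + dc < COLS]


def step(s, a):
    """Deterministic transition: reward 100 for entering the goal, 0 otherwise."""
    dr, dc = MOVES[a]
    s_next = (s[0] + dr, s[1] + dc)
    return s_next, (100.0 if s_next == GOAL else 0.0)


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {}  # (state, action) -> estimate; missing entries count as 0
    starts = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) != GOAL]
    for _ in range(episodes):
        s = rng.choice(starts)
        while s != GOAL:
            a = rng.choice(actions(s))  # purely random exploration
            s_next, reward = step(s, a)
            best_next = max((Q.get((s_next, a2), 0.0) for a2 in actions(s_next)),
                            default=0.0)
            Q[(s, a)] = reward + GAMMA * best_next
            s = s_next
    return Q


def state_values(Q):
    """V(s) = max_a Q(s, a); the goal itself has value 0."""
    return {(r, c): max((Q.get(((r, c), a), 0.0) for a in actions((r, c))), default=0.0)
            for r in range(ROWS) for c in range(COLS)}


if __name__ == "__main__":
    V = state_values(train())
    for r in range(ROWS):
        print([round(V[(r, c)], 1) for c in range(COLS)])
```

With enough episodes, V should settle at 90, 100, 0 on the top row and 81, 90, 100 on the bottom row. Each additional step to the goal multiplies the 100 reward by γ. Because estimates start at 0 and rewards are nonnegative, they rise toward these values and never overshoot them.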
Conditions for Convergence
Experimentation Strategies
Experimentation Strategies (cont.)
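One common experimentation strategy picks actions at random, weighted toward higher-valued ones, so the agent still tries every action: P(a_i | s) = k^{Q̂(s,a_i)} / Σ_j k^{Q̂(s,a_j)}, with a constant k > 1. Larger k gives more exploitation, and k near 1 gives more exploration. The sketch below assumes a dictionary-style Q table. The function names and values are hypothetical:

```python
import random


def action_probabilities(Q, s, actions, k=2.0):
    """P(a_i | s) = k**Q(s, a_i) / sum_j k**Q(s, a_j).

    Q values are shifted by their maximum before exponentiating; this leaves
    the ratios unchanged but avoids overflow when Q values are large.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    values = [Q.get((s, a), 0.0) for a in actions]
    top = max(values)
    weights = [k ** (v - top) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(Q, s, actions, k=2.0, rng=random):
    """Sample one action according to action_probabilities."""
    return rng.choices(actions, weights=action_probabilities(Q, s, actions, k))[0]


if __name__ == "__main__":
    Q = {("s", "left"): 1.0, ("s", "right"): 2.0}
    print(action_probabilities(Q, "s", ["left", "right"], k=2.0))  # roughly [0.333, 0.667]
    print(choose_action(Q, "s", ["left", "right"], rng=random.Random(0)))
```

An action whose estimate is 1 higher than another's becomes k times as likely to be chosen. Raising k over the course of training is one way to shift gradually from exploration to exploitation.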
Speeding Convergence
Speeding Convergence (cont.)
Speeding Convergence (cont.)
Nondeterministic Rewards and Actions
Update Rule for Nondeterministic Case
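When rewards and transitions are nondeterministic, overwriting Q̂ with each new sample would never settle down. The update instead moves the estimate a decaying step toward each sample: Q̂_n(s,a) ← (1 − α_n) Q̂_{n−1}(s,a) + α_n [r + γ max_{a'} Q̂_{n−1}(s',a')]. The sketch below assumes the common decay α_n = 1/(1 + visits_n(s,a)), where the count includes the current visit. The function name, visit-count table, and noisy reward are hypothetical:

```python
import random


def q_update_nondeterministic(Q, visits, s, a, r, s_next, next_actions, gamma=0.9):
    """Q_n(s,a) <- (1 - alpha_n) Q_{n-1}(s,a) + alpha_n [r + gamma max_a' Q_{n-1}(s',a')].

    alpha_n = 1 / (1 + visits_n(s, a)), where visits_n counts this visit too.
    Missing Q entries are treated as 0; a state with no actions adds no future term.
    """
    visits[(s, a)] = visits.get((s, a), 0) + 1
    alpha = 1.0 / (1 + visits[(s, a)])
    # Read the successor's estimates before overwriting Q[(s, a)].
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    target = r + gamma * best_next
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * target
    return Q[(s, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    Q, visits = {}, {}
    for _ in range(10_000):
        reward = rng.choice([0.0, 10.0])  # hypothetical noisy reward with mean 5
        q_update_nondeterministic(Q, visits, "s", "a", reward, "end", [])
    print(round(Q[("s", "a")], 2))  # close to 5
```

With this schedule and zero initialization, after n updates from a terminal successor the estimate equals (r_1 + … + r_n)/(n + 1). It is a slightly shrunk running average that converges to the expected reward.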
Conditions for Convergence
Example: Pig Dice Game
Example: Simplified Blackjack
Further Reading: Temporal Difference Learning