CS 104: Introduction to Computer Science

The Problem

• Set of sensors → set of environment states
    – All states fit in memory

• Set of actions → transition from state to state, immediate reward

• Set of terminal (absorbing) states

• Wish to learn optimal policy
    – Mapping of states to actions which maximizes expected reward (utility) over time

• Often called a "sequential decision problem"
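
The pieces listed above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the slide: a four-state chain world, deterministic transitions (so expected reward equals observed reward), a reward of -1 per step and +10 for reaching the terminal state, no discounting, and a horizon cap of 20 steps so that looping policies still get a finite score. The names `step`, `total_reward`, and `best_policy` are hypothetical. Because the state set is tiny, the sketch simply enumerates every policy and keeps the one with the highest total reward summed over all start states; it is a brute-force illustration of the definition, not a practical learning algorithm.

```python
from itertools import product

STATES = [0, 1, 2, 3]        # assumed toy state set; small enough to fit in memory
ACTIONS = ["left", "right"]  # assumed action set
TERMINAL = {3}               # terminal (absorbing) state


def step(state, action):
    """Return (next_state, immediate_reward) for the assumed deterministic chain."""
    if action == "left":
        nxt = max(state - 1, 0)
    else:
        nxt = min(state + 1, 3)
    # Assumed rewards: +10 for entering the terminal state, -1 for any other move.
    reward = 10 if nxt in TERMINAL else -1
    return nxt, reward


def total_reward(policy, start=0, horizon=20):
    """Follow a policy (dict: state -> action) and sum the rewards collected."""
    state, total = start, 0
    for _ in range(horizon):  # horizon cap keeps looping policies finite
        if state in TERMINAL:
            break
        state, reward = step(state, policy[state])
        total += reward
    return total


def best_policy():
    """Enumerate every state-to-action mapping and return the highest-scoring one."""
    non_terminal = [s for s in STATES if s not in TERMINAL]
    candidates = (
        dict(zip(non_terminal, choice))
        for choice in product(ACTIONS, repeat=len(non_terminal))
    )
    # Score a policy by its total reward summed over all non-terminal start states.
    return max(
        candidates,
        key=lambda p: sum(total_reward(p, start=s) for s in non_terminal),
    )


if __name__ == "__main__":
    policy = best_policy()
    print(policy)
    print(total_reward(policy, start=0))
```

Under these assumptions the best policy chooses "right" in every non-terminal state, and following it from state 0 collects a total reward of 8. With stochastic transitions, the score of a policy would have to be an average over outcomes rather than a single rollout.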