CS 104: Introduction to Computer Science

The Problem

•Set of sensors → set of environment states

–All states fit in memory

•Set of actions → transition from state to state, immediate reward

•Set of terminal (absorbing) states

•Wish to learn optimal policy

–mapping of states to actions that maximizes expected cumulative reward (utility) over time

•Often called a "sequential decision problem"
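
The ingredients listed above (states, actions, transitions with immediate rewards, terminal states, and a policy) can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the course material: a hypothetical four-state corridor, deterministic moves, a step cost of -1, a reward of 10 for reaching the terminal state, and a discount factor of 0.9. It also uses value iteration, which assumes the transition model is known in advance; learning a policy from experience alone would need a different method.

```python
# Hypothetical 4-state corridor: states 0..3, where state 3 is terminal (absorbing).
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
TERMINAL = {3}
GAMMA = 0.9  # discount factor (an assumption; the slide does not name one)


def step(state, action):
    """Return (next_state, immediate_reward) for a deterministic move."""
    if state in TERMINAL:
        return state, 0.0  # absorbing: no further transitions or reward
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 10.0 if nxt in TERMINAL else -1.0
    return nxt, reward


def value_iteration(sweeps=100):
    """Estimate the utility of each state by repeated Bellman backups."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s in TERMINAL:
                continue
            outcomes = [step(s, a) for a in ACTIONS]
            V[s] = max(r + GAMMA * V[n] for n, r in outcomes)
    return V


def greedy_policy(V):
    """Map each non-terminal state to the action with the best one-step lookahead."""
    def lookahead(s, a):
        n, r = step(s, a)
        return r + GAMMA * V[n]

    return {
        s: max(ACTIONS, key=lambda a: lookahead(s, a))
        for s in STATES
        if s not in TERMINAL
    }


V = value_iteration()
policy = greedy_policy(V)
print(policy)
```

Because all states fit in memory, both the utilities and the policy are plain dictionaries keyed by state. In this toy setup the resulting policy picks "right" in every non-terminal state, since that is the shortest path to the rewarding terminal state.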