© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
The Problem
•Set of sensors → set of environment states
–All states fit in memory
•Set of actions → transition from state to state, immediate reward
•Set of terminal (absorbing) states
•Wish to learn optimal policy
–mapping from states to actions that maximizes expected reward (utility) over time
•Often called a "sequential decision problem"
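
The ingredients listed above (states, actions, transitions with immediate rewards, terminal states, a policy) can be made concrete with a small sketch. Everything here is an illustrative assumption, not taken from the slides: a hypothetical four-state corridor, a step cost of -0.04, a reward of +1 for entering the terminal state, and the names `step`, `value_iteration`, and `greedy_policy`. The transition model is assumed to be known, so plain value iteration stands in for a learning method.

```python
# Hypothetical toy problem: a corridor of four states, 0..3; state 3 is terminal.
STATES = [0, 1, 2, 3]
TERMINAL = {3}
ACTIONS = ["left", "right"]
STEP_REWARD = -0.04  # assumed small cost for each ordinary move
GOAL_REWARD = 1.0    # assumed reward for entering the terminal state


def step(state, action):
    """Deterministic transition: return (next_state, immediate_reward)."""
    if action == "left":
        nxt = max(state - 1, 0)
    else:
        nxt = min(state + 1, 3)
    reward = GOAL_REWARD if nxt in TERMINAL else STEP_REWARD
    return nxt, reward


def value_iteration(gamma=1.0, tol=1e-9):
    """Estimate the utility of each state, assuming the model is known."""
    utility = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue  # terminal states are absorbing; utility stays 0
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + gamma * utility[n] for n, r in outcomes)
            delta = max(delta, abs(best - utility[s]))
            utility[s] = best
        if delta < tol:
            return utility


def greedy_policy(utility, gamma=1.0):
    """Map each non-terminal state to the action with the best one-step lookahead."""
    def lookahead(s, a):
        n, r = step(s, a)
        return r + gamma * utility[n]

    return {
        s: max(ACTIONS, key=lambda a: lookahead(s, a))
        for s in STATES
        if s not in TERMINAL
    }


utility = value_iteration()
policy = greedy_policy(utility)
print(policy)
```

In this toy setting the resulting policy picks "right" in every non-terminal state, since that is the shortest route to the rewarding terminal state.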