•Set of sensors à set of
environment states
–All states fit in memory
•Set of actions à transition from
state to state, immediate reward
•Set of terminal (absorbing)
states
•Wish to learn optimal policy
–mapping of states to actions which maximizes expected reward (utility) over time
•Often called a "sequential decision problem"