• Set of sensors → set of environment states
    – All states fit in memory
• Set of actions → transition from state to state, immediate reward
• Set of terminal (absorbing) states
• Wish to learn optimal policy
    – Mapping of states to actions which maximizes expected reward (utility) over time
• Often called a "sequential decision problem"
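
The ingredients listed above (states, actions with transitions and immediate rewards, terminal states, and a policy that maximizes expected reward over time) can be illustrated with a small tabular sketch. Everything specific here is an assumption made for illustration and does not come from the slide: the four-state corridor, the deterministic `step` function, the reward of 1 for reaching the terminal state, the discount factor `GAMMA`, and the use of value iteration as one possible way to compute an optimal policy when all states fit in memory.

```python
# Hypothetical 4-state corridor: states 0..3, state 3 is terminal (absorbing).
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
TERMINAL = {3}
GAMMA = 0.9  # assumed discount factor; not given in the source


def step(state, action):
    """Return (next_state, immediate_reward) for a deterministic move."""
    if state in TERMINAL:
        return state, 0.0  # absorbing: no further movement or reward
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if nxt in TERMINAL else 0.0
    return nxt, reward


def q_value(state, action, values):
    """Immediate reward plus discounted value of the resulting state."""
    nxt, reward = step(state, action)
    return reward + GAMMA * values[nxt]


def value_iteration(tol=1e-9):
    """Tabular value iteration; returns (state values, greedy policy)."""
    values = {s: 0.0 for s in STATES}  # one table entry per state
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue
            best = max(q_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # The policy maps each non-terminal state to its best action.
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(s, a, values))
        for s in STATES
        if s not in TERMINAL
    }
    return values, policy


values, policy = value_iteration()
print(policy)
```

In this toy setting the greedy policy moves right from every non-terminal state, since that is the shortest route to the rewarding terminal state. A learning agent would typically not be handed `step` directly and would have to estimate values from experience instead.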