Introduction

Minimax

Alpha-beta pruning

Expectiminimax

Previously, we've looked at search problems with static environments: a single agent affects the environment.

Now, we generalize just a bit and allow two agents to affect the environment in turn → a dynamic environment.

Previously, we've looked for a sequence of actions leading to a goal state.

Now, we're looking for a sequence of actions which maximizes some utility measure regardless of how an adversarial agent acts.

Search: Peg solitaire

Jump a peg over an adjacent peg into an empty space, removing the jumped peg.

Initial state: only one space empty

Goal state: only one space occupied

Find a sequence of jumps from the initial state to the goal state.

Search: Tic-Tac-Toe

Players place X and O in turn.

Initial state: empty 3×3 grid

Goal state: three of a player's symbol in a row

Count win = +1, draw = 0, loss = -1

Find a sequence of moves which maximizes utility regardless of adversarial play.

Suppose you construct the complete tree of possible plays.

Evaluate terminal states as +1, 0, or -1.

Evaluate non-terminal states as the maximum/minimum of their children's evaluations for player X/O respectively.

This propagation of evaluations is called minimax.
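
To make the propagation concrete, here is a minimal Python sketch of minimax, assuming a hypothetical game object with moves, result, is_terminal, and utility methods (the names are illustrative, not fixed by these notes):

    def minimax_value(game, state, maximizing):
        # Terminal states carry the actual payoff: +1 win, 0 draw, -1 loss
        # (all from X's point of view).
        if game.is_terminal(state):
            return game.utility(state)
        # X (the maximizer) takes the max of child values; O (the minimizer)
        # takes the min. Turns alternate, so the flag flips at each level.
        values = [minimax_value(game, game.result(state, move), not maximizing)
                  for move in game.moves(state)]
        return max(values) if maximizing else min(values)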

Consider minimax on a subtree of possible tic-tac-toe plays…

Problem definition: initial state, operators, terminal test, utility (or payoff) function
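
These four components map onto a small interface; one possible sketch, with method names of my own choosing (the same shape the minimax sketch above assumed):

    from abc import ABC, abstractmethod

    class Game(ABC):
        @abstractmethod
        def initial_state(self): ...          # initial state
        @abstractmethod
        def moves(self, state): ...           # operators: legal moves in a state
        @abstractmethod
        def result(self, state, move): ...    # operators: state a move produces
        @abstractmethod
        def is_terminal(self, state): ...     # terminal test
        @abstractmethod
        def utility(self, state): ...         # utility (payoff) at terminal states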

Given the whole game tree, minimax yields perfect decisions.*

Minimax: the minimum of the maximum of the minimum of the maximum of the…

*assuming the adversary also acts according to minimax → the importance of player modeling

We can't search the whole game tree, so…

Evaluate states passing a cutoff test according to a heuristic evaluation function.

Consider chess:

enormous state space

can't possibly search the whole tree with current computational limitations

We must (see the sketch after this list):

limit the depth of search

evaluate non-terminal nodes at the limit
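
One way this might look, as a sketch built on the earlier minimax one: the terminal test is supplemented by a depth cutoff, below which a heuristic eval_fn (an assumed parameter, discussed next) stands in for further search:

    def depth_limited_value(game, state, depth, maximizing, eval_fn):
        if game.is_terminal(state):
            return game.utility(state)        # actual value when the game is decided
        if depth == 0:
            return eval_fn(state)             # heuristic estimate at the cutoff
        values = [depth_limited_value(game, game.result(state, move),
                                      depth - 1, not maximizing, eval_fn)
                  for move in game.moves(state)]
        return max(values) if maximizing else min(values)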

How would you evaluate these positions?

Material advantage isn't the whole story.

A good evaluation function (sketched below):

returns the actual value at terminal states,

approximates the actual value at non-terminal nodes, and

isn't too computationally intensive.
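
For chess, a toy material-count evaluator illustrates the trade-off; the piece values are the conventional ones, and the board representation is an assumption made for the example:

    # `board` is assumed to map squares to piece letters, uppercase for
    # White ('P', 'N', ...) and lowercase for Black ('p', 'n', ...).
    PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}

    def material_eval(board):
        # Positive favors White. Cheap to compute, but it ignores position
        # entirely, which is why material advantage isn't the whole story.
        score = 0
        for piece in board.values():
            value = PIECE_VALUES[piece.upper()]
            score += value if piece.isupper() else -value
        return score

Real evaluators add weighted positional terms on top of a material count like this one.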

Most attribute recent game-playing success to better speed ("brute force") rather than better evaluation (knowledge base).

Still, most minimax search is pointless…

While we search the tree, we can keep track of guaranteed maximum/minimum utilities if play proceeds to each node.

When we see a contradiction in guarantees, we can prune the remaining children from further consideration, because we've proven a rational player will never reach that node.

What if the other player isn't rational?

If evaluation is perfect, then with rational play one can always do at least as well against an irrational player.

Let α, β be the local lower and upper bound guarantees:

"If play proceeds here, the root will score at least α."

"If play proceeds here, the root will score at most β."

Pruning according to α and β is called alpha-beta pruning.

Minimax search with alpha-beta pruning is sometimes called alpha-beta search.
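
A sketch of minimax with alpha-beta pruning, under the same assumed game interface as before; alpha and beta are exactly the guarantees just described:

    def alphabeta(game, state, alpha, beta, maximizing):
        # alpha: "if play proceeds here, the root will score at least alpha"
        # beta:  "if play proceeds here, the root will score at most beta"
        if game.is_terminal(state):
            return game.utility(state)
        if maximizing:
            value = float('-inf')
            for move in game.moves(state):
                value = max(value, alphabeta(game, game.result(state, move),
                                             alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:             # guarantees contradict: a rational
                    break                     # opponent never lets play get here
            return value
        else:
            value = float('inf')
            for move in game.moves(state):
                value = min(value, alphabeta(game, game.result(state, move),
                                             alpha, beta, True))
                beta = min(beta, value)
                if alpha >= beta:
                    break
            return value

Started as alphabeta(game, state, float('-inf'), float('inf'), True), this returns the same value as plain minimax while skipping subtrees that provably cannot affect the decision.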

A chance node is evaluated as follows (see the sketch after this list):

the value of each child is multiplied by the probability of reaching that child, and

these products are then summed.
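
In other words, value(chance node) = sum over children of P(child) × value(child). As a sketch reusing the earlier minimax_value, where outcomes(state) is an assumed method yielding (probability, child-state) pairs:

    def chance_value(game, state, maximizing):
        # Expectiminimax chance node: probability-weighted sum of child values.
        return sum(prob * minimax_value(game, child, maximizing)
                   for prob, child in game.outcomes(state))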

Disadvantages of this approach:

the branching factor of chance nodes can be large!

no pruning allowed

evaluation functions are hard!…