CS 371: Introduction to Artificial Intelligence
Introduction to Uncertain Reasoning
- Qualification problem in FOL
  - e.g. car doesn't start
- Problems:
  - no exceptions ⇒ big rules!
  - no knowledge of the likelihood of each exception
  - no complete theory (e.g. medical science)
  - even given complete rules, sometimes we only have partial evidence
Probability to the Rescue!
- Possible solution: probabilities
  - summarizes uncertainty
  - gives likelihood information; an incomplete theory can be refined; partial evidence can be handled; but…
  - rules can still be big ⇒ stay tuned for simplifying assumptions
- How might an agent use probabilities to choose actions given percepts?
Decision-Theoretic Agent
- Iterate:
  - Update evidence with current percept
  - Compute outcome probabilities for actions
  - Select action with maximum expected utility given probable outcomes
- utility – "the quality of being useful"; here, a numeric measure of how desirable an outcome is
- decision theory = probability theory + utility theory
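A minimal sketch of the action-selection step of that loop in Python. The umbrella scenario, the outcome probabilities, and the utility numbers below are all invented for illustration; they are not from the lecture:

    # Decision-theoretic action selection: pick the action whose expected
    # utility, summed over its probable outcomes, is highest.

    def expected_utility(action, outcome_probs, utility):
        # outcome_probs[action] maps outcome -> P(outcome | action, evidence)
        return sum(p * utility[outcome]
                   for outcome, p in outcome_probs[action].items())

    def choose_action(actions, outcome_probs, utility):
        return max(actions,
                   key=lambda a: expected_utility(a, outcome_probs, utility))

    # Illustrative (made-up) numbers: deciding whether to carry an umbrella.
    outcome_probs = {
        "take_umbrella":  {"dry": 1.0, "wet": 0.0},
        "leave_umbrella": {"dry": 0.7, "wet": 0.3},   # assumes P(rain) = 0.3
    }
    utility = {"dry": 10, "wet": -50}

    print(choose_action(list(outcome_probs), outcome_probs, utility))
    # -> take_umbrella (EU = 10 vs. 0.7*10 + 0.3*(-50) = -8)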
Prior Probabilities
- unconditional or prior probabilities – probabilities without prior information (i.e. before evidence)
- P(A) is the probability of A in the absence of other information.
- Suppose we have a discrete random variable Weather that can take on 4 values: Sunny, Rainy, Cloudy, or Snowy.
- How do we form prior probabilities?
Forming Prior Probabilities
- In the absence of any information at all, we might say all outcomes are equally likely.
- Better, however, to apply some knowledge to the choice of prior probabilities (e.g. weather statistics over many years).
- P(Weather) = <0.7, 0.2, 0.08, 0.02>
  (a probability distribution over the random variable Weather)
- What about low-probability events that have never happened or happen too infrequently to have accurate statistics?
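As a quick sketch, the prior distribution above can be stored as a simple lookup table; the Python below (names are assumed for illustration) also checks that it is a valid distribution in the sense of the axioms coming up:

    # Prior distribution P(Weather) from the slide, stored as a lookup table.
    weather_prior = {"Sunny": 0.70, "Rainy": 0.20, "Cloudy": 0.08, "Snowy": 0.02}

    # A valid distribution: every entry in [0, 1], and the entries sum to 1.
    assert all(0.0 <= p <= 1.0 for p in weather_prior.values())
    assert abs(sum(weather_prior.values()) - 1.0) < 1e-9

    print(weather_prior["Rainy"])   # P(Weather = Rainy) = 0.2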
Where do Probabilities Come From?
- Frequentist view – probabilities come from experimentation
- Objectivist view – probabilities are real values that frequentists approximate
- Subjectivist view – probabilities reflect an agent's degrees of belief
Conditional Probabilities
- conditional or posterior probabilities – probabilities with prior information (i.e. after evidence)
- P(A|B) is the probability of A given that all we know is B.
- P(Weather=Rainy | Month=April)
- Is P(B ⇒ A) equal to P(A|B)?
- Product Rule: P(A∧B) = P(A|B) P(B)
Axioms of Probability
- All probabilities are between 0 and 1.
- Necessarily true and false propositions have probability 1 and 0, respectively.
- The probability of a disjunction is given by P(A∨B) = P(A) + P(B) - P(A∧B)
- From these three axioms, all other properties of probabilities can be derived.
Why Are These Axioms Reasonable?
- de Finetti's betting argument: Put your money where your beliefs are.
- If agent 1 has a set of beliefs inconsistent with the axioms of probability, then there exists a betting strategy for agent 2 that guarantees that agent 1 will lose money.
- Practical results have made an even more persuasive argument (e.g. the Pathfinder medical diagnosis system).
Joint Probability Distribution
- Atomic event – an assignment of values to every variable; a specific state of the world
- For simplicity, we'll treat all variables as Boolean (e.g. P(A), P(¬A), P(A∧B))
- Joint probability distribution P(X1,X2,…,Xn) – a function mapping each atomic event to its probability
Joint Probability Example
- What's the probability of having a cavity given the evidence of a toothache?
- Like a lookup table for probabilities: it can easily have too many entries to be practical ⇒ motivation for conditional probabilities
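A small sketch of the lookup-table idea in Python. The joint probabilities below are invented for illustration (they are not the slide's numbers), chosen only so the table sums to 1:

    # Full joint distribution over two Boolean variables, as a lookup table.
    # Keys are (cavity, toothache); the numbers are illustrative only.
    joint = {
        (True,  True):  0.04,
        (True,  False): 0.06,
        (False, True):  0.01,
        (False, False): 0.89,
    }
    assert abs(sum(joint.values()) - 1.0) < 1e-9

    # P(cavity | toothache) = P(cavity AND toothache) / P(toothache)
    p_toothache = sum(p for (c, t), p in joint.items() if t)
    p_cavity_and_toothache = joint[(True, True)]
    print(p_cavity_and_toothache / p_toothache)   # 0.04 / 0.05 = 0.8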
Bayes’ Rule
- Bayes’ Rule underlies all modern AI systems for probabilistic inference
- two forms of the product rule:
  - P(A∧B) =
  - P(A∧B) =
- Now use these two to form an equation for:
  - P(B|A) =
Bayes’ Rule
- Bayes’ Rule underlies all modern AI systems for probabilistic inference
- two forms of the product rule:
  - P(A∧B) = P(A|B) P(B)
  - P(A∧B) = P(B|A) P(A)
- Now use these two to form an equation for:
  - P(B|A) = P(A|B) P(B) / P(A)
Applying Bayes’ Rule
- What's Bayes' Rule good for? It needs three terms to compute one!
- But often you have exactly those three terms and need the fourth.
- Example:
  - M = patient has meningitis
  - S = patient has stiff neck
Applying Bayes’ Rule (cont.)
- Given:
  - P(S|M) = 0.5
  - P(M) = 1/50000
  - P(S) = 1/20
- What's the probability that a patient with a stiff neck has meningitis?
Applying Bayes’ Rule (cont.)
- Given:
  - P(S|M) = 0.5
  - P(M) = 1/50000
  - P(S) = 1/20
- What's the probability that a patient with a stiff neck has meningitis?
- P(M|S) = P(S|M) P(M) / P(S)
         = 0.5 * (1/50000) / (1/20)
         = 0.5 * 20 / 50000 = 10/50000 = 1/5000
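A one-liner sketch in Python to check the arithmetic (variable names are assumed):

    # Bayes' Rule: P(M|S) = P(S|M) P(M) / P(S), using the slide's numbers.
    p_s_given_m, p_m, p_s = 0.5, 1 / 50000, 1 / 20
    p_m_given_s = p_s_given_m * p_m / p_s
    print(p_m_given_s)          # 0.0002 == 1/5000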
Relative Likelihood
- Now suppose we don't know the probability of a stiff neck, but we do know:
  - the probability of whiplash, P(W) = 1/1000
  - the probability of a stiff neck given whiplash, P(S|W) = 0.8
- What is the relative likelihood of meningitis and whiplash given a stiff neck?
- Write Bayes' Rule for each and form P(M|S)/P(W|S)
Relative Likelihood
- Now suppose we don't know the probability of a stiff neck, but we do know:
  - the probability of whiplash, P(W) = 1/1000
  - the probability of a stiff neck given whiplash, P(S|W) = 0.8
- What is the relative likelihood of meningitis and whiplash given a stiff neck?
- Write Bayes' Rule for each and form P(M|S)/P(W|S)
- P(M|S)/P(W|S) = (P(S|M) P(M) / P(S)) / (P(S|W) P(W) / P(S))
               = (P(S|M) P(M)) / (P(S|W) P(W))
               = (0.5 * (1/50000)) / (0.8 * (1/1000))
               = 0.00001 / 0.0008 = 1/80
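Again a quick check in Python; the point to notice is that P(S) cancels, so it never appears:

    # Relative likelihood: P(M|S) / P(W|S) = (P(S|M) P(M)) / (P(S|W) P(W)).
    # P(S) cancels out of the ratio, so we never need it.
    p_s_given_m, p_m = 0.5, 1 / 50000
    p_s_given_w, p_w = 0.8, 1 / 1000
    print((p_s_given_m * p_m) / (p_s_given_w * p_w))   # 0.0125 == 1/80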
Normalization
- Write Bayes' Rule for P(M|S)
- Now write Bayes' Rule for P(¬M|S)
- We know P(M|S) + P(¬M|S) = 1
- Use these to write a new expression for P(S)
- Substitute this expression into Bayes' Rule for P(M|S)
- One does not need P(S) directly.
Normalization (cont.)
- The main point, however, is that 1/P(S) is a normalizing constant that allows the conditional terms to sum to one.
- P(M|S) = α P(S|M) P(M)
  where α = 1/P(S) is a normalizing constant such that P(M|S) + P(¬M|S) = 1
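A small sketch of normalization in Python. P(S|¬M) is not given on the slides, so the value below is an invented placeholder used only to illustrate the mechanics:

    # Normalization: compute P(M|S) without knowing P(S) directly.
    # P(S) = P(S|M) P(M) + P(S|~M) P(~M)  (the expression asked for above).
    p_s_given_m, p_m = 0.5, 1 / 50000
    p_s_given_not_m = 0.05          # assumed for illustration; not from the slides
    p_not_m = 1 - p_m

    unnorm_m     = p_s_given_m * p_m            # P(S|M) P(M)
    unnorm_not_m = p_s_given_not_m * p_not_m    # P(S|~M) P(~M)
    alpha = 1 / (unnorm_m + unnorm_not_m)       # alpha = 1 / P(S)

    p_m_given_s = alpha * unnorm_m
    print(p_m_given_s, alpha * unnorm_not_m)    # the two sum to 1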
Conditional Independence
- What's the probability of my having a cavity given that I stubbed my toe?
- Often, there is no direct causal link between two things:
  - direct: burglary → alarm; cavity → toothache; disease → symptom; defect → failure
  - indirect: burglary → alarm company calls; cavity → dentist called about toothache; disease → symptom noted; defect → failure caused by failure
Conditional Independence (cont.)
- The size of a table for a joint probability distribution can easily become enormous (exponential in the number of variables).
- How can one represent a joint probability distribution more compactly?
Belief Networks
- Assume variables are conditionally independent by default.
- Only represent direct causal links (conditional dependence) between random variables.
- Belief network or Bayesian network:
  - a set of random variables (nodes)
  - a set of directed links (edges) indicating direct influence of one variable on another
  - a table for each variable, supplying conditional probabilities of the variable for each assignment of its parents
  - no directed cycles (the network is a DAG)
Choosing Variables
- From Cooper [1984]: "Metastatic cancer is a possible cause of a brain tumor and is also an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into a coma. Severe headache is also possibly associated with a brain tumor."
- What are our variables?
- What are the direct causal influences between them?
Identifying Direct Influences
- Let:
  - A = Patient has metastatic cancer
  - B = Patient has increased total serum calcium
  - C = Patient has a brain tumor
  - D = Patient lapses occasionally into coma
  - E = Patient has a severe headache
- What are the direct causal links between these variables? "Metastatic cancer is a possible cause of a brain tumor and is also an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into a coma. Severe headache is also possibly associated with a brain tumor."
- Draw the belief net.
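Reading the edges off the quoted sentences: A → B and A → C (cancer explains increased calcium and can cause a tumor), B → D and C → D (either can explain a coma), and C → E (headache is associated with a tumor). A minimal sketch of that structure in Python; the dict-of-parents representation is an assumed convention, not from the slides:

    # Belief net structure for the Cooper example: each node maps to its parents.
    parents = {
        "A": [],            # metastatic cancer (no parents)
        "B": ["A"],         # increased total serum calcium <- cancer
        "C": ["A"],         # brain tumor <- cancer
        "D": ["B", "C"],    # coma <- serum calcium, brain tumor
        "E": ["C"],         # severe headache <- brain tumor
    }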
Conditional Probability Tables (CPTs)
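As a hedged sketch of what such tables look like, a CPT can be stored as a mapping from parent-value assignments to P(Variable = true | parents). Every number below is an illustrative placeholder, not a value from the lecture:

    # One CPT per variable: parent assignment tuple -> P(var = True | parents).
    # All probabilities here are made up purely for illustration.
    cpt = {
        "A": {(): 0.2},                                   # P(A)
        "B": {(True,): 0.8, (False,): 0.2},               # P(B | A)
        "C": {(True,): 0.2, (False,): 0.05},              # P(C | A)
        "D": {(True, True): 0.8, (True, False): 0.8,      # P(D | B, C)
              (False, True): 0.8, (False, False): 0.05},
        "E": {(True,): 0.8, (False,): 0.6},               # P(E | C)
    }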
Probabilistic Reasoning
- From the joint probability distribution, we can answer any probability query.
- From the conditional (in)dependence assumptions and CPTs of the belief network, we can compute the joint probability distribution.
- Therefore, a belief network has the probabilistic information to answer any probability query.
- How do we compute the joint probability distribution from the belief network?
Computing Joint Probabilities with CPTs
- Denote our set of variables as X1, X2, …, Xn.
- The joint probability distribution P(X1,…,Xn) can be thought of as a table with entries P(X1=x1,…,Xn=xn), or simply P(x1,…,xn), where x1,…,xn is a possible assignment to all variables.
- Using CPTs: P(x1,…,xn) = P(x1|ParentValues(x1)) * … * P(xn|ParentValues(xn))
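Putting the structure and the illustrative CPTs together, a minimal self-contained sketch of the product computation; again, the numbers are placeholders, not the lecture's:

    # Joint probability of one atomic event via the chain rule over CPTs:
    # P(x1,...,xn) = product over i of P(xi | parent values of Xi).
    parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}
    cpt = {   # parent assignment -> P(var = True | parents); numbers made up
        "A": {(): 0.2},
        "B": {(True,): 0.8, (False,): 0.2},
        "C": {(True,): 0.2, (False,): 0.05},
        "D": {(True, True): 0.8, (True, False): 0.8,
              (False, True): 0.8, (False, False): 0.05},
        "E": {(True,): 0.8, (False,): 0.6},
    }

    def joint(event):
        """P(event) for a full assignment like {'A': True, ..., 'E': False}."""
        p = 1.0
        for var, value in event.items():
            parent_vals = tuple(event[q] for q in parents[var])
            p_true = cpt[var][parent_vals]
            p *= p_true if value else 1.0 - p_true
        return p

    print(joint({"A": True, "B": True, "C": True, "D": True, "E": False}))
    # 0.2 * 0.8 * 0.2 * 0.8 * (1 - 0.8) = 0.00512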
Joint Probability Computation Example
Markov Blanket
- Suppose we want to know the probability of each variable's values given all other variables' values.
- Recall P(x1,…,xn) = P(x1|ParentValues(x1)) * … * P(xn|ParentValues(xn))
- In computing P(x1, …, xi, …, xn), which of the terms in the above product involve xi?
- How would you describe the variables which appear in those terms? (see example)
- These neighboring variables – Xi's parents, its children, and its children's other parents – are called Xi's Markov blanket.
Markov Blanket (cont.)
- Since all other terms in the product (from CPTs other than those of Xi and its children) do not include Xi,
  - they are constant relative to Xi, and
  - they can be replaced by a normalizing factor.
- (Proof?)
- (Worked example)
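A sketch of that computation on the illustrative cancer network from earlier, with Xi = B. Only the CPT terms that mention B survive: P(b|a) from B's own CPT and P(d|b,c) from its child D's CPT; everything else folds into the normalizing factor. The CPT numbers remain made-up placeholders:

    # P(Xi | all other variables) using only Xi's Markov-blanket terms.
    # Here Xi = B; B's terms in the product are P(b|a) and P(d|b,c).
    def p_b_given_rest(a, c, d):
        p_b_given_a = {True: 0.8, False: 0.2}[a]          # made-up CPT values
        p_d_given_bc = {(True, True): 0.8, (True, False): 0.8,
                        (False, True): 0.8, (False, False): 0.05}

        unnorm = {}
        for b in (True, False):
            term1 = p_b_given_a if b else 1.0 - p_b_given_a
            p_d_true = p_d_given_bc[(b, c)]
            term2 = p_d_true if d else 1.0 - p_d_true
            unnorm[b] = term1 * term2       # all other terms are constants

        alpha = 1.0 / (unnorm[True] + unnorm[False])   # normalizing factor
        return alpha * unnorm[True]

    print(p_b_given_rest(a=True, c=False, d=True))
    # 0.8*0.8 / (0.8*0.8 + 0.2*0.05) = 0.64/0.65, roughly 0.985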
Joint Probability Computation Example
Utility Theory
Utility of Money
Decision Networks
Value of Information