|
|
|
|
|
Qualification problem in FOL |
|
e.g. car doesn’t start |
|
Problems: |
|
no exceptions? ⇒ big rules!
|
no knowledge of the likelihood of each exception
|
no complete theory (e.g. medical science) |
|
even given complete rules, sometimes we only have partial evidence
|
|
|
|
|
Possible solution: probabilities |
|
summarize uncertainty
|
give likelihood information; an incomplete theory can be refined; partial evidence can be handled, but…
|
rules can still be big ⇒ stay tuned for simplifying assumptions
|
How might an agent use probabilities to choose
actions given percepts? |
|
|
|
|
|
Iterate: |
|
Update evidence with current percept |
|
Compute outcome probabilities for actions |
|
Select action with maximum expected utility
given probable outcomes |
|
utility - a measure of the usefulness (desirability) of an outcome
|
decision theory = probability theory + utility
theory |
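
A minimal sketch of the "maximum expected utility" step in Python; the umbrella scenario, its outcome probabilities, and its utilities are invented for illustration and are not part of these notes:

    # One step of a decision-theoretic agent: pick the action whose
    # probable outcomes maximize expected utility. All numbers invented.

    def expected_utility(outcomes):
        # outcomes: list of (probability, utility) pairs for one action,
        # with probabilities already conditioned on the current evidence
        return sum(p * u for p, u in outcomes)

    def choose_action(action_models):
        # action_models: action -> list of (P(outcome), utility)
        return max(action_models, key=lambda a: expected_utility(action_models[a]))

    action_models = {
        "take_umbrella":  [(0.3, 50), (0.7, 70)],    # (P(rain), U), (P(dry), U)
        "leave_umbrella": [(0.3, -100), (0.7, 100)],
    }
    print(choose_action(action_models))  # take_umbrella (EU 64 vs. 40)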
|
|
|
|
unconditional or prior probabilities –
probabilities without prior information (i.e. before evidence). |
|
P(A) is the probability of A in the absence of
other information. |
|
Suppose we have a discrete random variable
Weather that can take on 4 values: Sunny, Rainy, Cloudy, or Snowy. |
|
How to form prior probabilities? |
|
|
|
|
|
In absence of any information at all, we might
say all outcomes are equally likely. |
|
Better, however, to apply some knowledge to the choice of prior probabilities (e.g. weather statistics over many years).
|
P(Weather) = <0.7, 0.2, 0.08, 0.02> (for <Sunny, Rainy, Cloudy, Snowy>)
|
(probability distribution over random variable
Weather) |
|
What about low probability events that have
never happened or happen too infrequently to have accurate statistics? |
|
|
|
|
Frequentist view – probabilities from
experimentation |
|
Objectivist view – probabilities are real values that frequentist experiments approximate
|
Subjectivist view – probabilities reflect an agent's degrees of belief
|
|
|
|
conditional or posterior probabilities –
probabilities with prior information (i.e. after evidence) |
|
P(A|B) is the probability of A given that all we
know is B. |
|
P(Weather=Rainy|Month=April) |
|
Is P(B ⇒ A) equal to P(A|B)?
|
Product Rule: P(A ∧ B) = P(A|B) P(B)
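
e.g. if P(B) = 0.5 and P(A|B) = 0.4, then P(A ∧ B) = 0.4 × 0.5 = 0.2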
|
|
|
|
All probabilities are between 0 and 1. |
|
Necessarily true and false propositions have
probability 1 and 0, respectively. |
|
The probability of a disjunction is given by P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
|
From these three axioms, all other properties of
probabilities can be derived. |
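
For example, the negation rule follows: A ∨ ¬A is necessarily true and A ∧ ¬A is necessarily false, so P(A ∨ ¬A) = P(A) + P(¬A) - P(A ∧ ¬A) gives 1 = P(A) + P(¬A) - 0, i.e. P(¬A) = 1 - P(A).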
|
|
|
|
de Finetti’s betting argument: Put your money
where your beliefs are. |
|
If agent 1 has a set of beliefs inconsistent
with the axioms of probability, then there exists a betting strategy for
agent 2 that guarantees that agent 1 will lose money. |
|
practical results have made an even more persuasive argument (e.g. the Pathfinder medical diagnosis system)
|
|
|
|
Atomic event - an assignment of values to all variables; a specific state of the world
|
For simplicity, we'll treat all variables as Boolean (e.g. P(A), P(¬A), P(A ∧ B))
|
Joint probability P(X1,X2,…,Xn) - a function mapping atomic events to their probabilities
|
|
|
|
What's the probability of having a cavity given
the evidence of a toothache? |
|
Like a lookup table for probabilities: it can easily have too many entries to be practical ⇒ motivation for conditional probabilities
|
|
|
|
|
|
|
|
|
|
Bayes’ Rule underlies all modern AI systems for
probabilistic inference |
|
two forms of product rule: |
|
P(A ∧ B) = P(A|B) P(B)

P(A ∧ B) = P(B|A) P(A)
|
Now use these two to form an equation for: |
|
P(B|A) = P(A|B) P(B) / P(A) |
|
|
|
|
|
What's Bayes' Rule good for? Need three terms to compute one! |
|
Often you only have the three and need the
fourth. |
|
Example: |
|
M = patient has meningitis |
|
S = patient has stiff neck |
|
|
|
|
|
|
|
|
|
|
Given: |
|
P(S|M) = 0.5 |
|
P(M) = 1/50000 |
|
P(S) = 1/20 |
|
What's the probability that a patient with a
stiff neck has meningitis? |
|
P(M|S) = P(S|M) P(M) / P(S) |
|
= 0.5 * (1/50000) / (1/20) |
|
= 0.5 * 20 / 50000 = 10/50000 = 1/5000 |
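
A quick check of this arithmetic in Python, using only the three given numbers:

    # Bayes' Rule: P(M|S) = P(S|M) P(M) / P(S)
    p_s_given_m = 0.5        # P(S|M)
    p_m = 1 / 50000          # P(M)
    p_s = 1 / 20             # P(S)
    print(p_s_given_m * p_m / p_s)   # 0.0002, i.e. 1/5000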
|
|
|
|
|
|
|
|
|
|
Now suppose we don't know the probability of a
stiff neck, but we do know: |
|
the probability of whiplash P(W) = (1/1000) |
|
the probability of a stiff neck given whiplash
P(S|W) = 0.8 |
|
What is the relative likelihood of meningitis
and whiplash given a stiff neck? |
|
Write Bayes' Rule for each and write
P(M|S)/P(W|S) |
|
P(M|S)/P(W|S) = (P(S|M)P(M)/P(S)) /
(P(S|W)P(W)/P(S)) |
|
= (P(S|M) P(M))/(P(S|W) P(W)) |
|
= (0.5*(1/50000))/(0.8*(1/1000)) |
|
= 0.00001 / 0.0008 = 1/80
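
The same ratio in Python; note that P(S) never appears, since it cancels:

    # Relative likelihood of meningitis vs. whiplash given a stiff neck
    p_s_given_m, p_m = 0.5, 1 / 50000    # meningitis
    p_s_given_w, p_w = 0.8, 1 / 1000     # whiplash
    print((p_s_given_m * p_m) / (p_s_given_w * p_w))   # 0.0125, i.e. 1/80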
|
|
|
|
Write Bayes' Rule for P(M|S) |
|
Now write Bayes' Rule for P(¬M|S)
|
We know P(M|S) + P(¬M|S) = 1
|
Use these to write a new expression for P(S) |
|
Substitute this expression in Bayes' Rule for
P(M|S) |
|
One does not need P(S) directly. |
|
|
|
|
|
The main point, however, is that 1/P(S) is a normalizing constant that allows the conditional terms to sum to one. (The exercise above yields P(S) = P(S|M) P(M) + P(S|¬M) P(¬M).)

P(M|S) = α P(S|M) P(M)

where α = 1/P(S) is a normalizing constant such that P(M|S) + P(¬M|S) = 1
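
A sketch of this normalization in Python; the value P(S|¬M) = 0.05 is invented for the sketch, since these notes never give it:

    # Compute P(M|S) without knowing P(S) directly.
    p_s_given_m, p_m = 0.5, 1 / 50000
    p_s_given_not_m = 0.05                 # invented for this sketch
    p_not_m = 1 - p_m

    unnorm = [p_s_given_m * p_m,           # proportional to P(M|S)
              p_s_given_not_m * p_not_m]   # proportional to P(¬M|S)
    alpha = 1 / sum(unnorm)                # α = 1/P(S)
    print([alpha * u for u in unnorm])     # the two terms now sum to 1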
|
|
|
|
|
What's the probability of my having a cavity
given that I stubbed my toe? |
|
Often, there is no direct causal link between
two things: |
|
direct: burglary → alarm; cavity → toothache; disease → symptom; defect → failure

indirect: burglary → alarm company calls; cavity → dentist called about toothache; disease → symptom noted; defect → failure caused by another failure
|
|
|
|
The size of a table for a joint probability
distribution can easily become enormous (exponential in number of
variables). |
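
e.g. 30 Boolean variables already require 2^30 (over a billion) entries.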
|
How can one represent a joint probability
distribution more compactly? |
|
|
|
|
|
Assume variables are conditionally independent
by default. |
|
Only represent direct causal links (conditional
dependence) between random variables. |
|
Belief network or Bayesian network: |
|
set of random variables (nodes) |
|
set of directed links (edges) indicating direct
influence of one variable on another. |
|
a conditional probability table (CPT) for each variable, giving the probability of each of its values for each assignment of values to its parents
|
no directed cycles (network is a DAG) |
|
|
|
|
From Cooper [1984]:
"Metastatic cancer is a possible cause of a brain tumor and is also
an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into
a coma. Severe headache is also possibly associated with a brain tumor."
|
What are our variables? |
|
What are the direct causal influences between
them? |
|
|
|
|
|
Let: |
|
A = Patient has metastatic cancer |
|
B = Patient has increased total serum calcium |
|
C = Patient has a brain tumor |
|
D = Patient lapses occasionally into coma |
|
E = Patient has a severe headache |
|
What are the direct causal links between these
variables?
"Metastatic cancer is a possible cause of a brain
tumor and is also an explanation for increased total serum calcium. In turn, either of these could explain a
patient falling into a coma. Severe headache is also possibly associated with a brain tumor."
|
Draw the belief net. |
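
One simple way to encode the structure (just the directed links, not the CPTs) in Python; the five-node layout below is read directly from the quoted description:

    # Parents of each node in the metastatic cancer belief net (a DAG)
    parents = {
        "A": [],           # metastatic cancer
        "B": ["A"],        # increased total serum calcium
        "C": ["A"],        # brain tumor
        "D": ["B", "C"],   # coma
        "E": ["C"],        # severe headache
    }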
|
|
|
|
|
From the joint probability distribution, we can
answer any probability query. |
|
From the conditional (in)dependence assumptions
and CPTs of the belief network, we can compute the joint probability
distribution. |
|
Therefore, a belief network has the
probabilistic information to answer any probability query. |
|
How do we compute the joint probability
distribution from the belief network? |
|
|
|
|
Denote our set of variables as X1, X2, …, Xn. |
|
The joint probability distribution P(X1,…,Xn)
can be thought of as a table with entries P(X1=x1,…,Xn=xn) or simply P(x1,
…, xn) where x1,…,xn is a possible assignment to all variables. |
|
Using CPTs,
P(x1, …, xn) =
P(x1|ParentValues(x1)) * … *
P(xn|ParentValues(xn)) |
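
A sketch of this product in Python for a two-variable fragment (A and B from the example above); the CPT numbers are invented for illustration:

    # Joint probability as a product of CPT entries
    parents = {"A": [], "B": ["A"]}
    cpt = {   # cpt[var][(value, parent values)] = P(value | parent values)
        "A": {(True, ()): 0.2, (False, ()): 0.8},
        "B": {(True, (True,)): 0.8, (False, (True,)): 0.2,
              (True, (False,)): 0.1, (False, (False,)): 0.9},
    }

    def joint(assignment):
        p = 1.0
        for var, ps in parents.items():
            p *= cpt[var][(assignment[var], tuple(assignment[q] for q in ps))]
        return p

    print(joint({"A": True, "B": True}))   # 0.2 * 0.8 = 0.16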
|
|
|
|
|
Suppose we want to know the probability of each
variable's values given all other variable values. |
|
Recall P(x1, …, xn) =
P(x1|ParentValues(x1)) * … *
P(xn|ParentValues(xn)) |
|
In computing P(x1, …, xi, …, xn), which of the
terms in the above product involve xi? |
|
How would you describe the variables which
appear in those terms? (see example) |
|
These neighboring variables (Xi's parents, its children, and its children's other parents) are called Xi's Markov blanket.
|
|
|
|
|
Since all other terms in the product (from CPTs
other than that of Xi and its children) do not include Xi, |
|
they are constant relative to Xi, and |
|
they can be replaced by a normalizing factor. |
|
(Proof?) |
|
(Worked example) |
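
A sketch of that computation, reusing the parents/cpt encoding from the joint-probability snippet above; only Xi's own CPT entry and its children's entries are multiplied, and everything else folds into the normalizing factor:

    # P(Xi | all other variable values), using only Xi's Markov blanket
    def conditional(var, assignment, parents, cpt, values=(True, False)):
        weights = {}
        for v in values:
            a = dict(assignment)
            a[var] = v
            w = 1.0
            for node, ps in parents.items():
                if node == var or var in ps:   # var's own CPT + its children's
                    w *= cpt[node][(a[node], tuple(a[q] for q in ps))]
            weights[v] = w
        z = sum(weights.values())              # normalizing factor
        return {v: w / z for v, w in weights.items()}

    # With the two-node example above: P(A=True | B=True) = 2/3
    print(conditional("A", {"A": True, "B": True}, parents, cpt))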
|
|
|
|
|
|