CS 371: Introduction to
Artificial Intelligence
Natural Language Communication |
speech formulation, speech recognition,
lexical analysis, parsing, disambiguation, discourse understanding |
Bigger Issues |
responsibility in AI, utility functions
and the value of human life, neo-Luddism, knowledge as power and intellectual
capital, machines emulating people, artificial societies |
Agent Communication
“intentional exchange of information
brought about by the production and perception of signs drawn from a shared
system of conventional signs” |
Purposes: control only? No: |
Inform, query, answer, request/command
action, promise/bargain, acknowledge, share experiences, etc. |
Speech acts - direct: Help me!,
indirect: I could use some help. |
Natural Language
Unlike machine language, natural
language is |
ambiguous at many levels |
much more dynamic - anyone want to do
version control? |
Fuzzy, approximate |
relies heavily on understanding of
implicit communication, common sense knowledge |
etc. |
Stages of Communication
Speaker: |
intention |
what to say when |
result of planning, decision analysis,
and other thought/feeling processes |
hearer’s recognizing and understanding
intention requires similar processes |
generation - choosing words |
synthesis - uttering words |
Stages of Communication (cont.)
Hearer: |
perception - hearing words (could be
mistaken) |
analysis - infer possible meanings |
disambiguation - pick most likely
meaning |
incorporation - decide what to do with
it |
Speech Recognition:
language model
mapping sound waves to a sequence of
words |
“It’s hard to wreck a nice beach.” |
Probabilistic Context-Free Grammars
(PCFGs) |
P(w1 w2 … wN) = P(w1) * P(w2|w1) * … *
P(wN|w1 … w(N-1)) |
CPTs too large: |
unigrams: Approximate as P(w1) * P(w2)
* … * P(wN) |
bigrams: Approximate as P(w1) *
P(w2|w1) * … * P(wN|w(N-1)) |
trigrams: Approximate as P(w1) *
P(w2|w1) * P(w3|w1,w2) * … * P(wN|w(N-2),w(N-1)) |
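The bigram approximation above can be sketched in a few lines. This is a minimal illustration, not a production language model: the toy corpus and the names `train_bigram` and `bigram_prob` are assumptions made for the example, and probabilities are raw relative frequencies with no smoothing, so any unseen bigram drives the whole product to zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count words, bigram contexts, and bigrams over tokenized sentences."""
    unigrams, contexts, bigrams = Counter(), Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        contexts.update(words[:-1])            # words that have a successor
        bigrams.update(zip(words, words[1:]))
    return unigrams, contexts, bigrams

def bigram_prob(words, model):
    """Estimate P(w1) * P(w2|w1) * ... * P(wN|w(N-1)) from raw counts."""
    unigrams, contexts, bigrams = model
    p = unigrams[words[0]] / sum(unigrams.values())
    for prev, cur in zip(words, words[1:]):
        if contexts[prev] == 0:                # prev never seen with a successor
            return 0.0
        p *= bigrams[(prev, cur)] / contexts[prev]
    return p

# Hypothetical toy corpus, invented for this sketch.
corpus = [
    ["recognize", "speech"],
    ["recognize", "speech"],
    ["wreck", "a", "nice", "beach"],
]
model = train_bigram(corpus)
p_speech = bigram_prob(["recognize", "speech"], model)
p_beach = bigram_prob(["wreck", "a", "nice", "beach"], model)
p_unseen = bigram_prob(["speech", "recognize"], model)
```

On this corpus the model prefers "recognize speech" simply because it was seen more often, which is the role the language model plays in choosing between acoustically similar word sequences.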
Unigram and Bigram Counts
PCFG Approximations
Tradeoff between: |
Context sensitivity: |
"I has", "man have"
– subject-verb agreement |
"I, for one, has…" "man
over there have" |
Memory, acquisition of sufficient
training examples |
Compromise: weighted sum of unigram,
bigram, and trigram models |
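The weighted-sum compromise can be sketched as a linear interpolation of the three estimates. The weights (0.1, 0.3, 0.6), the toy corpus, and the function names are assumptions for illustration only; in practice the weights would be tuned on held-out data. For simplicity the bigram denominator uses the plain word count, which slightly undercounts words that end a sentence.

```python
from collections import Counter

def train(sentences):
    """Return unigram, bigram, and trigram counts for tokenized sentences."""
    c1, c2, c3 = Counter(), Counter(), Counter()
    for w in sentences:
        c1.update(w)
        c2.update(zip(w, w[1:]))
        c3.update(zip(w, w[1:], w[2:]))
    return c1, c2, c3

def ratio(num, den):
    """Relative frequency, defined as 0 when the context was never seen."""
    return num / den if den else 0.0

def interpolated(word, prev2, prev1, counts, weights=(0.1, 0.3, 0.6)):
    """Weighted sum of unigram, bigram, and trigram estimates for `word`."""
    c1, c2, c3 = counts
    l1, l2, l3 = weights                       # assumed weights, summing to 1
    p_uni = ratio(c1[word], sum(c1.values()))
    p_bi = ratio(c2[(prev1, word)], c1[prev1])
    p_tri = ratio(c3[(prev2, prev1, word)], c2[(prev2, prev1)])
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

# Hypothetical toy corpus, invented for this sketch.
counts = train([["I", "have", "a", "dog"], ["I", "have", "a", "cat"]])
p_seen = interpolated("dog", "have", "a", counts)
p_backoff = interpolated("dog", "the", "big", counts)
```

The second call uses a context that never occurs in the corpus; the trigram and bigram terms vanish, but the unigram term keeps the estimate above zero, which is the point of mixing the models.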
Speech Recognition:
acoustic model
Question #1: What speech sounds did the
speaker utter? P(signal|words) |
Human speech has 40-50 sounds called
phones |
characterized by features in acoustic
signal (e.g. frequency, amplitude, duration, etc.) |
application of machine learning |
DARPA Phonetic Alphabet
Speech Recognition
Question #2: What words did the speaker
intend to express with those sounds? P(words|signal) |
“It’s not a porch. It’s a …” |
homophones (e.g. “0+2=2. One and one
sum to two too.”) |
noise (focusing amidst multiple
conversations) |
segmentation (Three strings walk into a
bar…) |
dialects (tow-may-tow, tow-mah-tow) |
coarticulation (tah-may-tow,
tow-may-tow) |
Speech Quantization
Dialect and
Bayesian Approach
Assume a language model P(words) |
Want P(words|signal). |
If we had P(signal|words), we could
compute the words that maximize P(words|signal). How? |
If the signal gave us a list of phones,
we could do this, but we can't. |
The best we can do at this point is to
compute P(words|phones). Then we need
P(phones|signal). |
For this, a hidden Markov model (HMM)
is used. |
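The "How?" above is answered by Bayes' rule. Written out (using the slide's own quantities; the hat notation for the chosen word sequence is an addition for this note):

```latex
\[
\widehat{\mathit{words}}
  = \arg\max_{\mathit{words}} P(\mathit{words} \mid \mathit{signal})
  = \arg\max_{\mathit{words}}
      \frac{P(\mathit{signal} \mid \mathit{words})\, P(\mathit{words})}
           {P(\mathit{signal})}
  = \arg\max_{\mathit{words}}
      P(\mathit{signal} \mid \mathit{words})\, P(\mathit{words})
\]
```

The denominator P(signal) does not depend on the candidate words, so it can be dropped from the maximization. What remains is the acoustic model P(signal|words) times the language model P(words).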
Speech Recognition
Approach: Hidden Markov Models (HMMs) |
"Hidden" – true state hidden
from observer |
Any number of states can generate a
given symbol |
The probability that a sequence came
from the model is the sum over all paths of |
the probability of the path, times |
the probability that the path generated
the sequence. |
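The sum over paths can be computed literally for a small model. The sketch below does so, and also computes the same quantity with the forward recursion, a standard dynamic-programming shortcut that the slides do not name. The two-state model and all of its probabilities are invented for illustration.

```python
from itertools import product

def prob_by_paths(obs, states, start, trans, emit):
    """Sum over every state path of P(path) * P(obs | path)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p_path = start[path[0]]
        for a, b in zip(path, path[1:]):
            p_path *= trans[a][b]
        p_obs = 1.0
        for s, o in zip(path, obs):
            p_obs *= emit[s][o]
        total += p_path * p_obs
    return total

def prob_forward(obs, states, start, trans, emit):
    """Same probability, computed one time step at a time."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state HMM; every number here is made up.
states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}

obs = ["x", "y", "x"]
p_paths = prob_by_paths(obs, states, start, trans, emit)
p_fwd = prob_forward(obs, states, start, trans, emit)
```

Enumerating paths takes time exponential in the sequence length, while the recursion is linear in it, which is why the path sum is never computed literally in practice.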
Putting it Together
Three models |
language bigram → P(word(i)|word(i-1)) |
word pronunciation HMM →
P(phones|word) |
phone HMM → P(signal|phone) |
To compute P(words|signal), these need
to be combined. |
One big HMM – make language bigram into
an HMM and construct a large HMM by nesting each level of abstraction |
Viterbi Algorithm: an
instance of dynamic programming
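A minimal sketch of the Viterbi algorithm on a generic HMM follows. It keeps, for each state, only the best path ending there, which is the dynamic-programming step. The states, observations, and probabilities are hypothetical and do not come from a real speech model.

```python
def viterbi(obs, states, start, trans, emit):
    """Return (probability, path) for the most likely state sequence."""
    # best[s] holds the best (probability, path) among paths ending in s.
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor that maximizes the probability of reaching s.
            prob, path = max(
                ((best[r][0] * trans[r][s], best[r][1]) for r in states),
                key=lambda t: t[0],
            )
            new_best[s] = (prob * emit[s][o], path + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])

# Hypothetical model; all numbers are invented for this sketch.
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {
    "s1": {"a": 0.5, "b": 0.4, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.3, "c": 0.6},
}

best_prob, best_path = viterbi(["a", "b", "c"], states, start, trans, emit)
```

For this model the most likely path is s1, s1, s2. In the nested speech HMM the same procedure would run over phone and word states instead of the abstract labels used here.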
mapping a sequence of words to possible
interpretations |
“Time flies like an arrow. Fruit flies like a banana.” - Groucho Marx |
list of tokens ⇒ annotated
parse tree |
Example Grammar
Propositional Logic: |
Sentence → Proposition | Complex
Sentence |
Proposition → P | Q | R | … |
Complex Sentence → (Sentence) | ¬ Sentence |
Sentence Connective Sentence |
Connective → ∧ | ∨ | ⇒ | ⇔ |
Ambiguity not resolved by
parentheses is resolved by precedence rules |
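One way to see precedence rules at work is a small recursive-descent parser for this grammar, written with the standard logical symbols ¬, ∧, ∨, ⇒, ⇔. The precedence order used here (¬ binds tightest, then ∧, ∨, ⇒, ⇔) is the conventional one and is an assumption, since the slide does not list it. Treating every binary connective as right-associative is also a choice made for this sketch.

```python
# Binary connectives, weakest binding first (assumed conventional order).
BINARY = ["⇔", "⇒", "∨", "∧"]

def parse(text):
    """Parse a sentence into nested tuples such as ('∧', 'P', 'Q')."""
    tokens = [ch for ch in text if not ch.isspace()]
    tree, rest = parse_level(tokens, 0)
    if rest:
        raise ValueError(f"unexpected token {rest[0]!r}")
    return tree

def parse_level(tokens, level):
    """Parse a sentence whose top connective is BINARY[level] or tighter."""
    if level == len(BINARY):
        return parse_unary(tokens)
    left, rest = parse_level(tokens, level + 1)
    if rest and rest[0] == BINARY[level]:
        # Recursing at the same level makes the connective right-associative.
        right, rest = parse_level(rest[1:], level)
        return (BINARY[level], left, right), rest
    return left, rest

def parse_unary(tokens):
    """Parse a proposition, a negation, or a parenthesized sentence."""
    if not tokens:
        raise ValueError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "¬":
        operand, rest = parse_unary(rest)
        return ("¬", operand), rest
    if head == "(":
        inner, rest = parse_level(rest, 0)
        if not rest or rest[0] != ")":
            raise ValueError("missing ')'")
        return inner, rest[1:]
    if head.isalpha() and head.isupper():
        return head, rest
    raise ValueError(f"unexpected token {head!r}")

by_precedence = parse("¬P ∨ Q ∧ R")
by_parentheses = parse("(P ∨ Q) ∧ R")
```

Without parentheses, "¬P ∨ Q ∧ R" groups as ¬P ∨ (Q ∧ R) because ∧ binds tighter than ∨. The parenthesized form overrides that grouping.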
Augmented Definite Clause
“Johanna baked cookies.” |
S(func(obj)) → NP(obj) VP(func) |
VP(func(obj)) → Verb(func) NP(obj) |
NP(obj) → Name(obj) | Noun(obj) |
Name(Johanna) → Johanna |
Verb(λy λx Baked(x,y)) → baked |
Noun(cookies) → cookies |
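The semantic augmentations compose by function application, which a short sketch can make concrete. It reads the verb entry as the curried function λy λx Baked(x, y), applied first to the object and then to the subject. The `LEXICON` table and `interpret` function are hypothetical names, and the sketch handles only three-word "NP Verb NP" sentences from this tiny lexicon.

```python
# Lexicon entries pair a syntactic category with a semantic value.
LEXICON = {
    "Johanna": ("Name", "Johanna"),
    "cookies": ("Noun", "cookies"),
    # λy λx Baked(x, y): takes the object first, then the subject.
    "baked": ("Verb", lambda y: lambda x: f"Baked({x},{y})"),
}

NP_CATEGORIES = {"Name", "Noun"}   # NP(obj) -> Name(obj) | Noun(obj)

def interpret(sentence):
    """Return the logical form of a three-word 'NP Verb NP' sentence."""
    words = sentence.rstrip(".").split()
    if len(words) != 3:
        raise ValueError("expected exactly three words")
    (c1, subj), (c2, verb), (c3, obj) = (LEXICON[w] for w in words)
    if c1 not in NP_CATEGORIES or c2 != "Verb" or c3 not in NP_CATEGORIES:
        raise ValueError("sentence does not match NP Verb NP")
    vp = verb(obj)      # VP(func(obj)) -> Verb(func) NP(obj)
    return vp(subj)     # S(func(obj))  -> NP(obj) VP(func)

logical_form = interpret("Johanna baked cookies.")
```

Applying the verb to "cookies" yields the verb-phrase meaning λx Baked(x, cookies), and applying that to "Johanna" yields the sentence meaning Baked(Johanna, cookies).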
Syntactic evidence: “Lee asked Kim to
tell Toby to leave on Saturday.” |
Lexical evidence: “Lee placed the dress
on the rack. Kim wanted the dress on
the rack.” |
Semantic evidence: ball, diamond, bat,
base |
I ate spaghetti with {meatballs, salad,
abandon, a fork, a friend}. |
Disambiguation (cont.)
Metonymy - one object stands for
another: |
I drive a Geo. |
The University frowns on squirrel
chasing. |
Metaphor - “Prices are high. Stocks
dropped.” |
Note: We’ve thrown out important
information! Inflection differentiates: |
“Do you know what day this is?” “No.” |
“There’s another quiz today.” “No!” |
“I’m not ready for it.” “No?” |
Discourse Understanding
John went to a fancy restaurant. |
He was pleased and gave the waiter a
big tip. |
He spent $50. |
Did the waiter or John spend $50? |
Did the $50 include the tip? |
What was John pleased with? |
Why did he give the waiter a big tip? |
It’s not quite a UNIX
pipe problem.
Understanding/context informs parsing. |
Parsing informs speech recognition. |
Spoken questions are used to
disambiguate. |
As we learn how the brain processes
speech, we’ll learn better architectures for natural language processing. |
Agent 1 heard Agent 2 say “The sky is
falling!” |
Agent 1 heard Agent 2 say that Agent 3
said “The sky is falling!” |
Master-slave agent relationship: |
Ben: "You don't need to see his
identification." |
Trooper: "We don't need to see his
identification." |
Ben: "These are not the droids
you're looking for." |
Trooper: "These are not the droids
we're looking for." |
Ben: "He can go about his
business." |
Trooper: "You can go about your
business." |
Ben: "Move along." |
Trooper: "Move along. Move
along." |
Speech Recognition and
Motivation for
Intelligent Agents
Automation - people expensive, machines
cheap |
When $$$ is all that matters, why not
automate everything that saves a buck? |
Industrial Age : Luddites ::
Information Age : Neo-Luddites |
Global competition, survival of the
fittest, job specialization, automation |
What about job satisfaction? |
Responsibility and AI
When software does the wrong thing |
Unintentional, accidental - “bug” |
What of intentional wrong behavior? |
Utility/heuristic functions as an
extension of an AI developer’s will |
Where do you draw the line? |
The Micromort
See R&N pp. 479-480 |
What if you’re coding the value of a
micromort? |
Machines Emulating People
Rodney Brooks wants to emulate people
with robots |
Businesses automating transactions |
A consumer society without faces |
Why create virtual reality? |
Artificial Agent
Artificial agents interact, form
artificial societies |
Computational resource sharing |
Game theory assumes opportunism |
Think Different! altruism, cooperation |
Our programming (like our speech and
actions) is a reflection of who we
are and what we value. Value other
people. |