CS 371: Introduction to
Artificial Intelligence
|
|
|
Natural Language Communication |
|
Bigger Issues |
Outline
|
|
|
|
Natural Language Communication |
|
speech formulation, speech recognition,
lexical analysis, parsing, disambiguation, discourse understanding |
|
Bigger Issues |
|
responsibility in AI, utility functions
and the value of human life, neo-Luddism, knowledge as power and intellectual
capital, machines emulating people, artificial societies |
Agent Communication
|
|
|
|
“intentional exchange of information
brought about by the production and perception of signs drawn from a shared
system of conventional signs” |
|
Purposes: control only? No: |
|
Inform, query, answer, request/command
action, promise/bargain, acknowledge, share experiences, etc. |
|
Speech acts - direct: Help me!,
indirect: I could use some help. |
Natural Language
|
|
|
|
Unlike machine language, natural
language is |
|
ambiguous at many levels |
|
much more dynamic - anyone want to do
version control? |
|
Fuzzy, approximate |
|
relies heavily on understanding of
implicit communication, common sense knowledge |
|
etc. |
Stages of Communication
|
|
|
|
|
Speaker: |
|
intention |
|
what to say when |
|
result of planning, decision analysis,
and other thought/feeling processes |
|
hearer’s recognizing and understanding
intention requires similar processes |
|
generation - choosing words |
|
synthesis - uttering words |
Stages of Communication
(cont.)
|
|
|
|
Hearer: |
|
perception - hearing words (could be
mistaken) |
|
analysis - infer possible meanings |
|
disambiguation - pick most likely
meaning |
|
incorporation - decide what to do with
it |
Speech Recognition:
language model
|
|
|
|
mapping sound waves to a sequence of
words |
|
“It’s hard to wreck a nice beach.” |
|
Probabilistic Context-Free Grammars
(PCFGs) |
|
P(w1 w2 … wN) = P(w1) * P(w2|w1) * … *
P(wN|w1 … w(N-1)) |
|
CPTs too large: |
|
unigrams: Approximate as P(w1) * P(w2)
* … * P(wN) |
|
bigrams: Approximate as P(w1) *
P(w2|w1) * … * P(wN|w(N-1)) |
|
trigrams: Approximate as P(w1) *
P(w2|w1) * P(w3|w1,w2) * … * P(wN|w(N-2),w(N-1)) |
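A minimal sketch of the bigram approximation above, assuming a tiny hand-made corpus and plain relative-frequency estimates. The names `train_bigram`, `bigram_prob`, and the start marker `<s>` are illustrative; conditioning the first word on `<s>` stands in for the P(w1) term.

```python
from collections import Counter

def train_bigram(sentences):
    """Count how often each word follows each context word."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words              # <s> is an assumed start marker
        contexts.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def bigram_prob(words, contexts, bigrams):
    """Approximate P(w1 ... wN) as the product of P(wi | w(i-1))."""
    tokens = ["<s>"] + words
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:               # context never seen in training
            return 0.0
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob

# Toy corpus (made up for illustration only).
corpus = [
    "it is hard to recognize speech".split(),
    "it is hard to wreck a nice beach".split(),
    "it is easy to recognize speech".split(),
]
contexts, bigrams = train_bigram(corpus)
p_speech = bigram_prob("it is hard to recognize speech".split(), contexts, bigrams)
p_beach = bigram_prob("it is hard to wreck a nice beach".split(), contexts, bigrams)
print(p_speech, p_beach)
```

With this toy corpus the "recognize speech" reading scores higher than "wreck a nice beach", which is the kind of preference the language model is meant to supply. Any unseen bigram drives the product to zero, which is one reason the approximations are combined in practice.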
Unigram and Bigram Counts
PCFG Approximations
|
|
|
|
|
Tradeoff between: |
|
Context sensitivity: |
|
"I has", "man have"
– subject-verb agreement |
|
"I, for one, has…" "man
over there have" |
|
Memory, acquisition of sufficient
training examples |
|
Compromise: weighted sum of unigram,
bigram, and trigram models |
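The compromise above can be sketched as a simple linear interpolation. The function name and the weights (0.1, 0.3, 0.6) are assumptions for illustration; in practice the weights would be tuned on held-out data.

```python
def interpolated_prob(p_unigram, p_bigram, p_trigram, weights=(0.1, 0.3, 0.6)):
    """Weighted sum of unigram, bigram, and trigram estimates for one word."""
    c1, c2, c3 = weights
    if abs(c1 + c2 + c3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return c1 * p_unigram + c2 * p_bigram + c3 * p_trigram

# A word whose bigram and trigram were never seen still gets some probability
# from the unigram term instead of zeroing out the whole sentence.
print(interpolated_prob(0.01, 0.0, 0.0))
```

The unigram term needs little training data but ignores context; the trigram term is context sensitive but sparse. The weighted sum lets each cover for the other.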
Speech Recognition:
acoustic model
|
|
|
|
Question #1: What speech sounds did the
speaker utter? P(signal|words) |
|
Human speech has 40-50 sounds called
phones |
|
characterized by features in acoustic
signal (e.g. frequency, amplitude, duration, etc.) |
|
application of machine learning |
DARPA Phonetic Alphabet
Speech Recognition
|
|
|
|
Question #2: What words did the speaker
intend to express with those sounds? P(words|signal) |
|
“It’s not a porch. It’s a …” |
|
homophones (e.g. “0+2=2. One and one
sum to two too.”) |
|
noise (focusing amidst multiple
conversations) |
|
segmentation (Three strings walk into a
bar…) |
|
dialects (tow-may-tow, tow-mah-tow) |
|
coarticulation (tah-may-tow,
tow-may-tow) |
Speech Quantization
Dialect and
Coarticulation
Bayesian Approach
|
|
|
Assume a language model P(words) |
|
Want P(words|signal). |
|
If we had P(signal|words), we could
compute the words that maximize P(words|signal). How? |
|
If the signal gave us a list of phones,
we could do this, but we can't. |
|
The best we can do at this point is to
compute P(words|phones). Then we need
P(phones|signal). |
|
For this, a hidden Markov model (HMM)
is used. |
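One way to answer the "How?" above is Bayes' rule. The symbol \hat{W} for the chosen word sequence is introduced here only for illustration.

```latex
\begin{aligned}
\hat{W} &= \arg\max_{\text{words}} P(\text{words} \mid \text{signal}) \\
        &= \arg\max_{\text{words}}
           \frac{P(\text{signal} \mid \text{words})\, P(\text{words})}{P(\text{signal})} \\
        &= \arg\max_{\text{words}} P(\text{signal} \mid \text{words})\, P(\text{words})
\end{aligned}
```

P(signal) does not depend on the candidate words, so it can be dropped from the maximization. What remains is the acoustic model P(signal|words) times the language model P(words).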
Speech Recognition
(cont.)
|
|
|
|
Approach: Hidden Markov Models (HMMs) |
|
"Hidden" – true state hidden
from observer |
|
Any number of states can generate a
given symbol |
|
The probability that a sequence came
from a given model is the sum over all paths of |
|
the probability of the path, times |
|
the probability that the path generated
the sequence. |
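The sum over paths can be written out directly for a toy model. The two states, two symbols, and all probabilities below are made up for illustration. The second function is the standard forward algorithm, which computes the same sum without enumerating every path.

```python
from itertools import product

# Hypothetical two-state HMM emitting the symbols "a" and "b".
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.5, "b": 0.5}, "s2": {"a": 0.1, "b": 0.9}}

def sequence_prob_by_paths(obs):
    """Sum over all state paths of P(path) * P(obs | path)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p_path = start[path[0]]
        for prev, cur in zip(path, path[1:]):
            p_path *= trans[prev][cur]
        p_emit = 1.0
        for state, symbol in zip(path, obs):
            p_emit *= emit[state][symbol]
        total += p_path * p_emit
    return total

def sequence_prob_forward(obs):
    """Same quantity via the forward algorithm (expects a non-empty obs)."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            cur: emit[cur][symbol] * sum(alpha[prev] * trans[prev][cur] for prev in states)
            for cur in states
        }
    return sum(alpha.values())

obs = ["a", "b", "a"]
print(sequence_prob_by_paths(obs), sequence_prob_forward(obs))
```

The path enumeration grows exponentially with the length of the sequence, while the forward pass is linear in it. That difference matters once the states are phones and the sequence is a whole utterance.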
Putting it Together
|
|
|
|
Three models |
|
language bigram → P(word(i)|word(i-1)) |
|
word pronunciation HMM →
P(phones|word) |
|
phone HMM → P(signal|phone) |
|
To compute P(words|signal), these need
to be combined. |
|
One big HMM – make language bigram into
an HMM and construct a large HMM by nesting each level of abstraction |
Viterbi Algorithm: an
instance of dynamic programming
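A minimal sketch of the Viterbi algorithm on a toy HMM. The states "A" and "B", the symbols "x", "y", "z", and every probability are hypothetical. At each step it keeps only the best path into each state, which is the dynamic programming idea.

```python
def viterbi(obs, states, start, trans, emit):
    """Return (probability, path) of the most likely state sequence for obs."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for symbol in obs[1:]:
        new_best = {}
        for cur in states:
            # Pick the best predecessor of cur; its best path is reused, not recomputed.
            prob, path = max(
                ((best[prev][0] * trans[prev][cur], best[prev][1]) for prev in states),
                key=lambda pair: pair[0],
            )
            new_best[cur] = (prob * emit[cur][symbol], path + [cur])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])

# Hypothetical model: two hidden states, three observable symbols.
states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.1, "y": 0.4, "z": 0.5}, "B": {"x": 0.6, "y": 0.3, "z": 0.1}}

prob, path = viterbi(["x", "y", "z"], states, start, trans, emit)
print(path, prob)
```

In the speech setting the hidden states would be positions in the nested word and phone models, and the observations would be features of the acoustic signal.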
Parsing
|
|
|
mapping a sequence of words to possible
interpretations |
|
“Time flies like an arrow. Fruit flies like a banana.” - Groucho Marx |
|
list of tokens ⇒ annotated
parse tree |
Example Grammar
|
|
|
|
Propositional Logic: |
|
Sentence → Proposition | Complex Sentence |
|
Proposition → P | Q | R | … |
|
Complex Sentence → (Sentence) | ¬ Sentence |
Sentence Connective Sentence |
|
Connective → ∧ | ∨ | ⇒ | ⇔ |
|
Ambiguity not resolved by
parentheses is resolved by precedence rules |
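A small sketch of how precedence rules settle what parentheses leave open, using a precedence-climbing parser for the grammar above. The ASCII spellings (`~`, `&`, `|`, `=>`, `<=>`), the precedence ordering, and the left-associative treatment of every binary connective are assumptions made for this example.

```python
import re

# Binary connectives, lowest to highest precedence (assumed ordering).
PRECEDENCE = {"<=>": 1, "=>": 2, "|": 3, "&": 4}

def tokenize(text):
    """Split into connectives, parentheses, and single-letter propositions."""
    return re.findall(r"<=>|=>|[()~&|]|[A-Z]", text)

def parse(tokens, min_prec=1):
    """Parse a sentence, consuming tokens from the front of the list."""
    left = parse_unary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = parse(tokens, PRECEDENCE[op] + 1)   # treat operators as left-associative
        left = (op, left, right)
    return left

def parse_unary(tokens):
    """Parse a proposition, a negation, or a parenthesized sentence."""
    tok = tokens.pop(0)
    if tok == "~":
        return ("~", parse_unary(tokens))           # negation binds tightest
    if tok == "(":
        inner = parse(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing closing parenthesis")
        return inner
    return tok

tree = parse(tokenize("~P & Q | R"))
print(tree)
```

Without the precedence table, "~P & Q | R" has several parse trees under the grammar. With it, the parser returns a single tree, and parentheses are needed only to override the default grouping.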
Augmented Definite Clause
Grammar
|
|
|
“Johanna baked cookies.” |
|
S(func(obj)) → NP(obj) VP(func) |
|
VP(func(obj)) → Verb(func) NP(obj) |
|
NP(obj) → Name(obj) | Noun(obj) |
|
Name(Johanna) → Johanna |
|
Verb(λy λx Baked(x,y)) → baked |
|
Noun(cookies) → cookies |
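The semantic augmentations above can be traced by hand with ordinary function application. This is only a sketch: representing meanings as strings and the names `verb_baked`, `vp`, and `s` are choices made for illustration.

```python
# Lexicon: the semantic value attached to each word.
name_johanna = "Johanna"          # Name(Johanna) → Johanna
noun_cookies = "cookies"          # Noun(cookies) → cookies

def verb_baked(y):
    """Verb(λy λx Baked(x,y)) → baked"""
    return lambda x: f"Baked({x},{y})"

def vp(verb_sem, np_sem):
    """VP(func(obj)) → Verb(func) NP(obj): apply the verb to its object."""
    return verb_sem(np_sem)

def s(np_sem, vp_sem):
    """S(func(obj)) → NP(obj) VP(func): apply the verb phrase to the subject."""
    return vp_sem(np_sem)

meaning = s(name_johanna, vp(verb_baked, noun_cookies))
print(meaning)
```

The verb takes its object first (λy) and its subject second (λx), so the argument order in Baked(x,y) comes out as subject, then object.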
Disambiguation
|
|
|
|
Syntactic evidence: “Lee asked Kim to
tell Toby to leave on Saturday.” |
|
Lexical evidence: “Lee placed the dress
on the rack. Kim wanted the dress on
the rack.” |
|
Semantic evidence: ball, diamond, bat,
base |
|
I ate spaghetti with {meatballs, salad,
abandon, a fork, a friend}. |
Disambiguation (cont.)
|
|
|
|
Metonymy - one object stands for
another: |
|
I drive a Geo. |
|
The University frowns on squirrel
chasing. |
|
Metaphor - “Prices are high. Stocks
dropped.” |
|
Note: We’ve thrown out important
information! Inflection differentiates: |
|
“Do you know what day this is?” “No.” |
|
“There’s another quiz today.” “No!” |
|
“I’m not ready for it.” “No?” |
Discourse Understanding
|
|
|
John went to a fancy restaurant. |
|
He was pleased and gave the waiter a
big tip. |
|
He spent $50. |
|
|
|
Did the waiter or John spend $50? |
|
Did the $50 include the tip? |
|
What was John pleased with? |
|
Why did he give the waiter a big tip? |
It’s not quite a UNIX
pipe problem.
|
|
|
Understanding/context informs parsing. |
|
Parsing informs speech recognition. |
|
Spoken questions are used to
disambiguate. |
|
As we learn how the brain processes
speech, we’ll learn better architectures for natural language processing. |
Trust
|
|
|
|
Agent 1 heard Agent 2 say “The sky is
falling!” |
|
Agent 1 heard Agent 2 say that Agent 3
said “The sky is falling!” |
|
Master-slave agent relationship: |
|
Ben: "You don't need to see his
identification." |
|
Trooper: "We don't need to see his
identification." |
|
Ben: "These are not the droids
you're looking for." |
|
Trooper: "These are not the droids
we're looking for." |
|
Ben: "He can go about his
business." |
|
Trooper: "You can go about your
business." |
|
Ben: "Move along." |
|
Trooper: "Move along. Move
along." |
Speech Recognition and
Society
Motivation for
Intelligent Agents
|
|
|
Automation - people expensive, machines
cheap |
|
When $$$ is all that matters, why not
automate everything that saves a buck? |
|
Industrial Age : Luddites ::
Information Age : Neo-Luddites |
|
Global competition, survival of the
fittest, job specialization, automation |
|
What about job satisfaction? |
Responsibility and AI
|
|
|
When software does the wrong thing |
|
Unintentional, accidental - “bug” |
|
What of intentional wrong behavior? |
|
Utility/heuristic functions as an
extension of an AI developer’s will |
|
Where do you draw the line? |
The Micromort
|
|
|
See R&N pp. 479-480 |
|
What if you’re coding the value of a
micromort? |
Machines Emulating People
|
|
|
Rodney Brooks wants to emulate people
with robots |
|
Businesses automating transactions |
|
A consumer society without faces |
|
Why create virtual reality? |
Artificial Agent
Societies
|
|
|
Artificial agents interact, form
artificial societies |
|
Computational resource sharing |
|
Game theory assumes opportunism |
|
Think Different! altruism, cooperation |
|
Our programming (like our speech and
actions) is a reflection of who we
are and what we value. Value other
people. |
|
|