CS 371: Introduction to Artificial Intelligence
Natural Language Communication
Bigger Issues

Outline
Natural Language Communication
speech formulation, speech recognition, lexical analysis, parsing, disambiguation, discourse understanding
Bigger Issues
responsibility in AI, utility functions and the value of human life, neo-Luddism, knowledge as power and intellectual capital, machines emulating people, artificial societies

Agent Communication
“intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs”
Purposes: control only? No:
Inform, query, answer, request/command action, promise/bargain, acknowledge, share experiences, etc.
Speech acts - direct: Help me!, indirect: I could use some help.

Natural Language
Unlike machine language, natural language is
ambiguous at many levels
much more dynamic - anyone want to do version control?
fuzzy, approximate
relies heavily on understanding of implicit communication, common sense knowledge
etc.

Stages of Communication
Speaker:
intention
what to say when
result of planning, decision analysis, and other thought/feeling processes
hearer’s recognizing and understanding intention requires similar processes
generation - choosing words
synthesis - uttering words

Stages of Communication (cont.)
Hearer:
perception - hearing words (could be mistaken)
analysis - infer possible meanings
disambiguation - pick most likely meaning
incorporation - decide what to do with it

Speech Recognition: language model
mapping sound waves to a sequence of words
“It’s hard to wreck a nice beach.”
Probabilistic Context-Free Grammars (PCFGs)
P(w1 w2 … wN) = P(w1) * P(w2|w1) * … * P(wN|w1 … w(N-1))
CPTs too large:
unigrams: Approximate as P(w1) * P(w2) * … * P(wN)
bigrams: Approximate as P(w1) * P(w2|w1) * … * P(wN|w(N-1))
trigrams: Approximate as P(w1) * P(w2|w1) * P(w3|w1,w2) * … * P(wN|w(N-2),w(N-1))
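The bigram approximation above can be sketched in a few lines of Python. This is a minimal illustration, not a production language model: the toy corpus and the helper names (`p_unigram`, `p_bigram`, `sentence_prob_bigram`) are assumptions made for this example, and probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter

# Hypothetical toy corpus; a real model is trained on far more text.
corpus = "the dog barks . the cat sleeps . the dog sleeps .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_unigram(w):
    """P(w) estimated as count(w) / number of tokens."""
    return unigram_counts[w] / total

def p_bigram(w, prev):
    """P(w | prev) estimated as count(prev, w) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, w)] / unigram_counts[prev]

def sentence_prob_bigram(words):
    """P(w1) * P(w2|w1) * ... * P(wN|w(N-1))."""
    p = p_unigram(words[0])
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(sentence_prob_bigram(["the", "dog", "sleeps"]))
```

Any word pair that never occurs in the corpus gets probability zero here, which is one reason practical systems smooth or combine models.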

Unigram and Bigram Counts

PCFG Approximations
Tradeoff between:
Context sensitivity:
"I has", "man have" – subject-verb agreement
"I, for one, has…" "man over there have"
Memory, acquisition of sufficient training examples
Compromise: weighted sum of unigram, bigram, and trigram models
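The weighted-sum compromise can be sketched as a linear interpolation of the three estimates. The corpus, the function name `p_interp`, and the weights (0.1, 0.3, 0.6) are assumptions chosen for illustration; in practice the weights would be tuned on held-out data.

```python
from collections import Counter

# Hypothetical toy corpus for illustration only.
corpus = "i have a dog . the man has a dog . i have a cat .".split()
N = len(corpus)
c1 = Counter(corpus)                                   # unigram counts
c2 = Counter(zip(corpus, corpus[1:]))                  # bigram counts
c3 = Counter(zip(corpus, corpus[1:], corpus[2:]))      # trigram counts

def ratio(num, den):
    """Relative frequency, defined as 0 when the context was never seen."""
    return num / den if den else 0.0

def p_interp(w, prev2, prev1, lambdas=(0.1, 0.3, 0.6)):
    """Weighted sum of unigram, bigram, and trigram estimates of w.

    prev2 and prev1 are the two preceding words, in that order.
    The weights are assumed values and should sum to 1.
    """
    l1, l2, l3 = lambdas
    p1 = ratio(c1[w], N)
    p2 = ratio(c2[(prev1, w)], c1[prev1])
    p3 = ratio(c3[(prev2, prev1, w)], c2[(prev2, prev1)])
    return l1 * p1 + l2 * p2 + l3 * p3

# "i have" is seen in the corpus; "i has" is not, but still gets a
# small nonzero probability from the unigram term.
print(p_interp("have", ".", "i"), p_interp("has", ".", "i"))
```

The unigram term keeps unseen combinations from dropping to zero, while the trigram term supplies context sensitivity where the data supports it.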

Speech Recognition: acoustic model
Question #1: What speech sounds did the speaker utter? P(signal|words)
Human speech has 40-50 sounds called phones
characterized by features in acoustic signal (e.g. frequency, amplitude, duration, etc.)
application of machine learning

DARPA Phonetic Alphabet

Speech Recognition
Question #2: What words did the speaker intend to express with those sounds? P(words|signal)
“It’s not a porch.  It’s a …”
homophones (e.g. “0+2=2. One and one sum to two too.”)
noise (focusing amidst multiple conversations)
segmentation (Three strings walk into a bar…)
dialects (tow-may-tow, tow-mah-tow)
coarticulation (tah-may-tow, tow-may-tow)

Speech Quantization

Dialect and Coarticulation

Bayesian Approach
Assume a language model P(words)
Want P(words|signal).
If we had P(signal|words), we could compute the words that maximize P(words|signal).  How?
If the signal gave us a list of phones, we could do this, but we can't.
The best we can do at this point is to compute P(words|phones).  Then we need P(phones|signal).
For this, a hidden Markov model (HMM) is used.
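One way to answer the "How?" above is Bayes' rule. The worked equation below uses only the quantities named on this slide; the denominator P(signal) does not depend on the candidate words, so it can be dropped when maximizing.

```latex
\begin{aligned}
\arg\max_{\mathit{words}} P(\mathit{words} \mid \mathit{signal})
  &= \arg\max_{\mathit{words}}
     \frac{P(\mathit{signal} \mid \mathit{words})\, P(\mathit{words})}{P(\mathit{signal})} \\
  &= \arg\max_{\mathit{words}}
     P(\mathit{signal} \mid \mathit{words})\, P(\mathit{words})
\end{aligned}
```

The first factor is the acoustic model and the second is the language model.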

Speech Recognition (cont.)
Approach: Hidden Markov Models (HMMs)
"Hidden" – true state hidden from observer
Any number of states can generate a given symbol
The probability that a sequence came from the [m] model is the sum over all paths of
the probability of the path, times
the probability that the path generated the sequence.
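The sum over all paths can be written out directly, and the forward algorithm computes the same quantity without enumerating paths. The sketch below uses a made-up two-state HMM; the state names, symbols, and probabilities are assumptions for illustration, not values from a real phone model.

```python
from itertools import product

# Hypothetical two-state HMM emitting symbols "x" and "y".
states = ("A", "B")
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

def brute_force(obs):
    """Sum over all state paths of P(path) * P(obs | path)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p_path = start[path[0]]
        for s, t in zip(path, path[1:]):
            p_path *= trans[s][t]
        p_obs = 1.0
        for s, o in zip(path, obs):
            p_obs *= emit[s][o]
        total += p_path * p_obs
    return total

def forward(obs):
    """Same probability via dynamic programming (forward algorithm)."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            t: emit[t][o] * sum(alpha[s] * trans[s][t] for s in states)
            for t in states
        }
    return sum(alpha.values())

print(brute_force("xyx"), forward("xyx"))
```

The brute-force version grows exponentially with sequence length, while the forward version is linear in the length and quadratic in the number of states.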

Putting it Together
Three models
language bigram → P(word(i)|word(i-1))
word pronunciation HMM → P(phones|word)
phone HMM → P(signal|phone)
To compute P(words|signal), these need to be combined.
One big HMM – make language bigram into an HMM and construct a large HMM by nesting each level of abstraction

Viterbi Algorithm: an instance of dynamic programming
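A minimal sketch of the Viterbi algorithm follows, assuming a made-up two-state HMM (the states, symbols, and probabilities are hypothetical). Instead of summing over paths, it keeps only the most probable path into each state at each step, which is what makes it dynamic programming.

```python
def viterbi(obs, states, start, trans, emit):
    """Return (probability, path) of the most likely state sequence."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for t in states:
            # Pick the best predecessor state for reaching t.
            prob, path = max(
                ((best[s][0] * trans[s][t], best[s][1]) for s in states),
                key=lambda pair: pair[0],
            )
            new_best[t] = (prob * emit[t][o], path + [t])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])

# Hypothetical model parameters for illustration.
states = ("A", "B")
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

prob, path = viterbi("xyx", states, start, trans, emit)
print(path, prob)
```

For long sequences a real implementation would work with log probabilities to avoid underflow and would store back-pointers rather than copying whole paths.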

Parsing
mapping a sequence of words to possible interpretations
“Time flies like an arrow.  Fruit flies like a banana.” - Groucho Marx
list of tokens ⇒ annotated parse tree

Example Grammar
Propositional Logic:
Sentence → Proposition | Complex Sentence
Proposition → P | Q | R | …
Complex Sentence → (Sentence) | ¬ Sentence | Sentence Connective Sentence
Connective → ∧ | ∨ | ⇒ | ⇔
Ambiguity not resolved by parentheses is resolved by precedence rules
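A small recursive-descent parser shows how precedence rules settle what parentheses leave open. The precedence order used here (¬ binds tightest, then ∧, ∨, ⇒, ⇔) is an assumption, since the slide does not list the rules; the function names and the left-associative treatment of every binary connective are likewise choices made for this sketch.

```python
import re

# Binary connectives from lowest to highest precedence (assumed ordering).
BINARY = ["⇔", "⇒", "∨", "∧"]

def tokenize(text):
    """Split into proposition symbols (single capitals), parens, connectives."""
    return re.findall(r"[A-Z]|[()¬∧∨⇒⇔]", text)

def parse(tokens, level=0):
    """Parse a token list (consumed in place) into a nested tuple tree."""
    if level == len(BINARY):
        return parse_unary(tokens)
    left = parse(tokens, level + 1)
    while tokens and tokens[0] == BINARY[level]:
        op = tokens.pop(0)
        right = parse(tokens, level + 1)
        left = (op, left, right)
    return left

def parse_unary(tokens):
    """Handle negation, parenthesized sentences, and proposition symbols."""
    tok = tokens.pop(0)
    if tok == "¬":
        return ("¬", parse_unary(tokens))
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return tok  # a proposition symbol such as "P"

tree = parse(tokenize("¬P ∨ Q ∧ R"))
print(tree)
```

Without parentheses, ¬P ∨ Q ∧ R groups as (¬P) ∨ (Q ∧ R) under the assumed ordering; writing (P ∨ Q) ∧ R overrides it.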

Augmented Definite Clause Grammar
“Johanna baked cookies.”
S(func(obj)) → NP(obj) VP(func)
VP(func(obj)) → Verb(func) NP(obj)
NP(obj) → Name(obj) | Noun(obj)
Name(Johanna) → Johanna
Verb(λy λx Baked(x,y)) → baked
Noun(cookies) → cookies
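The semantic composition in this grammar can be traced with ordinary Python lambdas. This is only an illustration of how the augmentations combine for the example sentence, not a parser; the variable names and the string representation of the logical form are assumptions.

```python
# Lexicon: semantic value attached to each word.
johanna = "Johanna"                              # Name(Johanna)
cookies = "cookies"                              # Noun(cookies)
baked = lambda y: lambda x: f"Baked({x},{y})"    # Verb(λy λx Baked(x,y))

# VP(func(obj)) → Verb(func) NP(obj): apply the verb to its object.
vp = baked(cookies)       # behaves like λx Baked(x,cookies)

# S(func(obj)) → NP(obj) VP(func): apply the VP to the subject.
sentence = vp(johanna)

print(sentence)  # Baked(Johanna,cookies)
```

The verb takes its object first and its subject second, which is why the lambda is written with y outermost.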

Disambiguation
Syntactic evidence: “Lee asked Kim to tell Toby to leave on Saturday.”
Lexical evidence: “Lee placed the dress on the rack.  Kim wanted the dress on the rack.”
Semantic evidence: ball, diamond, bat, base
I ate spaghetti with {meatballs, salad, abandon, a fork, a friend}.

Disambiguation (cont.)
Metonymy - one object stands for another:
I drive a Geo.
The University frowns on squirrel chasing.
Metaphor - “Prices are high. Stocks dropped.”
Note: We’ve thrown out important information! Inflection differentiates:
“Do you know what day this is?” “No.”
“There’s another quiz today.” “No!”
“I’m not ready for it.” “No?”

Discourse Understanding
John went to a fancy restaurant.
He was pleased and gave the waiter a big tip.
He spent $50.
Did the waiter or John spend $50?
Did the $50 include the tip?
What was John pleased with?
Why did he give the waiter a big tip?

It’s not quite a UNIX pipe problem.
Understanding/context informs parsing.
Parsing informs speech recognition.
Spoken questions are used to disambiguate.
As we learn how the brain processes speech, we’ll learn better architectures for natural language processing.

Trust
Agent 1 heard Agent 2 say “The sky is falling!”
Agent 1 heard Agent 2 say that Agent 3 said “The sky is falling!”
Master-slave agent relationship:
Ben: "You don't need to see his identification."
Trooper: "We don't need to see his identification."
Ben: "These are not the droids you're looking for."
Trooper: "These are not the droids we're looking for."
Ben: "He can go about his business."
Trooper: "You can go about your business."
Ben: "Move along."
Trooper: "Move along. Move along."

Speech Recognition and Society
Newspaper article

Motivation for Intelligent Agents
Automation - people expensive, machines cheap
When $$$ is all that matters, why not automate everything that saves a buck?
Industrial Age : Luddites :: Information Age : Neo-Luddites
Global competition, survival of the fittest, job specialization, automation
What about job satisfaction?

Responsibility and AI
When software does the wrong thing
Unintentional, accidental - “bug”
What of intentional wrong behavior?
Utility/heuristic functions as an extension of an AI developer’s will
Where do you draw the line?

The Micromort
See R&N pp. 479-480
What if you’re coding the value of a micromort?

Machines Emulating People
Rodney Brooks wants to emulate people with robots
Businesses automating transactions
A consumer society without faces
Why create virtual reality?

Artificial Agent Societies
Artificial agents interact, form artificial societies
Computational resource sharing
Game theory assumes opportunism
Think Different! altruism, cooperation
Our programming (like our speech and actions) is a reflection of who we are and what we value.  Value other people.