CS 371: Introduction to Artificial Intelligence

Natural Language Communication

Bigger Issues

Outline

Natural Language Communication

speech formulation, speech recognition, lexical analysis, parsing, disambiguation, discourse understanding

Bigger Issues

responsibility in AI, utility functions and the value of human life, neo-Luddism, knowledge as power and intellectual capital, machines emulating people, artificial societies

Agent Communication

“intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs”

Purposes: control only? No:

Inform, query, answer, request/command action, promise/bargain, acknowledge, share experiences, etc.

Speech acts - direct: Help me!, indirect: I could use some help.

Natural Language

Unlike machine language, natural language is

ambiguous at many levels

much more dynamic - anyone want to do version control?

fuzzy, approximate

relies heavily on understanding of implicit communication, common sense knowledge

etc.

Stages of Communication

Speaker:

intention

what to say when

result of planning, decision analysis, and other thought/feeling processes

hearer’s recognizing and understanding intention requires similar processes

generation - choosing words

synthesis - uttering words

Stages of Communication (cont.)

Hearer:

perception - hearing words (could be mistaken)

analysis - infer possible meanings

disambiguation - pick most likely meaning

incorporation - decide what to do with it

Speech Recognition: language model

mapping sound waves to a sequence of words

“It’s hard to wreck a nice beach.”

Probabilistic Context-Free Grammars (PCFGs)

P(w1 w2 … wN) = P(w1) * P(w2|w1) * … * P(wN|w1 … w(N-1))

CPTs too large:

unigrams: Approximate as P(w1) * P(w2) * … * P(wN)

bigrams: Approximate as P(w1) * P(w2|w1) * … * P(wN|w(N-1))

trigrams: Approximate as P(w1) * P(w2|w1) * P(w3|w1,w2) * … * P(wN|w(N-2),w(N-1))
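
As a rough illustration of the bigram approximation, the sketch below estimates P(w(i)|w(i-1)) from raw counts and multiplies the estimates along a sentence. The three-sentence corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions made up for this example; a real recognizer would train on far more text and smooth the counts.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [
    "the man has a dog",
    "the man has a cat",
    "the dog has a bone",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    # <s> and </s> mark the start and end of each sentence.
    words = ["<s>"] + sentence.split() + ["</s>"]
    unigram_counts.update(words)
    bigram_counts.update(zip(words, words[1:]))


def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw counts."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


def sentence_prob(sentence):
    """Bigram approximation: the product of P(w(i) | w(i-1)) over the sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob


print(sentence_prob("the man has a dog"))  # a sentence seen in the corpus
print(sentence_prob("man the has"))        # contains unseen bigrams, so 0.0
```

Unseen bigrams get probability zero here, which is one reason the counts are usually smoothed or combined with lower-order models.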

Unigram and Bigram Counts

PCFG Approximations

Tradeoff between:

Context sensitivity:

"I has", "man have" – subject-verb agreement

"I, for one, has…" "man over there have"

Memory, acquisition of sufficient training examples

Compromise: weighted sum of unigram, bigram, and trigram models
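
The weighted-sum compromise can be sketched as a linear interpolation of the three estimates. The weights below are placeholder values chosen for illustration, not values from the lecture; in practice they would be tuned on held-out data.

```python
def interpolated_prob(p_unigram, p_bigram, p_trigram, weights=(0.1, 0.3, 0.6)):
    """Weighted sum of unigram, bigram, and trigram estimates for one word.

    The default weights are illustrative assumptions; they must sum to 1
    so that the result is still a probability.
    """
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * p_unigram + w2 * p_bigram + w3 * p_trigram


# A trigram never seen in training (estimate 0.0) still gets nonzero
# probability because the bigram and unigram models back it up.
print(interpolated_prob(p_unigram=0.01, p_bigram=0.2, p_trigram=0.0))
```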

Speech Recognition: acoustic model

Question #1: What speech sounds did the speaker utter? P(signal|words)

Human speech has 40-50 sounds called phones

characterized by features in the acoustic signal (e.g. frequency, amplitude, duration)

application of machine learning

DARPA Phonetic Alphabet

Speech Recognition

Question #2: What words did the speaker intend to express with those sounds? P(words|signal)

“It’s not a porch. It’s a …”

homophones (e.g. “0+2=2. One and one sum to two too.”)

noise (focusing amidst multiple conversations)

segmentation (Three strings walk into a bar…)

dialects (tow-may-tow, tow-mah-tow)

coarticulation (tah-may-tow, tow-may-tow)

Speech Quantization

Dialect and Coarticulation

Bayesian Approach

Assume a language model P(words)

Want P(words|signal).

If we had P(signal|words), we could compute the words that maximize P(words|signal). How?

If the signal gave us a list of phones, we could do this, but we can't.

The best we can do at this point is to compute P(words|phones). Then we need P(phones|signal).

For this, a hidden Markov model (HMM) is used.
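
One way to answer the "How?" above is Bayes' rule: P(words|signal) = P(signal|words) * P(words) / P(signal). Since P(signal) is the same for every candidate word string, it is enough to maximize P(signal|words) * P(words). The sketch below does this for two candidate transcriptions; all probabilities are invented for illustration.

```python
# Hypothetical numbers for two candidate transcriptions of one signal.
language_model = {            # P(words)
    "recognize speech": 1e-4,
    "wreck a nice beach": 1e-7,
}
acoustic_model = {            # P(signal | words)
    "recognize speech": 0.3,
    "wreck a nice beach": 0.4,
}


def most_likely_words(candidates):
    """Return the candidate maximizing P(signal | words) * P(words)."""
    return max(candidates, key=lambda w: acoustic_model[w] * language_model[w])


def posterior(candidates):
    """P(words | signal), normalized over the listed candidates only.

    Summing over the candidates stands in for P(signal); that is exact
    only if the candidate list covers every possible word string.
    """
    scores = {w: acoustic_model[w] * language_model[w] for w in candidates}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


candidates = list(language_model)
print(most_likely_words(candidates))
print(posterior(candidates))
```

Here the acoustically better match loses because the language model considers it far less likely.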

Speech Recognition (cont.)

Approach: Hidden Markov Models (HMMs)

"Hidden" – true state hidden from observer

Any number of states can generate a given symbol

The probability that a sequence came from the [m] model is the sum over all paths of

the probability of the path, times

the probability that the path generated the sequence.
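
The sum over all paths can be written out directly for a small model. The sketch below uses a made-up two-state HMM (the states, symbols, and probabilities are assumptions for illustration) and compares the brute-force sum with the forward recurrence, which computes the same quantity without enumerating every path.

```python
from itertools import product

# Toy two-state HMM; all numbers are invented for illustration.
states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}


def path_prob(path):
    """Probability of following this state path."""
    p = start[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= trans[prev][cur]
    return p


def emission_prob(path, symbols):
    """Probability that this path generated the symbol sequence."""
    p = 1.0
    for state, sym in zip(path, symbols):
        p *= emit[state][sym]
    return p


def sequence_prob_brute_force(symbols):
    """Sum over all paths of P(path) * P(symbols | path)."""
    return sum(
        path_prob(path) * emission_prob(path, symbols)
        for path in product(states, repeat=len(symbols))
    )


def sequence_prob_forward(symbols):
    """Same quantity via the forward recurrence (dynamic programming)."""
    alpha = {s: start[s] * emit[s][symbols[0]] for s in states}
    for sym in symbols[1:]:
        alpha = {
            s: emit[s][sym] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


observed = ["x", "y", "y"]
print(sequence_prob_brute_force(observed))
print(sequence_prob_forward(observed))
```

The brute-force version enumerates a number of paths that grows exponentially with the sequence length, while the forward version is linear in it.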

Putting it Together

Three models

language bigram → P(word(i)|word(i-1))

word pronunciation HMM → P(phones|word)

phone HMM → P(signal|phone)

To compute P(words|signal), these need to be combined.

One big HMM – make language bigram into an HMM and construct a large HMM by nesting each level of abstraction

Viterbi Algorithm: an instance of dynamic programming
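
A minimal sketch of the Viterbi algorithm on a toy HMM follows. The two-state model and its probabilities are assumptions for illustration; in a recognizer the states would come from the nested language, pronunciation, and phone models. At each step the algorithm keeps only the best path into each state, which is the dynamic programming idea.

```python
# Toy two-state HMM; all numbers are invented for illustration.
states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}


def viterbi(symbols, states, start, trans, emit):
    """Return (probability, path) for the most likely state path."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start[s] * emit[s][symbols[0]], [s]) for s in states}
    for sym in symbols[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor r for state s.
            prob, path = max(
                ((best[r][0] * trans[r][s], best[r][1]) for r in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit[s][sym], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


prob, path = viterbi(["x", "y", "y"], states, start, trans, emit)
print(path, prob)
```

Storing whole paths keeps the sketch short; a larger implementation would normally store back-pointers and work with log probabilities to avoid underflow.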

Parsing

mapping a sequence of words to possible interpretations

“Time flies like an arrow. Fruit flies like a banana.” - Groucho Marx

list of tokens ⇒ annotated parse tree

Augmented Definite Clause Grammar

“Johanna baked cookies.”

S(func(obj)) → NP(obj) VP(func)

VP(func(obj)) → Verb(func) NP(obj)

NP(obj) → Name(obj) | Noun(obj)

Name(Johanna) → Johanna

Verb(λy λx Baked(x,y)) → baked

Noun(cookies) → cookies
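
The way the augmentations build a meaning for “Johanna baked cookies.” can be sketched with ordinary functions. The verb's semantics is a curried function that takes the object first and then the subject, matching λy λx Baked(x,y). The lexicon layout and function names are assumptions for this example, and the parser only handles the fixed pattern NP Verb NP.

```python
# Lexicon: word -> (category, semantics). Layout is an illustrative assumption.
lexicon = {
    "Johanna": ("Name", "Johanna"),
    "cookies": ("Noun", "cookies"),
    # Curried semantics: takes the object y, then the subject x.
    "baked": ("Verb", lambda y: lambda x: f"Baked({x},{y})"),
}


def parse_np(word):
    """NP(obj) -> Name(obj) | Noun(obj)"""
    category, obj = lexicon[word]
    if category not in ("Name", "Noun"):
        raise ValueError(f"{word!r} is not a noun phrase")
    return obj


def parse_vp(verb, object_word):
    """VP(func(obj)) -> Verb(func) NP(obj)"""
    category, func = lexicon[verb]
    if category != "Verb":
        raise ValueError(f"{verb!r} is not a verb")
    return func(parse_np(object_word))


def parse_sentence(words):
    """S(func(obj)) -> NP(obj) VP(func), for three-word sentences only."""
    subject, verb, object_word = words
    return parse_vp(verb, object_word)(parse_np(subject))


print(parse_sentence("Johanna baked cookies".split()))
```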

Disambiguation

Syntactic evidence: “Lee asked Kim to tell Toby to leave on Saturday.”

Lexical evidence: “Lee placed the dress on the rack. Kim wanted the dress on the rack.”

Semantic evidence: ball, diamond, bat, base

I ate spaghetti with {meatballs, salad, abandon, a fork, a friend}.

Disambiguation (cont.)

Metonymy - one object stands for another:

I drive a Geo.

The University frowns on squirrel chasing.

Metaphor - “Prices are high. Stocks dropped.”

Note: We’ve thrown out important information! Inflection differentiates:

“Do you know what day this is?” “No.”

“There’s another quiz today.” “No!”

“I’m not ready for it.” “No?”

Discourse Understanding

John went to a fancy restaurant.

He was pleased and gave the waiter a big tip.

He spent $50.

Did the waiter or John spend $50?

Did the $50 include the tip?

What was John pleased with?

Why did he give the waiter a big tip?

It’s not quite a UNIX pipe problem.

Understanding/context informs parsing.

Parsing informs speech recognition.

Spoken questions are used to disambiguate.

As we learn how the brain processes speech, we’ll learn better architectures for natural language processing.

Trust

Agent 1 heard Agent 2 say “The sky is falling!”

Agent 1 heard Agent 2 say that Agent 3 said “The sky is falling!”

Master-slave agent relationship:

Ben: "You don't need to see his identification."

Trooper: "We don't need to see his identification."

Ben: "These are not the droids you're looking for."

Trooper: "These are not the droids we're looking for."

Ben: "He can go about his business."

Trooper: "You can go about your business."

Ben: "Move along."

Trooper: "Move along. Move along."

Speech Recognition and Society

Newspaper article

Motivation for Intelligent Agents

Automation - people expensive, machines cheap

When $$$ is all that matters, why not automate everything that saves a buck?

Industrial Age : Luddites :: Information Age : Neo-Luddites

Global competition, survival of the fittest, job specialization, automation

What about job satisfaction?

Responsibility and AI

When software does the wrong thing

Unintentional, accidental - “bug”

What of intentional wrong behavior?

Utility/heuristic functions as an extension of an AI developer’s will

Where do you draw the line?

The Micromort

See R&N pp. 479-480

What if you’re coding the value of a micromort?

Machines Emulating People

Rodney Brooks wants to emulate people with robots

Businesses automating transactions

A consumer society without faces

Why create virtual reality?

Artificial Agent Societies

Artificial agents interact, form artificial societies

Computational resource sharing

Game theory assumes opportunism

Think Different! altruism, cooperation

Our programming (like our speech and actions) is a reflection of who we are and what we value. Value other people.



	Propositional Logic:
		Sentence → Proposition | Complex Sentence
		Proposition → P | Q | R | …
		Complex Sentence → (Sentence) | ¬ Sentence | Sentence Connective Sentence
		Connective → ∧ | ∨ | ⇒ | ⇔
	Ambiguity not resolved by parentheses is resolved by precedence rules

