Speech Recognition: language model
• mapping sound waves to a sequence of words
• “It’s hard to wreck a nice beach.” (sounds almost the same as “It’s hard to recognize speech.”)
• Probabilistic Context-Free Grammars (PCFGs)
• P(w1 w2 … wN) = P(w1) * P(w2|w1) * … * P(wN|w1 … w(N-1))
• CPTs (conditional probability tables) too large:
– unigrams: Approximate as P(w1) * P(w2) * … * P(wN)
– bigrams: Approximate as P(w1) * P(w2|w1) * … * P(wN|w(N-1))
– trigrams: Approximate as P(w1) * P(w2|w1) * P(w3|w1,w2) * … * P(wN|w(N-2),w(N-1))
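
The bigram approximation can be sketched in a few lines of Python. This is a minimal illustration, not a production language model. The toy corpus, the function names `train_bigram` and `bigram_prob`, and the `<s>` start marker (which stands in for P(w1) as P(w1|<s>)) are all assumptions made for the example. Probabilities are plain relative frequencies with no smoothing, so any unseen bigram drives the whole product to zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left-hand contexts, adding a <s> start marker."""
    context_counts = Counter()  # how often each word appears as the previous word
    bigram_counts = Counter()   # how often each (previous, current) pair appears
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return context_counts, bigram_counts

def bigram_prob(sentence, context_counts, bigram_counts):
    """Approximate P(w1 ... wN) as P(w1|<s>) * P(w2|w1) * ... * P(wN|w(N-1))."""
    words = ["<s>"] + sentence.lower().split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        if context_counts[prev] == 0:
            return 0.0  # context never seen in training: no estimate available
        prob *= bigram_counts[(prev, cur)] / context_counts[prev]
    return prob

# Hypothetical toy corpus, invented purely for illustration.
corpus = [
    "it is hard to recognize speech",
    "it is easy to recognize speech",
    "it is hard to wreck a nice beach",
    "we recognize speech",
]
context_counts, bigram_counts = train_bigram(corpus)

p_speech = bigram_prob("it is hard to recognize speech", context_counts, bigram_counts)
p_beach = bigram_prob("it is hard to wreck a nice beach", context_counts, bigram_counts)
print(p_speech, p_beach)
```

On this toy corpus the "recognize speech" reading gets the higher score, which is how a language model can help pick between two word sequences that sound alike. A trigram version would key the counts on the two previous words instead of one.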