• Three models
– Language: bigram → P(word(i) | word(i-1))
– Word: pronunciation HMM → P(phones | word)
– Phone: HMM → P(signal | phone)
• To compute P(words | signal), these need to be combined.
• One big HMM: turn the language bigram into an HMM and construct a single large HMM by nesting each level of abstraction (phone HMMs inside word pronunciation HMMs inside the bigram HMM).
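The combination can be written out with Bayes' rule. This is a sketch under the usual assumptions, with notation introduced here rather than taken from the slides: X is the signal, W = w_1 … w_n the word sequence (w_0 a start symbol), Q a phone sequence, and the signal is assumed to depend on the words only through their phones, i.e. P(X | Q, W) = P(X | Q). P(X) is the same for every W, so it drops out of the maximization.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\,P(W)

P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
  \qquad \text{(language bigram)}

P(X \mid W) = \sum_{Q} \underbrace{P(X \mid Q)}_{\text{phone HMMs}}\;
                       \underbrace{P(Q \mid W)}_{\text{pronunciation HMMs}}
```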
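One way to picture the nesting is a small Python sketch that compiles a bigram, a pronunciation lexicon, and phone HMMs into one flat state graph. Everything concrete here is a hypothetical illustration, not the lecture's model: the two-word lexicon, the bigram numbers, the 3-state left-to-right phone topology, the shared self-loop probability, and the function name `build_composite_hmm`. The emission models P(signal | phone) that would be attached to each state are left out.

```python
# Hypothetical toy settings: none of these numbers come from the slides.
PHONE_STATES = 3   # assumed 3-state left-to-right topology for every phone
SELF_LOOP = 0.6    # assumed self-loop probability for every sub-state

# Hypothetical pronunciation lexicon: word -> phone sequence.
LEXICON = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}

# Hypothetical bigram P(word(i) | word(i-1)); "<s>" is the start context.
BIGRAM = {
    "<s>": {"yes": 0.5, "no": 0.5},
    "yes": {"yes": 0.3, "no": 0.7},
    "no":  {"yes": 0.6, "no": 0.4},
}


def build_composite_hmm(lexicon, bigram):
    """Nest phone HMMs inside word HMMs inside the bigram HMM.

    Returns (labels, pi, A): labels[s] is (word, phone, sub_state),
    pi is the initial distribution, A is the row-stochastic transition matrix.
    """
    labels, entry, exit_ = [], {}, {}
    # Lay out each word as a linear chain of phone sub-states.
    for word, phones in lexicon.items():
        entry[word] = len(labels)
        for phone in phones:
            for k in range(PHONE_STATES):
                labels.append((word, phone, k))
        exit_[word] = len(labels) - 1

    n = len(labels)
    A = [[0.0] * n for _ in range(n)]
    pi = [0.0] * n

    for word in lexicon:
        # Starting a utterance: pick the first word with the bigram from "<s>".
        pi[entry[word]] = bigram["<s>"][word]
        # Inside a word: stay in a sub-state or advance to the next one.
        for s in range(entry[word], exit_[word]):
            A[s][s] = SELF_LOOP
            A[s][s + 1] = 1.0 - SELF_LOOP
        # Leaving the word's last sub-state: the bigram picks the next word.
        last = exit_[word]
        A[last][last] = SELF_LOOP
        for nxt, p in bigram[word].items():
            A[last][entry[nxt]] += (1.0 - SELF_LOOP) * p

    return labels, pi, A


if __name__ == "__main__":
    labels, pi, A = build_composite_hmm(LEXICON, BIGRAM)
    print(len(labels))  # 15: (3 + 2 phones) * 3 sub-states
    print(all(abs(sum(row) - 1.0) < 1e-9 for row in A))  # True
```

Each composite state carries its (word, phone, sub-state) label, so a decoder such as Viterbi run over this graph with per-state emission scores could read the most likely word sequence off the word labels along the best path. Keeping every word as its own linear chain is the simplest layout; larger systems often share states across words, which this sketch does not attempt.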