Bayesian Approach
Assume a language model P(words)
Want P(words|signal).
If we had P(signal|words), we could compute the
words that maximize P(words|signal).  How?
If the signal gave us a list of phones, we could do
this, but we can't.
The best we can do at this point is to compute
P(words|phones).  Then we need P(phones|signal).
For this, a hidden Markov model (HMM) is
used.