[Ligncse256] Kneser-Ney smoothing with trigram model
Matt Rodriguez
mar008 at cs.ucsd.edu
Mon Jan 26 21:50:40 PST 2009
Hey folks,
I've implemented a bigram and trigram Kneser-Ney language model.
The bigram model seems to be working well: the HUB WER is 0.072,
and the perplexities on the two test sets are 387 and 376.
The trigram model performs much worse, and I have a few questions about
how to deal with sparsity.
For reference, here is the trigram KN model:
P_KN^3(w_i | w_{i-2}, w_{i-1}) =
    max(count(w_{i-2}, w_{i-1}, w_i) - D, 0) / \sum_{w_i} count(w_{i-2}, w_{i-1}, w_i)
  + [D / \sum_{w_i} count(w_{i-2}, w_{i-1}, w_i)] * N_{1+}(w_{i-2}, w_{i-1}, \cdot) * P_KN^2(w_i | w_{i-1})
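In code, the quantity I'm computing is roughly the following (a minimal
Python sketch; trigram_counts and pkn_bigram are placeholders for my
actual count table and bigram model, and D = 0.75 is just a typical
discount value, not a tuned one):

from collections import Counter

D = 0.75  # absolute-discount constant; a typical value, not tuned

def pkn_trigram(w2, w1, w, trigram_counts, pkn_bigram):
    """P_KN^3(w | w2, w1), assuming the context (w2, w1) was seen.

    trigram_counts: Counter mapping (w2, w1, w) -> count
    pkn_bigram:     function (w1, w) -> bigram KN probability P_KN^2
    """
    # Marginal count of the context over all following words
    # (the denominator of both fractions).  A linear scan is fine
    # for a sketch; a real model would precompute these marginals.
    context_count = sum(cnt for (a, b, _), cnt in trigram_counts.items()
                        if (a, b) == (w2, w1))
    # N_{1+}(w2, w1, .): distinct words observed after the context.
    n1plus = sum(1 for (a, b, _) in trigram_counts if (a, b) == (w2, w1))
    # Discounted trigram estimate; Counter returns 0 for unseen keys.
    discounted = max(trigram_counts[(w2, w1, w)] - D, 0) / context_count
    # Redistribute the discounted mass through the bigram model.
    return discounted + (D / context_count) * n1plus * pkn_bigram(w1, w)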
There are three terms that can be affected by sparsity:
1. The count of the number of times the three words occurred in
sequence; this is the numerator of the first fraction.
2. The marginal count of the preceding two words over all possible
following words; this is the denominator of both fractions.
3. The number of unique words that follow w_{i-2}, w_{i-1}; this is
the N_{1+} factor in the numerator of the second term.
If the first term is 0, then the first fraction is 0. The second and
third terms are zero exactly when the context (w_{i-2}, w_{i-1}) never
occurred in training, in which case both fractions are undefined. What
should I do in that case?
I've tried backing off and using the bigram probability P_KN^2, but
that does not work very well.
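By "backing off" I mean roughly this, on top of the sketch above:

def pkn_trigram_backoff(w2, w1, w, trigram_counts, pkn_bigram):
    # If the context never occurred, the marginal count and N_{1+} are
    # both zero and the trigram terms are undefined, so fall all the
    # way through to the bigram estimate.
    if not any((a, b) == (w2, w1) for (a, b, _) in trigram_counts):
        return pkn_bigram(w1, w)
    return pkn_trigram(w2, w1, w, trigram_counts, pkn_bigram)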
Has anyone else gotten this to work?
Thanks,
Matt