[Ligncse256] Kneser-Ney smoothing with trigram model

Matt Rodriguez mar008 at cs.ucsd.edu
Mon Jan 26 21:50:40 PST 2009


Hey folks,

I've implemented a bigram and trigram Kneser-Ney language model.
The bigram model seems to be working well: the HUB WER is .072,
and the perplexities on the two test sets are 387 and 376.

The trigram model performs much worse, and I have a few questions
about how I should deal with sparsity.
 
For reference, here is the trigram KN model:

P_KN^3(w_i | w_{i-2}, w_{i-1}) =
    max(count(w_{i-2}, w_{i-1}, w_i) - D, 0) / \sum_{w_i} count(w_{i-2}, w_{i-1}, w_i)
  + D / \sum_{w_i} count(w_{i-2}, w_{i-1}, w_i) * N_{1+}(w_{i-2}, w_{i-1}, \cdot) * P_KN^2(w_i | w_{i-1})
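
For concreteness, here is a rough sketch of the computation in Python.
The table names are illustrative placeholders (not my actual code), and
this version assumes the context (w_{i-2}, w_{i-1}) was seen in training:

    from collections import defaultdict

    trigram_counts = defaultdict(int)   # count(w_{i-2}, w_{i-1}, w_i)
    context_counts = defaultdict(int)   # \sum_{w_i} count(w_{i-2}, w_{i-1}, w_i)
    continuations = defaultdict(set)    # distinct words seen after each context

    D = 0.75  # absolute discount

    def train(sentence):
        """Accumulate trigram statistics from one tokenized sentence."""
        for w2, w1, w in zip(sentence, sentence[1:], sentence[2:]):
            trigram_counts[(w2, w1, w)] += 1
            context_counts[(w2, w1)] += 1
            continuations[(w2, w1)].add(w)

    def pkn_trigram(w2, w1, w, pkn_bigram):
        """P_KN^3(w | w2, w1); pkn_bigram is the lower-order P_KN^2 model."""
        denom = context_counts[(w2, w1)]
        first = max(trigram_counts[(w2, w1, w)] - D, 0.0) / denom
        # Interpolation weight: discount mass D spread over the
        # N_{1+} distinct continuations of the context.
        lam = (D / denom) * len(continuations[(w2, w1)])
        return first + lam * pkn_bigram(w1, w)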


There are three terms that can be affected by sparsity:

 1. The count of the number of times the three words occur in
sequence; this is the numerator of the first fraction.

 2. The marginal count of the preceding two words, summed over all
possible next words; this is the denominator of both fractions.

 3. The number of unique words that follow w_{i-2}, w_{i-1}; this is
the N_{1+} term in the numerator of the second fraction.

If the first term is zero, then the first fraction is zero. What
should I do when the second and third terms are zero? (Note that they
vanish together: an unseen context has no observed continuations, so
N_{1+} is zero whenever the marginal count is.) I've tried backing off
and using the bigram probability P_KN^2 directly, but that does not
work very well.
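
Concretely, the fallback I tried looks roughly like this (a sketch
using the same placeholder tables as above):

    def pkn_trigram_backoff(w2, w1, w, pkn_bigram):
        """Fall back to P_KN^2 when the trigram context was never seen."""
        denom = context_counts[(w2, w1)]
        if denom == 0:
            # Unseen context: both denominators and N_{1+} are zero,
            # so use the bigram probability directly. This is the
            # step that does not seem to work well.
            return pkn_bigram(w1, w)
        first = max(trigram_counts[(w2, w1, w)] - D, 0.0) / denom
        lam = (D / denom) * len(continuations[(w2, w1)])
        return first + lam * pkn_bigram(w1, w)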

Has anyone else gotten this to work?

Thanks,
Matt


