[Ligncse256] Kneser-Ney smoothing with trigram model

Andrea Biaggi abiaggi at cs.ucsd.edu
Mon Jan 26 23:34:19 PST 2009


Hey Matt,
I've only done the bigram, and I'm still having some problems handling unknown words.
The problem is that \sum_{w_i} count(w_i, w_i-1, w_i-2) evaluates to 0 if the subsequence w_i-1, w_i-2 is unknown, which means a division by 0.
To get around this, I'm using an extra parameter that manually adds some number of unknown words to the training set (just a tag to reserve "free space" for them).
With that change, instantiating the class and running all of the training takes around 5-10 minutes.
I don't know any other way to handle it.
I had the same division-by-0 problem in the interpolation when a word was unknown.
I think the KN language model, as given in the book and slides, only works if you have complete knowledge of the vocabulary.
Maybe the professor can give a better answer to your question.
Andrea
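Andrea's fix reserves probability mass by adding unknown-word tokens to the training data. A common alternative, which is an assumption here and not what Andrea implemented, is to replace rare training words with a single unknown tag and map every out-of-vocabulary test word to that same tag, so no single word is ever unseen at test time. The `<UNK>` tag, the `min_count` threshold and the function names below are hypothetical.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical tag standing in for every unknown word


def build_vocab(train_tokens, min_count=2):
    """Keep words seen at least min_count times; everything else will map to UNK."""
    counts = Counter(train_tokens)
    vocab = {w for w, c in counts.items() if c >= min_count}
    vocab.add(UNK)
    return vocab


def map_unknowns(tokens, vocab):
    """Replace out-of-vocabulary tokens with the UNK tag (apply to train and test)."""
    return [w if w in vocab else UNK for w in tokens]


train = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(train, min_count=2)
print(sorted(vocab))                                # ['<UNK>', 'cat', 'the']
print(map_unknowns("the dog sat".split(), vocab))   # ['the', '<UNK>', '<UNK>']
```

This only removes unseen single words. A two-word trigram context made of known words can still be unseen, so the trigram model still needs a guard for that case.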

----- Original Message -----
From: "Matt Rodriguez" <mar008 at cs.ucsd.edu>
To: ligncse256 at ling.ucsd.edu
Sent: Monday, January 26, 2009 9:50:40 PM GMT -08:00 US/Canada Pacific
Subject: [Ligncse256] Kneser-Ney smoothing with trigram model

Hey folks,

I've implemented a bigram and trigram Kneser-Ney language model.
The bigram model seems to be working well. The HUB WER is 0.072,
and the perplexities on the tests are 387 and 376.

The trigram performs much worse. I have a few questions on how I should
deal with sparsity.
 
For reference, here is the trigram KN model:

P_{KN}^{3}(w_i \mid w_{i-1}, w_{i-2}) = \frac{\max(count(w_i, w_{i-1}, w_{i-2}) - D, 0)}{\sum_{w_i} count(w_i, w_{i-1}, w_{i-2})}
    + \frac{D}{\sum_{w_i} count(w_i, w_{i-1}, w_{i-2})} N_{1+}(w_{i-1}, w_{i-2}, \cdot) \, P_{KN}^{2}(w_i \mid w_{i-1})
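To make the formula concrete, here is a minimal Python sketch that evaluates it from precomputed counts. The function and argument names and the default discount D = 0.75 are assumptions for illustration, and it only covers a context that has been seen (nonzero denominator).

```python
def pkn3_interpolated(c_tri, c_ctx, n1plus_ctx, p_kn2, D=0.75):
    """Evaluate the trigram KN formula above for one word (hypothetical helper).

    c_tri      -- count(w_i, w_i-1, w_i-2)
    c_ctx      -- sum over w_i of count(w_i, w_i-1, w_i-2); assumed > 0 here
    n1plus_ctx -- N_1+(w_i-1, w_i-2, .), distinct words seen after the context
    p_kn2      -- P_KN^2(w_i | w_i-1), the bigram KN probability
    D          -- absolute discount (0.75 is an assumed default)
    """
    discounted = max(c_tri - D, 0) / c_ctx      # first fraction
    backoff_weight = D * n1plus_ctx / c_ctx     # second fraction times N_1+
    return discounted + backoff_weight * p_kn2


p = pkn3_interpolated(c_tri=3, c_ctx=10, n1plus_ctx=4, p_kn2=0.1)
print(round(p, 3))  # 0.255
```

With these made-up numbers the formula gives (3 - 0.75)/10 + (0.75 * 4 / 10) * 0.1 = 0.225 + 0.03 = 0.255.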


There are three terms that can be affected by sparsity:

 1. The number of times the three words occurred in sequence. This is in
the numerator of the first fraction.

 2. The marginal count of the preceding two words, summed over all possible
next words. This is in the denominators of both fractions.

 3. The number of unique words that follow w_i-2, w_i-1. This is in the
numerator of the second fraction.
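The three quantities can be collected in one pass over the trigram counts. The Python sketch below is hypothetical: it pads each sentence with assumed `<s>` and `</s>` markers, and it stores tuples in reading order (w_i-2, w_i-1, w_i) rather than the (w_i, w_i-1, w_i-2) order used in the formula.

```python
from collections import Counter, defaultdict


def trigram_stats(sentences):
    """Return (tri, ctx, n1plus) for a list of tokenized sentences.

    tri[(w2, w1, w)] -- term 1: count of the trigram, w2 = w_i-2, w1 = w_i-1
    ctx[(w2, w1)]    -- term 2: sum over w of tri[(w2, w1, w)]
    n1plus[(w2, w1)] -- term 3: number of distinct w seen after (w2, w1)
    """
    tri = Counter()
    for sent in sentences:
        padded = ["<s>", "<s>"] + sent + ["</s>"]
        tri.update(zip(padded, padded[1:], padded[2:]))
    ctx = Counter()
    followers = defaultdict(set)
    for (w2, w1, w), c in tri.items():
        ctx[(w2, w1)] += c
        followers[(w2, w1)].add(w)
    n1plus = {k: len(v) for k, v in followers.items()}
    return tri, ctx, n1plus


sentences = ["the cat sat".split(), "the cat ran".split()]
tri, ctx, n1plus = trigram_stats(sentences)
print(tri[("the", "cat", "sat")])        # 1
print(ctx[("the", "cat")])               # 2
print(n1plus[("the", "cat")])            # 2
print(ctx[("the", "dog")])               # 0
print(n1plus.get(("the", "dog"), 0))     # 0
```

For an unseen context such as ("the", "dog"), terms 2 and 3 are both 0: if the two words never occurred together, no word was ever seen after them, so those two terms always vanish together.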

If the first term is 0, then the first fraction is 0. What should I do
if the second and third terms are zero?
I've tried backing off and using the bigram probability PKN^2, but that
does not work very well.
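When the context (w_i-2, w_i-1) was never seen, the second and third terms are both 0 and the formula becomes 0/0. A common convention in interpolated Kneser-Ney is to give all of the probability mass to the lower-order model in that case, which is the back-off Matt describes. The Python sketch below adds that guard; the function and argument names are hypothetical, D = 0.75 is an assumed default, and a uniform function stands in for a real P_KN^2. One way to check an implementation is to confirm that the trigram probabilities sum to 1 over the vocabulary for every context, seen or unseen, provided the bigram distribution sums to 1.

```python
def pkn3(w, w1, w2, tri, ctx, n1plus, p_kn2, D=0.75):
    """Interpolated trigram KN with a guard for unseen contexts (hypothetical).

    w, w1, w2 -- w_i, w_i-1, w_i-2
    tri       -- dict: (w2, w1, w) -> trigram count
    ctx       -- dict: (w2, w1) -> sum of trigram counts for that context
    n1plus    -- dict: (w2, w1) -> number of distinct words after the context
    p_kn2     -- function (w, w1) -> bigram KN probability (assumed normalized)
    """
    c_ctx = ctx.get((w2, w1), 0)
    if c_ctx == 0:
        # Unseen context: N_1+ is 0 as well, so the formula is 0/0.
        # Give all of the mass to the bigram model.
        return p_kn2(w, w1)
    discounted = max(tri.get((w2, w1, w), 0) - D, 0) / c_ctx
    lam = D * n1plus[(w2, w1)] / c_ctx
    return discounted + lam * p_kn2(w, w1)


# Toy check with made-up counts and a uniform stand-in for P_KN^2.
vocab = ["the", "cat", "sat", "ran"]
tri = {("the", "cat", "sat"): 1, ("the", "cat", "ran"): 1}
ctx = {("the", "cat"): 2}
n1plus = {("the", "cat"): 2}


def uniform(w, w1):
    return 1 / len(vocab)


for w2, w1 in [("the", "cat"), ("a", "dog")]:
    total = sum(pkn3(w, w1, w2, tri, ctx, n1plus, uniform) for w in vocab)
    print((w2, w1), round(total, 6))  # 1.0 for both contexts
```

For a seen context the check holds because the discounted counts sum to (c - D * N_1+)/c and the interpolation weight is D * N_1+/c; this relies on every seen trigram count being at least D, which is true for integer counts when D <= 1.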

Has anyone else gotten this to work?

Thanks,
Matt

_______________________________________________
Ligncse256 mailing list
Ligncse256 at ling.ucsd.edu
http://pidgin.ucsd.edu/mailman/listinfo/ligncse256


