[Ligncse256] Kneser-Ney smoothing with trigram model
Andrea Biaggi
abiaggi at cs.ucsd.edu
Mon Jan 26 23:34:19 PST 2009
Hey Matt,
I've only done the bigram, and I'm still having problems handling unknown words.
The issue is that \sum_{w_i} count(w_i, w_i-1, w_i-2) is 0 if the subsequence w_i-1, w_i-2 is unknown, which leads to a division by 0.
To get around this I manually add some number of unknown words to the training set, controlled by an extra parameter (just a tag that leaves some "free space" for them).
With that, instantiating the class and running all of the training takes around 5-10 minutes.
I don't know any other way to handle that.
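One common way to reserve that room for unknown words, sketched here in Python, is to replace rare training words with a single tag before counting. This is only an illustration and not necessarily what the code above does: the tag name `<UNK>`, the `min_count` threshold, and both function names are assumptions.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical tag name for unknown words

def build_vocab(sentences, min_count=2):
    """Keep words seen at least min_count times; add the UNK tag."""
    counts = Counter(w for sent in sentences for w in sent)
    vocab = {w for w, c in counts.items() if c >= min_count}
    vocab.add(UNK)
    return vocab

def map_unknowns(sentence, vocab):
    """Replace every out-of-vocabulary word with the UNK tag."""
    return [w if w in vocab else UNK for w in sentence]

# Toy data, made up for the example.
train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
vocab = build_vocab(train, min_count=2)

# Count n-grams on train_unk so that UNK gets real counts of its own.
train_unk = [map_unknowns(s, vocab) for s in train]
test_unk = map_unknowns(["the", "bird", "sat"], vocab)
```

Because the tag then appears in the training counts, contexts containing an unseen test word are no longer guaranteed to have a zero denominator, although a particular context can still be unseen.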
I also had the same division-by-0 problem in the interpolation when a word was unknown.
I think that the KN language model, as given in the book and slides, works only if you have complete knowledge of the vocabulary.
Maybe the professor can give a better answer to your question.
Andrea
----- Original Message -----
From: "Matt Rodriguez" <mar008 at cs.ucsd.edu>
To: ligncse256 at ling.ucsd.edu
Sent: Monday, January 26, 2009 9:50:40 PM GMT -08:00 US/Canada Pacific
Subject: [Ligncse256] Kneser-Ney smoothing with trigram model
Hey folks,
I've implemented a bigram and trigram Kneser-Ney language model.
The bigram model seems to be working well. The HUB WER is .072
and the perplexities for the tests are 387 and 376.
The trigram performs much worse. I have a few questions on how I should
deal with sparsity.
For reference here is the trigram KN model.
P_KN^3 = max(count(w_i, w_i-1, w_i-2) - D, 0) / \sum_{w_i} count(w_i, w_i-1, w_i-2)
       + D / \sum_{w_i} count(w_i, w_i-1, w_i-2) * N_1+(w_i-1, w_i-2, \cdot) * P_KN^2
There are three terms that can be affected by sparsity:
1. The number of times the three words occurred in sequence. This is
in the numerator of the first fraction.
2. The marginal count of the preceding two words over all possible
words. This is the denominator of both fractions.
3. The number of unique words that follow w_i-2, w_i-1. This is in the
numerator of the second fraction.
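The three quantities in the list above can all be collected in one pass over the training trigrams. The Python below is a minimal sketch under assumptions: sentences are already tokenized lists, and the function and variable names are made up for illustration.

```python
from collections import Counter, defaultdict

def trigram_stats(sentences):
    tri = Counter()               # term 1: count of (w_i-2, w_i-1, w_i)
    ctx_total = Counter()         # term 2: sum over w_i of the trigram count
    followers = defaultdict(set)  # term 3: distinct w_i seen after the context
    for sent in sentences:
        for a, b, c in zip(sent, sent[1:], sent[2:]):
            tri[(a, b, c)] += 1
            ctx_total[(a, b)] += 1
            followers[(a, b)].add(c)
    return tri, ctx_total, followers

# Toy data, made up for the example.
sents = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
tri, ctx_total, followers = trigram_stats(sents)

seen_total = ctx_total[("a", "b")]                      # context seen 3 times
seen_n1plus = len(followers[("a", "b")])                # followed by "c" and "d"
unseen_total = ctx_total[("x", "y")]                    # Counter returns 0
unseen_n1plus = len(followers.get(("x", "y"), set()))   # .get avoids adding a key
```

Terms 2 and 3 are zero under exactly the same condition: a context with no observed trigrams has no followers, and a context with at least one observed trigram has at least one.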
If the first term is 0, then the first fraction is 0. What should I do
if the second and third terms are zero?
I've tried backing off and using the bigram probability PKN^2 but that
does not work very well.
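For concreteness, the fallback described above can be sketched in Python as follows. This is not a fix for the poor trigram results, only one way to avoid the division by zero. The discount D = 0.75, the function names, and the `lower_order` callable standing in for PKN^2 are assumptions.

```python
from collections import Counter, defaultdict

def make_trigram_kn(sentences, lower_order, D=0.75):
    """Return prob(a, b, c) following the trigram formula above.

    lower_order(b, c) is a stand-in for PKN^2 (an assumption of this sketch).
    """
    tri = Counter()
    ctx_total = Counter()
    followers = defaultdict(set)
    for sent in sentences:
        for a, b, c in zip(sent, sent[1:], sent[2:]):
            tri[(a, b, c)] += 1
            ctx_total[(a, b)] += 1
            followers[(a, b)].add(c)

    def prob(a, b, c):
        total = ctx_total[(a, b)]
        if total == 0:
            # Unseen context: nothing to discount, so use the lower order alone.
            return lower_order(b, c)
        discounted = max(tri[(a, b, c)] - D, 0.0) / total
        backoff_weight = D * len(followers[(a, b)]) / total
        return discounted + backoff_weight * lower_order(b, c)

    return prob

# Toy data and a uniform lower-order model over {"c", "d"}, both made up.
sents = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
prob = make_trigram_kn(sents, lower_order=lambda b, c: 0.5)
seen_sum = prob("a", "b", "c") + prob("a", "b", "d")  # should sum to 1
unseen = prob("x", "y", "c")                          # falls back to lower order
```

With this fallback the distribution for an unseen context is exactly the lower-order one, so the quality of the result depends on how well the lower-order model itself handles unknown words.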
Has anyone else gotten this to work?
Thanks,
Matt