[Ligncse256] POSTagger Trigrams

Andrea Biaggi abiaggi at cs.ucsd.edu
Wed Feb 18 09:34:14 PST 2009


I have also thought about applying add-one smoothing to P(w|t): for 
every word, add one count for every tag. I don't know whether this is 
useful, but it might "fix" your problem of -Infinity paths.
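
To make the idea concrete, here is a rough sketch of what I mean, not a
tested tagger. The function names (train_emissions, log_emission) and the
extra vocabulary slot reserved for unknown words are my own assumptions,
not something from the assignment code.

```python
import math
from collections import Counter, defaultdict


def train_emissions(tagged_sentences):
    """Count (tag, word) pairs and collect the training vocabulary.

    tagged_sentences: iterable of sentences, each a list of (word, tag) pairs.
    """
    counts = defaultdict(Counter)  # counts[tag][word] = c(tag, word)
    vocab = set()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[tag][word] += 1
            vocab.add(word)
    return counts, vocab


def log_emission(counts, vocab, word, tag):
    """Return log P(word | tag) with add-one smoothing.

    Assumption: one extra vocabulary slot is reserved for unknown words,
    so words never seen in training also get a nonzero probability.
    """
    vocab_size = len(vocab) + 1
    numerator = counts[tag][word] + 1
    denominator = sum(counts[tag].values()) + vocab_size
    return math.log(numerator / denominator)
```

Since the numerator is never zero, log P(w|t) is always finite, so a
-Infinity path can then only come from the transition probabilities.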

Brent Payne wrote:
> We smoothed ours so that every possibility had some probability.  We still encounter -Infinity paths when decoding, so we might not have smoothed enough.
>
> ----- Original Message -----
> From: "Andrea Biaggi" <abiaggi at cs.ucsd.edu>
> To: ligncse256 at ling.ucsd.edu
> Sent: Wednesday, February 18, 2009 12:16:11 AM GMT -08:00 US/Canada Pacific
> Subject: [Ligncse256] POSTagger Trigrams
>
> I have a question about how to handle trigrams for P(t_i|t_{i-1}t_{i-2}).
> Is it always the case that this probability is > 0 for every combination 
> of tags in the validation/test set when using the trigram counts from 
> the training set?
> Or do we have to use a backoff/interpolation scheme to handle all the 
> possible cases? Or use some other technique? Or just say that it is 
> = 0, and therefore not a possible combination?
>
>   
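
For the interpolation option mentioned in the question above, a minimal
sketch could look like the following. Everything here is an assumption for
illustration: the names train_tag_ngrams and interp_prob, the "<s>" and
"</s>" padding symbols, and especially the lambda weights, which would
normally be tuned on held-out data. Unseen contexts fall back to the
lower-order estimate so the result still sums to 1 over the tag set.

```python
from collections import Counter


def train_tag_ngrams(tag_sequences):
    """Collect unigram, bigram, and trigram tag counts plus context counts."""
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2 = Counter(), Counter()  # counts of 1-tag and 2-tag histories
    for tags in tag_sequences:
        padded = ["<s>", "<s>"] + list(tags) + ["</s>"]
        for i in range(2, len(padded)):
            t2, t1, t = padded[i - 2], padded[i - 1], padded[i]
            uni[t] += 1
            bi[(t1, t)] += 1
            tri[(t2, t1, t)] += 1
            ctx1[t1] += 1
            ctx2[(t2, t1)] += 1
    return uni, bi, tri, ctx1, ctx2


def interp_prob(model, t2, t1, t, lambdas=(0.6, 0.3, 0.1)):
    """Interpolated P(t | t2, t1), where t2 is two back and t1 is one back.

    lambdas = (trigram, bigram, unigram) weights; hypothetical values that
    must sum to 1.
    """
    uni, bi, tri, ctx1, ctx2 = model
    l3, l2, l1 = lambdas
    # Add-one smoothed unigram: nonzero for every tag seen in training.
    p1 = (uni[t] + 1) / (sum(uni.values()) + len(uni))
    # Fall back to the lower-order estimate when a history was never seen.
    p2 = bi[(t1, t)] / ctx1[t1] if ctx1[t1] else p1
    p3 = tri[(t2, t1, t)] / ctx2[(t2, t1)] if ctx2[(t2, t1)] else p2
    return l3 * p3 + l2 * p2 + l1 * p1
```

With this, a trigram that never occurred in training still gets a small
positive probability from the unigram term instead of exactly 0.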


-- 
Andrea Biaggi abiaggi at cs.ucsd.edu


