[Ligncse256] POSTagger Trigrams
Andrea Biaggi
abiaggi at cs.ucsd.edu
Wed Feb 18 09:34:14 PST 2009
I have also thought about applying add-one smoothing to P(w|t): for every
word, just add one count for every tag. I don't know whether this is useful
or not, but maybe it could "fix" your problem of -Infinity paths.
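
To make the add-one idea concrete, here is a minimal sketch in Python. It is only an illustration, not anyone's actual assignment code: the function names, the toy data, and the extra vocabulary slot reserved for unknown words are all assumptions of mine.

```python
from collections import Counter
import math

def train_emissions(tagged_words):
    """Collect counts from an iterable of (word, tag) pairs."""
    pair_counts = Counter()   # c(tag, word)
    tag_counts = Counter()    # c(tag)
    vocab = set()
    for word, tag in tagged_words:
        pair_counts[(tag, word)] += 1
        tag_counts[tag] += 1
        vocab.add(word)
    return pair_counts, tag_counts, vocab

def log_emission(word, tag, pair_counts, tag_counts, vocab_size):
    """Add-one smoothed log P(word | tag).

    Every (word, tag) pair gets one extra count, so the result is
    finite for any word and tag as long as vocab_size > 0.
    """
    numerator = pair_counts[(tag, word)] + 1
    denominator = tag_counts[tag] + vocab_size
    return math.log(numerator / denominator)

# Toy training data (hypothetical).
data = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
        ("the", "DT"), ("cat", "NN")]
pair_counts, tag_counts, vocab = train_emissions(data)

# Assumption: one extra vocabulary slot stands in for all unseen words.
V = len(vocab) + 1

seen = log_emission("dog", "NN", pair_counts, tag_counts, V)
unseen = log_emission("zebra", "NN", pair_counts, tag_counts, V)
```

Both `seen` and `unseen` are finite here, so the emission term alone can no longer push a path to -Infinity. It does nothing for the tag transition probabilities, which may still be zero.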
Brent Payne wrote:
> We smoothed ours so that every possibility had some probability. We still encounter -Infinity paths when decoding, so we might not have smoothed enough.
>
> ----- Original Message -----
> From: "Andrea Biaggi" <abiaggi at cs.ucsd.edu>
> To: ligncse256 at ling.ucsd.edu
> Sent: Wednesday, February 18, 2009 12:16:11 AM GMT -08:00 US/Canada Pacific
> Subject: [Ligncse256] POSTagger Trigrams
>
> I have a question about how to handle trigrams for P(t_i|t_{i-1}t_{i-2}).
> Is it always the case that this probability is > 0 for every combination
> of tags in the validation/test set when using trigram counts from the
> training set?
> Or do we have to use a backoff/interpolation scheme to handle all the
> possible cases? Or use some other technique? Or just say that it is
> 0, and therefore not a possible combination?
>
>
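
For reference, one way the interpolation option from the question above could look is sketched below in Python. This is a rough illustration under my own assumptions: the helper names, the `<s>`/`</s>` padding symbols, the toy tag sequences, and the lambda weights are all made up here, and in practice the weights would be tuned on held-out data.

```python
from collections import Counter
import math

START, STOP = "<s>", "</s>"  # hypothetical padding symbols

def count_tag_ngrams(tag_sequences):
    """Count tag unigrams, bigrams, trigrams and their histories."""
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2 = Counter(), Counter()  # counts of 1-tag and 2-tag histories
    for tags in tag_sequences:
        padded = [START, START] + list(tags) + [STOP]
        for i in range(2, len(padded)):
            t2, t1, t = padded[i - 2], padded[i - 1], padded[i]
            uni[t] += 1
            bi[(t1, t)] += 1
            tri[(t2, t1, t)] += 1
            ctx1[t1] += 1
            ctx2[(t2, t1)] += 1
    return uni, bi, tri, ctx1, ctx2

def interpolated_log_prob(t2, t1, t, counts, lambdas=(0.6, 0.3, 0.1)):
    """log of l3*P(t|t2,t1) + l2*P(t|t1) + l1*P(t).

    The lambdas are placeholder weights (an assumption). A history that
    was never seen contributes 0 for its term instead of dividing by zero.
    """
    uni, bi, tri, ctx1, ctx2 = counts
    l3, l2, l1 = lambdas
    total = sum(uni.values())
    p1 = uni[t] / total if total else 0.0
    p2 = bi[(t1, t)] / ctx1[t1] if ctx1[t1] else 0.0
    p3 = tri[(t2, t1, t)] / ctx2[(t2, t1)] if ctx2[(t2, t1)] else 0.0
    p = l3 * p3 + l2 * p2 + l1 * p1
    return math.log(p) if p > 0 else float("-inf")

# Toy training tag sequences (hypothetical).
counts = count_tag_ngrams([["DT", "NN", "VBZ"], ["DT", "NN"]])

seen_lp = interpolated_log_prob("DT", "NN", "VBZ", counts)
unseen_lp = interpolated_log_prob("NN", "DT", "VBZ", counts)
```

The trigram (NN, DT, VBZ) never occurs in the toy data, yet `unseen_lp` is finite because the unigram term is nonzero for any tag seen in training. A tag that never appears in training would still get -Infinity, and when a history is unseen the terms no longer sum to a proper distribution, so this is a starting point rather than a complete scheme.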
--
Andrea Biaggi abiaggi at cs.ucsd.edu