[Ligncse256] HW2: buildTrellis method
Ben Cipollini
bcipolli at cogsci.ucsd.edu
Fri Feb 22 02:33:09 PST 2008
OK, I finally see a bit better what's going on here. I can see why there's 48*48 source states to consider at each point.
I can see that the "position" property of the State simply encodes where we are in the trellis. This is necessary because the transitions between states depends on the current word, and that is basically kept track of in the State by the position. If you remove the position, then States with the same two previous tags (but different positions) get mixed. In a bad case, this can lead to a trellis with a loop, and that winds up putting GreedyDecoder into an infinite loop.
In my local copy, I have renamed "State" to "TrellisState", as a State is completely tied to building and decoding the trellis. I didn't really understand that; I thought State was a more general thing. I'm also confused about the Interner, as I still believe it is increasing memory about 25x normal and I can't imagine it's saving much time at all. I believe it's a hash table of states that gets way more new entries than it ever gets hits, due to the fact that States are unique on prevprevtag, prevtag, and position combinations. On the other hand, although I don't see why at all, the "canonicalization" it does looks to be necessary. Why? I don't know.
In fact, after reverting all the Interner code I changed, I was able to get the original code to work again :) Now back to the HMM... I swear it should work! :P
Ben
----- Original Message -----
From: Ben Cipollini
To: ligncse256 at ling.ucsd.edu
Sent: Thursday, February 21, 2008 8:38 PM
Subject: [Ligncse256] HW2: buildTrellis method
Hey all,
I'm trying to debug some issues with my HMM scorer. In doing so, the buildTrellis method of the POSTagger doesn't make sense to me. I was wondering if anybody's taken a look.
Here's some basics:
* I'm using a subset of the training data for debugging purposes.
* There are 48 unique tags in the training dataset that I'm using
* There are 1201 unique combinations of tags
What I expect to do in building a trellis is:
* T0: start at the "START" state, and calculate the probability of transitioning to each of the 48 unique tags
* T1: for each of the 48 tags, determine the probability of transitioning to each of the 48 unique tags.
* T2: ...
*T{end}: for each of the 48 tags, determine the probability of transitioning to the end state.
However, this is what I see buildTrellis doing:
* T0: same
* T1: same
* T2: for each of the 48*48 possibilities of previous transitions COMBINATIONS, calculate the transition to each of the 48 states.
I think this is happening because states are unique based not on the previous state, but on the previous state and previous-previous state.
I was expecting, however, that this shouldn't be--the probability of transition depends on the previous-previous and previous tag, but does the trellis need an extra edges or something? I figured the trellis would have the same exact architecture as the HMM trellis we discussed in class, and thus always compute transitions from 48 states to 48 other states (48^2 such computations).
In addition, the State class encodes sentence position. For the HMM model, this is never used. This winds up completely exploding caching of states to try and save memory, leading to maybe 20x extra memory used on State objects that is completely unnecessary in the HMM case, if I understand properly.
Has anybody sifted through this code or can provide feedback about my expectations? I'm not sure if I'm conceptually missing something. This whole State-Trellis object system seems unnecessarily complex and weighty for what I'm trying to do, and it's really making debugging a big chore for me.
Lastly, is there any principled reason that the State object uses names previousTag and previousPreviousTag instead of curTag and previousTag? Even the startState has prevprev as START and prev as START (no current). The semantics seem strange, and a function getNextState combines a passed-in curTag with previousTag instead of combining a nextTag with a curTag. I can't help but feeling I'm missing some semantics in what's intended.
Please anybody respond to any part of this mail. Just throwing some thoughts out there. And my apologies if I'm so far off that I'm not even making sense.
Thanks in advance,
Ben
------------------------------------------------------------------------------
_______________________________________________
Ligncse256 mailing list
Ligncse256 at ling.ucsd.edu
http://pidgin.ucsd.edu/mailman/listinfo/ligncse256
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://pidgin.ucsd.edu/pipermail/ligncse256/attachments/20080222/d08de54c/attachment-0001.htm
More information about the Ligncse256
mailing list