<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2900.3268" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT size=2>Hey all,</FONT></DIV>
<DIV> </DIV>
<DIV><FONT size=2>While trying to finish my writeup for HW2 this morning, I
decided to write some code to output a confusion matrix. Thought this
might be useful to anybody who's still playing in this framework...</FONT></DIV>
<DIV> </DIV>
<DIV><FONT size=2>You can see a demo at <A
href="http://www.cogsci.ucsd.edu/~bcipolli/WI08/LIGN%20256/hw2/conf_matrix.html">http://www.cogsci.ucsd.edu/~bcipolli/WI08/LIGN%20256/hw2/conf_matrix.html</A></FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>Roger, I'll be sending you this as part of the overall source
changes I made (with a bit of documentation) at the end of the quarter
sometime.</FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>=====================</FONT></DIV>
<DIV> </DIV>
<DIV><FONT size=2>I've attached the java file; drop it into
src/edu/berkeley/nlp/util. </FONT><FONT size=2>Here's a code snippet of
how I used it, in POSTaggerTester:</FONT></DIV>
<DIV><FONT size=2></FONT> </DIV><FONT size=2>
<DIV>In POSTaggerTester::evaluateTagger,</DIV>
<DIV><BR>>> ConfusionMatrix&lt;String, String&gt;
confusionMatrix = new ConfusionMatrix&lt;String, String&gt;();</DIV>
<DIV> </DIV>
<DIV> Then later, at line 563 (or anywhere inside the loop over word
positions in the sentence):</DIV>
<DIV> <BR> if
(guessedTag.equals(goldTag))<BR> numTagsCorrect +=
1.0;<BR>>> confusionMatrix.incrementCount(guessedTag,
goldTag);<BR> <BR> Note that the inserted statement doesn't go into
the "if" statement.<BR> <BR> Then, at the bottom of the
function:<BR>>> if (verbose)<BR>>>
System.out.println("Confusion Matrix: \n" +
confusionMatrix.toString(toChange));<BR> <BR> <BR>In order to
"group" tags according to "category" (which I defined), before my println
statement, I added:</DIV>
<DIV> </DIV>
<DIV> HashMap&lt;String, List&lt;String&gt;&gt;
toChange = new HashMap&lt;String,
List&lt;String&gt;&gt;();<BR> List&lt;String&gt;
toOther = Arrays.asList("-LRB-", "-RRB-", "EX", "FW", "LS", "MD", "PDT",
"WP", "WP$", "UH", "WRB");</DIV>
<DIV> </DIV>
<DIV> toChange.put("[ADJ]",
PennTreebankReader.adjTags);<BR>
toChange.put("[ADV]",
PennTreebankReader.advTags);<BR>
toChange.put("[NOUN]",
PennTreebankReader.nounTags);<BR>
toChange.put("[PUNCT]",
PennTreebankReader.punctTags);<BR>
toChange.put("[VERB]",
PennTreebankReader.verbTags);<BR>
toChange.put("[OTHER]", toOther);<BR> <BR>
This relies on static members that I added to
edu.berkeley.nlp.io.PennTreebankReader:<BR> public static List&lt;String&gt; adjTags =
Arrays.asList("JJ", "JJR", "JJS");<BR> public static
List&lt;String&gt; advTags = Arrays.asList("RB", "RBR",
"RBS");<BR> public static List&lt;String&gt; nounTags =
Arrays.asList("NN", "NNP", "NNPS", "NNS");<BR> public static
List&lt;String&gt; verbTags = Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP",
"VBZ");<BR> public static List&lt;String&gt; punctTags =
Arrays.asList("#", "$", "\"", ",", ".", ":", "''", "``");</DIV>
<DIV> </DIV>
<DIV> <BR>This simply defines grouping / rewrite
rules within the confusion matrix. There's also a way to simply suppress
printing of rows/columns in the confusion matrix.</DIV>
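<DIV> </DIV>
<DIV>For anyone reading without the attachment handy, here is a minimal sketch of how that kind of grouping could work. It is not the attached class: the name ConfusionMatrixSketch and the methods getCount and grouped are hypothetical, and the (guessed, gold) argument order is just an assumption carried over from the call above. The idea is to invert the label-to-tags map into a tag-to-label map, then re-add every cell under its rewritten row and column.</DIV>

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for illustration only; not the attached ConfusionMatrix. */
class ConfusionMatrixSketch {
    // counts.get(guessed).get(gold) = number of times that (guessed, gold) pair was seen
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    /** Adds one observation of the pair (guessed, gold). */
    public void incrementCount(String guessed, String gold) {
        counts.computeIfAbsent(guessed, k -> new TreeMap<>()).merge(gold, 1, Integer::sum);
    }

    /** Returns the count for (guessed, gold), or 0 if the pair was never seen. */
    public int getCount(String guessed, String gold) {
        return counts
            .getOrDefault(guessed, Collections.<String, Integer>emptyMap())
            .getOrDefault(gold, 0);
    }

    /**
     * Returns a new matrix in which every tag listed under a group label is
     * rewritten to that label. Tags that appear in no group keep their own name.
     */
    public ConfusionMatrixSketch grouped(Map<String, List<String>> groups) {
        // Invert label -> tags into tag -> label for quick lookup.
        Map<String, String> rewrite = new HashMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            for (String tag : e.getValue()) {
                rewrite.put(tag, e.getKey());
            }
        }
        // Re-add every cell under its rewritten row and column, summing collisions.
        ConfusionMatrixSketch out = new ConfusionMatrixSketch();
        for (Map.Entry<String, Map<String, Integer>> row : counts.entrySet()) {
            String guessed = rewrite.getOrDefault(row.getKey(), row.getKey());
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                String gold = rewrite.getOrDefault(cell.getKey(), cell.getKey());
                out.counts.computeIfAbsent(guessed, k -> new TreeMap<>())
                          .merge(gold, cell.getValue(), Integer::sum);
            }
        }
        return out;
    }
}
```

<DIV>With a map like toChange above, calling grouped(toChange) on this sketch would fold the NN/NNS confusions into a single [NOUN]/[NOUN] cell, while a tag such as DT, which is in no group, keeps its own row and column.</DIV>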
<DIV> </DIV>
<DIV>OK, not sure if that's useful, easy, hard, etc. Please feel free to
email me with any questions.</DIV>
<DIV> </DIV>
<DIV>Ben</FONT></DIV></BODY></HTML>