[Ligncse256] Confusion matrix creation / printout for HW2
Ben Cipollini
bcipolli at cogsci.ucsd.edu
Mon Mar 10 16:55:48 PDT 2008
Hey all,
While trying to finish my write-up for HW2 this morning, I decided to write some code to output a confusion matrix. I thought it might be useful to anybody who's still working in this framework.
You can see a demo at http://www.cogsci.ucsd.edu/~bcipolli/WI08/LIGN%20256/hw2/conf_matrix.html
Roger, I'll send you this sometime at the end of the quarter, along with the other source changes I made (and a bit of documentation).
=====================
I've attached the Java file; drop it into src/edu/berkeley/nlp/util. Here's a code snippet showing how I used it in POSTaggerTester:
In POSTaggerTester.evaluateTagger, add the following (lines marked ">>" are the ones I inserted):
>> ConfusionMatrix<String, String> confusionMatrix = new ConfusionMatrix<String,String>();
Then later, at line 563 (or anywhere inside the loop over positions in the sentence):
if (guessedTag.equals(goldTag))
    numTagsCorrect += 1.0;
>> confusionMatrix.incrementCount(guessedTag, goldTag);
Note that the inserted statement doesn't go into the "if" statement.
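The call sits outside the "if" because every token has to be counted, correct or not: correct guesses land on the diagonal and errors land off it. Below is a minimal sketch of that idea, assuming the matrix is stored as nested maps (guessed tag -> gold tag -> count). SimpleConfusionMatrix, getCount, totalCount and diagonalCount are hypothetical names for illustration; this is not the attached ConfusionMatrix.java.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal confusion matrix (NOT the attached ConfusionMatrix.java).
// Rows are guessed labels, columns are gold labels.
class SimpleConfusionMatrix<G, T> {
    // counts.get(guess).get(gold) = number of tokens tagged `guess` whose gold tag is `gold`
    private final Map<G, Map<T, Double>> counts = new HashMap<G, Map<T, Double>>();

    // Record one token; meant to be called for every token, correct or not.
    public void incrementCount(G guess, T gold) {
        Map<T, Double> row = counts.get(guess);
        if (row == null) {
            row = new HashMap<T, Double>();
            counts.put(guess, row);
        }
        Double old = row.get(gold);
        row.put(gold, (old == null ? 0.0 : old) + 1.0);
    }

    // Count for one (guess, gold) cell; 0 if never seen.
    public double getCount(G guess, T gold) {
        Map<T, Double> row = counts.get(guess);
        if (row == null) return 0.0;
        Double c = row.get(gold);
        return c == null ? 0.0 : c;
    }

    // Sum over all cells = total number of tokens counted.
    public double totalCount() {
        double total = 0.0;
        for (Map<T, Double> row : counts.values())
            for (double c : row.values()) total += c;
        return total;
    }

    // Sum of cells where guess equals gold (the diagonal), i.e. the correctly tagged tokens.
    public double diagonalCount() {
        double correct = 0.0;
        for (Map.Entry<G, Map<T, Double>> e : counts.entrySet()) {
            Double c = e.getValue().get(e.getKey()); // Map.get takes Object, so this compiles
            if (c != null) correct += c;
        }
        return correct;
    }
}
```

If the increment were placed inside the "if", only diagonal cells would ever be filled, diagonalCount() would always equal totalCount(), and the matrix would show no confusions at all.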
Then, at the bottom of the function:
>> if (verbose)
>>     System.out.println("Confusion Matrix: \n" + confusionMatrix.toString(toChange));
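For the printout itself, one simple layout is a square table with guessed tags as rows and gold tags as columns. The sketch below assumes the counts are already available as nested maps (guessed tag -> gold tag -> count); ConfusionTableFormatter and format are hypothetical names, and the attached class's toString output may well look different.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of printing a confusion matrix as a fixed-width text table.
// Rows are guessed tags, columns are gold tags; labels are sorted alphabetically.
class ConfusionTableFormatter {
    public static String format(Map<String, Map<String, Integer>> counts) {
        // Collect every label that appears either as a guess or as a gold tag,
        // so the table is square.
        Set<String> labels = new TreeSet<String>(counts.keySet());
        for (Map<String, Integer> row : counts.values()) labels.addAll(row.keySet());

        StringBuilder sb = new StringBuilder();
        // Header row: blank corner cell, then one column per gold tag.
        sb.append(String.format("%8s", ""));
        for (String gold : labels) sb.append(String.format("%8s", gold));
        sb.append('\n');

        // One row per guessed tag; cells never incremented print as 0.
        for (String guess : labels) {
            sb.append(String.format("%8s", guess));
            Map<String, Integer> row = counts.get(guess);
            for (String gold : labels) {
                Integer c = (row == null) ? null : row.get(gold);
                sb.append(String.format("%8d", c == null ? 0 : c));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Sorting the labels keeps the row and column order identical, so the diagonal (correct tags) runs top-left to bottom-right.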
To group tags into categories (which I defined myself), I added the following before my println statement:
HashMap<String,List<String>> toChange = new HashMap<String, List<String>>();
List<String> toOther = Arrays.asList("-LRB-", "-RRB-", "EX", "FW", "LS", "MD", "PDT", "WP", "WP$", "UH", "WRB");
toChange.put("[ADJ]", PennTreebankReader.adjTags);
toChange.put("[ADV]", PennTreebankReader.advTags);
toChange.put("[NOUN]", PennTreebankReader.nounTags);
toChange.put("[PUNCT]", PennTreebankReader.punctTags);
toChange.put("[VERB]", PennTreebankReader.verbTags);
toChange.put("[OTHER]", toOther);
Here, I modified edu.berkeley.nlp.io.PennTreebankReader to have these static members:
public static List<String> adjTags = Arrays.asList("JJ", "JJR", "JJS");
public static List<String> advTags = Arrays.asList("RB", "RBR", "RBS");
public static List<String> nounTags = Arrays.asList("NN", "NNP", "NNPS", "NNS");
public static List<String> verbTags = Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP", "VBZ");
public static List<String> punctTags = Arrays.asList("#", "$", "\"", ",", ".", ":", "''", "``");
This just defines grouping (rewrite) rules for the confusion matrix. The class also provides a way to suppress printing of specific rows/columns.
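To make the grouping concrete: one way to apply a category map like toChange is to invert it into a tag -> category lookup, then merge the counts of every cell whose tags fall into the same (guessed category, gold category) pair. This is a sketch assuming counts are stored as nested maps (guessed tag -> gold tag -> count); TagGrouper, rewrite and collapse are hypothetical names, not the attached class's API. Tags that appear in no group are left unchanged.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse fine-grained tags into categories.
// `groups` maps a category name (e.g. "[NOUN]") to the tags it absorbs.
class TagGrouper {
    private final Map<String, String> tagToCategory = new HashMap<String, String>();

    public TagGrouper(Map<String, List<String>> groups) {
        // Invert category -> tags into tag -> category for direct lookup.
        for (Map.Entry<String, List<String>> e : groups.entrySet())
            for (String tag : e.getValue())
                tagToCategory.put(tag, e.getKey());
    }

    // Category for a tag, or the tag itself if it belongs to no group.
    public String rewrite(String tag) {
        String category = tagToCategory.get(tag);
        return category == null ? tag : category;
    }

    // Merge a guessed -> gold -> count table into category -> category counts.
    public Map<String, Map<String, Integer>> collapse(Map<String, Map<String, Integer>> counts) {
        Map<String, Map<String, Integer>> result = new HashMap<String, Map<String, Integer>>();
        for (Map.Entry<String, Map<String, Integer>> row : counts.entrySet()) {
            String guessCat = rewrite(row.getKey());
            Map<String, Integer> outRow = result.get(guessCat);
            if (outRow == null) {
                outRow = new HashMap<String, Integer>();
                result.put(guessCat, outRow);
            }
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                String goldCat = rewrite(cell.getKey());
                Integer old = outRow.get(goldCat);
                outRow.put(goldCat, (old == null ? 0 : old) + cell.getValue());
            }
        }
        return result;
    }
}
```

With the toChange map built above, new TagGrouper(toChange).collapse(counts) would fold NN, NNP, NNPS and NNS into a single [NOUN] row and column, so confusions between, say, NN and NNS count as correct at the category level.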
OK, I'm not sure whether that's useful, easy, hard, etc. Please feel free to email me with any questions.
Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ConfusionMatrix.java
Type: application/octet-stream
Size: 4305 bytes
Desc: not available
Url : http://pidgin.ucsd.edu/pipermail/ligncse256/attachments/20080310/d8143fce/attachment.obj