[Ligncse256] Confusion matrix creation / printout for HW2

Mon Mar 10 16:55:48 PDT 2008

Hey all,

While trying to finish my writeup for HW2 this morning, I decided to write some code to output a confusion matrix.  Thought this might be useful to anybody who's still playing in this framework...

You can see a demo at http://www.cogsci.ucsd.edu/~bcipolli/WI08/LIGN%20256/hw2/conf_matrix.html

Roger, I'll be sending you this (with a bit of documentation) as part of my overall source changes sometime at the end of the quarter.

=====================

I've attached the Java file; drop it into src/edu/berkeley/nlp/util.  Here's a snippet showing how I used it in POSTaggerTester:

In POSTaggerTester::evaluateTagger,

>>    ConfusionMatrix<String, String> confusionMatrix = new ConfusionMatrix<String,String>();

  Then later, at line 563 (or anywhere inside the loop over token positions in the sentence):

    if (guessedTag.equals(goldTag))
      numTagsCorrect += 1.0;
>>  confusionMatrix.incrementCount(guessedTag, goldTag);

  Note that the inserted statement goes after the "if" statement, not inside it, so every token is counted whether or not the guess was correct.
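
  To make that placement concrete, here's a rough standalone sketch of the same loop. It does not use the attached class or the course framework: TallySketch, its tally method, and the "guessed|gold" key format are all made up for illustration, and a plain map stands in for the confusion matrix.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the loop in evaluateTagger; not the framework's code.
class TallySketch {
    // Maps "guessed|gold" to the number of tokens tagged that way.
    final Map<String, Integer> pairCounts = new TreeMap<>();
    double numTagsCorrect = 0.0;

    void tally(String[] guessedTags, String[] goldTags) {
        for (int position = 0; position < goldTags.length; position++) {
            String guessedTag = guessedTags[position];
            String goldTag = goldTags[position];
            if (guessedTag.equals(goldTag)) {
                numTagsCorrect += 1.0;
            }
            // Outside the "if": every token is counted, correct or not.
            pairCounts.merge(guessedTag + "|" + goldTag, 1, Integer::sum);
        }
    }
}
```

  With braces around the "if" body, it's harder to put the counting line in the wrong place by accident.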

  Then, at the bottom of the function:
>>  if (verbose)
>>    System.out.println("Confusion Matrix: \n" + confusionMatrix.toString(toChange));

To group tags into categories (which I defined myself), I added the following before the println statement; it builds the toChange map passed to toString above:

      HashMap<String,List<String>> toChange = new HashMap<String, List<String>>();
      List<String> toOther  = Arrays.asList("-LRB-", "-RRB-", "EX", "FW", "LS", "MD", "PDT", "WP", "WP$", "UH", "WRB");

      toChange.put("[ADJ]", PennTreebankReader.adjTags);
      toChange.put("[ADV]", PennTreebankReader.advTags);
      toChange.put("[NOUN]", PennTreebankReader.nounTags);
      toChange.put("[PUNCT]", PennTreebankReader.punctTags);
      toChange.put("[VERB]", PennTreebankReader.verbTags);
      toChange.put("[OTHER]", toOther);

  This relies on static members that I added to edu.berkeley.nlp.io.PennTreebankReader:
    public static List<String> adjTags = Arrays.asList("JJ", "JJR", "JJS");
    public static List<String> advTags = Arrays.asList("RB", "RBR", "RBS");
    public static List<String> nounTags = Arrays.asList("NN", "NNP", "NNPS", "NNS");
    public static List<String> verbTags = Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP", "VBZ");
    public static List<String> punctTags = Arrays.asList("#", "$", "\"", ",", ".", ":", "''", "``");

These lists simply define grouping (rewrite) rules for the confusion matrix: each listed tag is reported under its category label.  There's also a way to suppress printing of individual rows/columns in the confusion matrix.
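
For anyone who can't get at the attachment, here's a minimal sketch of how grouping like this could work. It is not the attached ConfusionMatrix.java: the class name GroupedConfusionSketch and the regroup and getCount methods are assumptions made for illustration, and only incrementCount(guessed, gold) mirrors the call shown above. Tags not listed in any group are left under their own name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; the attached class may work quite differently.
class GroupedConfusionSketch {
    // counts.get(guessed).get(gold) = number of tokens with that (guessed, gold) pair
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    void incrementCount(String guessed, String gold) {
        counts.computeIfAbsent(guessed, k -> new TreeMap<>()).merge(gold, 1, Integer::sum);
    }

    int getCount(String guessed, String gold) {
        Map<String, Integer> row = counts.get(guessed);
        return row == null ? 0 : row.getOrDefault(gold, 0);
    }

    // Returns a new matrix in which each tag listed in `groups` is replaced
    // by its group label; unlisted tags keep their own name.
    GroupedConfusionSketch regroup(Map<String, List<String>> groups) {
        Map<String, String> tagToGroup = new HashMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            for (String tag : group.getValue()) {
                tagToGroup.put(tag, group.getKey());
            }
        }
        GroupedConfusionSketch out = new GroupedConfusionSketch();
        for (Map.Entry<String, Map<String, Integer>> row : counts.entrySet()) {
            String guessed = tagToGroup.getOrDefault(row.getKey(), row.getKey());
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                String gold = tagToGroup.getOrDefault(cell.getKey(), cell.getKey());
                out.counts.computeIfAbsent(guessed, k -> new TreeMap<>())
                          .merge(gold, cell.getValue(), Integer::sum);
            }
        }
        return out;
    }

    // Plain-text table: one row per guessed label, one column per gold label.
    @Override
    public String toString() {
        TreeSet<String> labels = new TreeSet<>(counts.keySet());
        for (Map<String, Integer> row : counts.values()) {
            labels.addAll(row.keySet());
        }
        StringBuilder sb = new StringBuilder(String.format("%-10s", "guess\\gold"));
        for (String gold : labels) {
            sb.append(String.format("%10s", gold));
        }
        sb.append('\n');
        for (String guessed : labels) {
            sb.append(String.format("%-10s", guessed));
            for (String gold : labels) {
                sb.append(String.format("%10d", getCount(guessed, gold)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

In this sketch you would print with something like System.out.println(matrix.regroup(toChange)), where toChange is a map like the one built above. Regrouping into a fresh matrix (rather than rewriting tags as they're counted) keeps the raw per-tag counts around in case you want both views.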

OK, not sure if that's useful, easy, hard, etc.  Please feel free to email with any questions.

Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ConfusionMatrix.java
Type: application/octet-stream
Size: 4305 bytes
Desc: not available
Url : http://pidgin.ucsd.edu/pipermail/ligncse256/attachments/20080310/d8143fce/attachment.obj