[R-lang] Re: analysis of acceptability judgements

Fri Oct 15 19:17:21 PDT 2010

Having written on this topic myself, I guess I'm obliged to add something
here.

First, about scales, I would agree that Likert scales are neither useful
nor particularly appropriate for linguistic acceptability. Besides the
references already given (in particular Jon Sprouse's papers suggesting
that acceptability tends to be binary regardless of the scale, which
matches my own experience, and the papers cited by Antti Arppe showing the
statistical power of binary judgment scales for acceptability), there is
also this recent paper making a similar point:

Bader, M., & Häussler, J. (2010). Toward a model of grammaticality
judgments. Journal of Linguistics, 46, 273-330.

Second, I have a paper giving a tutorial on designing and analyzing
binary-scale syntactic judgment experiments (aimed at total novices):

Myers, J. (2009). The design and analysis of small-scale syntactic
judgment experiments. Lingua, 119, 425-444.
doi:10.1016/j.lingua.2008.09.003

You can see other papers I've written on linguistic methodology by going
to my homepage, clicking "Research", then selecting
"Approach:Methodology":

http://www.ccunix.ccu.edu.tw/~lngmyers/

Third, I have a free, open-source program, MiniJudge, for designing,
running, and analyzing binary judgment experiments. It is currently being
updated (finally, by a professional programmer) with a snazzy Java-based
GUI. It does the statistics using R: the older versions printed the R code
to a window, and the user then had to load it into R manually, but the
latest version does all of this behind the scenes. I'll let you know when
the snazzy version is ready (hopefully very soon); meanwhile, the older
JavaScript and Java versions are available here:

http://www.ccunix.ccu.edu.tw/~lngproc/MiniJudge.htm

Finally, in the above two places (Lingua paper and MiniJudge) I make an
as-yet totally ignored statistical proposal about how to deal with a
notorious bias in acceptability judgments: the reduction in sensitivity
over the course of making many similar judgments. Sprouse claims this is
entirely an artifact of using a binary scale, but data from our lab
suggest that it can happen with any scale. My statistical solution is to
include item order as a factor in the (mixed-effects) regression analysis,
and then to factor out both the order effect and its interaction with the
syntactic contrast of interest. For example, suppose you hypothesize that
factor X affects judgments, with -X sentences judged better than +X
sentences. If the X effect weakens over the course of the experiment, you
should see a significant interaction between X and item order. By
explicitly including this bias in the analysis, the true effect of X may
stand out more strongly (and whether or not X weakens at all may also
tell you something about its nature).
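To make the order-interaction idea concrete, here is a minimal Python sketch under stated assumptions. It simulates binary judgments (1 = acceptable) in which -X sentences are better than +X sentences but the X effect shrinks with item order, then fits a logistic regression with X, order, and the X-by-order interaction. Everything in it is hypothetical and illustrative: the coding scheme, the simulated effect sizes, and the helper names `simulate`, `solve`, and `fit_logistic` are my own, not MiniJudge's. It is also a simplification: it fits fixed effects only, whereas the proposal calls for a mixed-effects model with random effects for subjects and items (in R, typically a binomial model fitted with lme4's `glmer`).

```python
import math
import random

# Hypothetical coding (an illustrative assumption, not MiniJudge's scheme):
#   x     = +1 for a +X sentence, -1 for a -X sentence
#   order = trial position rescaled from -0.5 (first) to +0.5 (last)
#   y     = 1 if judged acceptable, 0 otherwise


def simulate(n_subjects=40, n_trials=40, b=(0.5, -1.5, 0.0, 2.0), seed=1):
    """Simulate binary judgments whose X effect weakens across trials.

    b = (intercept, X, order, X:order) on the logit scale; with these
    made-up values the X effect is -2.5 at the start and -0.5 at the end.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subjects):
        for t in range(n_trials):
            x = rng.choice((-1.0, 1.0))
            order = t / (n_trials - 1) - 0.5
            eta = b[0] + b[1] * x + b[2] * order + b[3] * x * order
            p = 1.0 / (1.0 + math.exp(-eta))
            rows.append((x, order, 1 if rng.random() < p else 0))
    return rows


def solve(a, v):
    """Solve the linear system a * z = v by Gaussian elimination."""
    n = len(v)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / m[r][r]
    return z


def fit_logistic(rows, iters=25):
    """Fixed-effects logistic regression y ~ X * order by Newton-Raphson.

    Returns [intercept, X, order, X:order]. No random effects are fitted.
    """
    beta = [0.0] * 4
    for _ in range(iters):
        grad = [0.0] * 4
        info = [[0.0] * 4 for _ in range(4)]  # X'WX (information matrix)
        for x, order, y in rows:
            f = (1.0, x, order, x * order)
            eta = sum(bj * fj for bj, fj in zip(beta, f))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for j in range(4):
                grad[j] += (y - p) * f[j]
                for k in range(4):
                    info[j][k] += w * f[j] * f[k]
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta


if __name__ == "__main__":
    names = ("intercept", "X", "order", "X:order")
    for name, est in zip(names, fit_logistic(simulate())):
        print(f"{name:>9}: {est:+.2f}")
```

A negative X coefficient together with a positive X:order coefficient is the pattern described above: the X effect is present but fades across the session. Because order is centred, the X coefficient estimates the X effect at the midpoint of the session, while the interaction term captures how much that effect changes from start to end.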

James Myers
National Chung Cheng University
http://www.ccunix.ccu.edu.tw/~lngmyers/
Lngmyers@ccu.edu.tw