[R-lang] Re: Self Paced Reading experiment using residuals with lmer

Levy, Roger rlevy@ucsd.edu
Fri Oct 15 17:45:25 PDT 2010

Hi Bruno,

The reason was that when you originally sent the message, you weren't a list member, so it got put in a holding bin which I'd been neglecting to look at.  I finally did that, and cleared your message to be posted.  (Most of the holding bin is spam.)  If someone's message doesn't go through, the best course of action is generally for that person to ensure that they're actually a list member, and if not, join, and repost.


On Oct 15, 2010, at 2:47 PM, Bruno Nicenboim wrote:

> Thanks for the detailed answer. 
> It's a pity that, for some reason, my question appeared a month and a half after I posted it!
> I'll take a look at the links you sent me; I'm still interested in understanding the effect of the transformations.
> Bruno
> On Fri, Oct 15, 2010 at 7:44 PM, T. Florian Jaeger <tiflo@csli.stanford.edu> wrote:
> Hi Bruno,
> good question. Here's what I'd say:
> If I wanted to analyze all word-by-word RTs in my data, I would definitely do what Baayen et al. did. However, in most SPR experiments, we are only analyzing the items, but not the fillers. Often folks are even only analyzing specific regions in the items. At that point then, I prefer to derive per-word residual RTs for the region of interest in the items from the entire data set (to decorrelate the properties of the region that I am interested in from properties that can be reduced to commonalities with the remainder of the data). That cannot be achieved by the procedure employed in Harald's article.
> This also explains why I moved the position terms (which, btw, I model as a non-linear term in order to be maximally conservative) into the residualization (in addition to the standard terms for word length). This procedure is especially useful when, in your items, condition is confounded with position (of the word in the sentence or the sentence in the list, although the latter is usually easy to avoid).
> I know for a fact (unpleasant experience) that this type of confound (which is rarely controlled in experiments with positional confounds) does matter and that this only shows in full clarity in the procedure I proposed. I had a nicely significant result, which went away once I applied this procedure (which is why the procedure is described in a blog article rather than in a journal ;)) and when I checked further I became very certain that the analysis was doing the right thing (for the type of situation I am describing here).
> So, long story short: if you're analyzing all data, then use Baayen's method. If you are analyzing only your items, I think there are reasons to use the method I propose. For examples of the analysis I proposed, see Hofmeister et al, submitted and Hofmeister in press (which also shows that the analysis doesn't always kill effects ;).
> As for transformations, several researchers have looked into this for various tasks. Obviously, there is Smith and Levy's work, suggesting that raw RTs should be used. There is work by Kliegl et al. (2010) comparing raw, log, and reciprocal transforms of RTs (though not for SPR), arguing, if I recall correctly, for a reciprocal link.
> Here's what I usually do (I think Baayen, Kuperman, etc. would do the same): I look at the data (qqplots, Shapiro test for normality over the residuals of the model or, faster, over the raw data by condition if you have a factorial design) and compare the usual suspects (raw, log, reciprocal). For some examples, you might find the slides that Victor Kuperman and I prepared for WOMM 2009 useful (http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/ ; there are updated versions of these slides at http://hlplab.wordpress.com/2010/05/10/mini-womm-montreal-slides-now-available/ ).
> When you do that, it becomes pretty obvious that the decision will in part depend on whether you prefer to exclude outliers or not. Log transforms are useful when you do not remove outliers (as they make them less extreme, since most outliers in SPR experiments are large rather than small values). I have to say that, in my experience, the transform has hardly ever changed the results (and if that happens and I know it, then I try to understand why).
> HTH,
> Florian
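
A minimal sketch of the check described above (Shapiro test by condition for the raw, log, and reciprocal versions of the RTs), using base R only. The data are simulated and hypothetical: the column names `CONDITION` and `RT`, the sample sizes, and the `-1000 / RT` scaling of the reciprocal are all assumptions for illustration. Because the simulated RTs are lognormal, the log transform looks best here by construction; real data need not behave that way.

```r
# Hypothetical, simulated reading times (ms): right-skewed, two conditions.
set.seed(1)
d <- data.frame(
  CONDITION = rep(c("a", "b"), each = 200),
  RT = c(rlnorm(200, meanlog = 5.8, sdlog = 0.4),
         rlnorm(200, meanlog = 5.9, sdlog = 0.4))
)

# The "usual suspects". The scaling of the reciprocal (-1000 / RT) is only
# a convenience so that larger values still mean slower reading.
transforms <- list(
  raw        = function(x) x,
  log        = function(x) log(x),
  reciprocal = function(x) -1000 / x
)

# Shapiro-Wilk W by condition for each transform (values nearer 1 look
# more normal). Rows are conditions, columns are transforms.
W <- sapply(transforms, function(f) {
  sapply(split(d$RT, d$CONDITION),
         function(x) unname(shapiro.test(f(x))$statistic))
})
print(round(W, 3))
```

For the visual version of the same check, `qqnorm()` followed by `qqline()` on each transformed vector gives the qqplots; to check model residuals instead of raw data, the same calls can be applied to `residuals()` of the fitted model.
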
> On Sun, Aug 29, 2010 at 4:22 AM, Bruno Nicenboim <bruno.nicenboim@gmail.com> wrote:
> Hi,
> I'm analyzing the results of a SPR experiment.
> I saw that in Jaeger's blog (HLP/Jaeger lab blog) and in Jaeger, Fedorenko and
> Gibson's article "Anti-locality in English: Consequences for Theories of
> Sentence Comprehension" in order to analyze the results,  they use a linear
> model that takes as dependent variables the residuals of a model that looks
> roughly like this: (I didn't include the transformations they use)
> l <- lmer(RT ~ Wordlength + positionofword + positionofstimulus + (1 |
> SUBJ)...
> RTresidual <- residuals(l)
> (http://hlplab.wordpress.com/2008/01/23/modeling-self-paced-reading-data-effects-of-word-length-word-position-spill-over-etc/#more-46)
> Then, the final linear model looks like this:
> l <- lmer(RTresidual ~ CONDITION +
>            SPILLOVER_1 + SPILLOVER_2 + SPILLOVER_3 +
>            (1 | SUBJ) + (1 | ITEM))
> On the other hand, Baayen and Milin in "Analyzing Reaction Times" use a model
> that takes as a dependent variable the RT (instead of residuals), and
> includes the word length and the position of the word and line in the same
> model, roughly like:
> l <- lmer(RT ~ CONDITION + Wordlength + positionofword + positionofstimulus +
>            SPILLOVER_1 + SPILLOVER_2 + SPILLOVER_3 +
>            (1 | SUBJ) + (1 | ITEM))
> My questions are:
> Is there any advantage or disadvantage that should persuade me to use one
> approach or the other?
> Shouldn't I get similar results? (Because I don't)
> And finally, I've noticed that each researcher (not only in these two examples)
> uses different transformations on length, positions and reading times. Is there
> any way to check which transformation is the most appropriate?
> Thanks !
> -- 
> Bruno
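
A minimal sketch of the two analyses compared in the original question, run on simulated data with lme4. Everything about the data set is hypothetical: the sizes, the effect sizes, the critical region (words 4 to 6), the `is_item` and `critical` columns, and the spelling `Wordlength`. Spillover terms and any transformations are left out for brevity, and the position terms are linear here, so this is not the exact specification from the blog post or from either cited paper.

```r
library(lme4)

set.seed(42)
n_subj <- 20; n_item <- 16; n_filler <- 16; n_word <- 8
n_sent <- n_item + n_filler

# Hypothetical design: every subject reads every sentence word by word.
d <- expand.grid(SUBJ = factor(1:n_subj), SENT = 1:n_sent,
                 positionofword = 1:n_word)
subj <- as.integer(d$SUBJ)
d$ITEM    <- factor(d$SENT)
d$is_item <- d$SENT <= n_item            # sentences above n_item are fillers
d$CONDITION <- ifelse((subj + d$SENT) %% 2 == 0, -0.5, 0.5)

# Word length belongs to the word; list position is shuffled per subject.
wl  <- matrix(sample(2:10, n_sent * n_word, replace = TRUE), n_sent, n_word)
ord <- sapply(1:n_subj, function(i) sample(n_sent))
d$Wordlength         <- wl[cbind(d$SENT, d$positionofword)]
d$positionofstimulus <- ord[cbind(d$SENT, subj)]

# Critical region (assumed): words 4 to 6 of the experimental items.
d$critical <- d$is_item & d$positionofword %in% 4:6

# Simulated RTs with a true condition effect of 30 ms in the critical region.
d$RT <- 350 + 15 * d$Wordlength - 3 * d$positionofword -
  0.5 * d$positionofstimulus +
  rnorm(n_subj, sd = 50)[subj] + rnorm(n_sent, sd = 20)[d$SENT] +
  30 * d$CONDITION * d$critical + rnorm(nrow(d), sd = 60)

# Approach 1: residualize on ALL data (items and fillers), then model the
# residuals in the critical region only.
m_resid <- lmer(RT ~ Wordlength + positionofword + positionofstimulus +
                  (1 | SUBJ), data = d)
d$RTresidual <- residuals(m_resid)
crit <- droplevels(subset(d, critical))
m_two_step <- lmer(RTresidual ~ CONDITION + (1 | SUBJ) + (1 | ITEM),
                   data = crit)

# Approach 2: one model with the covariates next to CONDITION, fitted here
# to the same critical-region rows so the two estimates are comparable.
m_one_step <- lmer(RT ~ CONDITION + Wordlength + positionofword +
                     positionofstimulus + (1 | SUBJ) + (1 | ITEM),
                   data = crit)

est <- c(two_step = unname(fixef(m_two_step)["CONDITION"]),
         one_step = unname(fixef(m_one_step)["CONDITION"]))
print(round(est, 1))
```

The point of the sketch is the difference in what the covariate slopes are estimated from: in the first approach they come from the whole data set, in the second only from the rows being analyzed. In this simulation condition is not confounded with position, so the two estimates should be close; the thread's argument is that they can diverge when such a confound exists. Since `residuals()` on an lmer fit already subtracts the estimated subject intercepts, lmer may report a singular fit for the subject term in the second step.
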


Roger Levy                      Email: rlevy@ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
