[R-lang] Re: Self Paced Reading experiment using residuals with lmer

Reinhold Kliegl kliegl@uni-potsdam.de
Fri Oct 15 22:19:57 PDT 2010


On 16.10.2010, at 01:44, T. Florian Jaeger wrote:
> 
> 
> As for transformations, several researchers have looked into this for various tasks. Obviously, there is Smith and Levy's work, suggesting that raw RTs should be used. There is work by Kliegl et al (2010) comparing raw, log, and reciprocal transforms of RTs (though not for SPR), arguing -if I recall correctly- for a reciprocal link. 
We arrived at the reciprocal transform only for data from a lexical decision task, doing roughly what is described in the next paragraph (checking model residuals) plus determining the power coefficient from a Box-Cox profile. We did not recommend the reciprocal transform for everything. For fixation durations, in my experience, a log transform does best. These recommendations hold if you want to use an LMM for statistical inference. If your theory dictates the use of raw scores, you may need to switch to a GLMM (with a link function that takes care of the right skew in raw latencies).
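
For what it's worth, here is a minimal sketch of the kind of Box-Cox check mentioned above (an illustration only, not the actual analysis of Kliegl et al., 2010; the data frame dat and the predictors WordLength and Frequency are hypothetical):

# Hedged sketch: profile the Box-Cox power coefficient for positive RTs.
# boxcox() works on an lm fit, so random effects are ignored here; what
# matters is the location of lambda (about -1: reciprocal, 0: log, 1: raw).
library(MASS)
m_raw  <- lm(RT ~ WordLength + Frequency, data = dat)
bc     <- boxcox(m_raw, lambda = seq(-2, 2, by = 0.05))
lambda <- bc$x[which.max(bc$y)]
lambda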

> 
> Here's what I usually do (I think Baayen, Kuperman, etc. would do the same): I look at the data (qq-plots, Shapiro test for normality over the residuals of the model or -faster- over the raw data by condition if you have a factorial design) and compare the usual suspects (raw, log, reciprocal). For some examples, you might find the slides Victor Kuperman and I prepared for WOMM 2009 useful (http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/; updated versions of these slides are at: http://hlplab.wordpress.com/2010/05/10/mini-womm-montreal-slides-now-available/).
> 
> When you do that, it becomes pretty obvious that the decision will in part depend on whether you prefer to exclude outliers or not. Log transforms are useful when you do not remove outliers (as they make them less extreme, since most outliers in SPR experiments are large rather than small values). I have to say that, in my experience, the transform has hardly ever changed the results (and when it does and I notice it, I try to understand why).
This is true for estimates of fixed effects (as we also showed), unless you are a strong believer in p-values. In the latter case, in my opinion, the appropriate transformation usually increases your chances and may lift a marginal effect or interaction over the significance threshold. However, transformations do have very strong effects on estimates of variance/covariance components, especially those involving the intercept. If you go this route, you need a strong theoretical justification for the zero point in your model.
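
To make the residual checks and the raw/log/reciprocal comparison concrete, here is a hedged sketch; the data frame dat and the columns RT, CONDITION, SUBJ, and ITEM are hypothetical, and the models are deliberately minimal:

# Fit the same model on raw, log, and negative-reciprocal RTs.
library(lme4)
fits <- list(
  raw   = lmer(RT          ~ CONDITION + (1 | SUBJ) + (1 | ITEM), data = dat),
  log   = lmer(log(RT)     ~ CONDITION + (1 | SUBJ) + (1 | ITEM), data = dat),
  recip = lmer(I(-1000/RT) ~ CONDITION + (1 | SUBJ) + (1 | ITEM), data = dat)
)

# qq-plots of the model residuals; the transform that straightens them wins.
par(mfrow = c(1, 3))
for (nm in names(fits)) {
  r <- residuals(fits[[nm]])
  qqnorm(r, main = nm); qqline(r)
}

# Shapiro-Wilk test on the residuals (shapiro.test() takes at most 5000 values).
r_raw <- residuals(fits$raw)
shapiro.test(sample(r_raw, min(5000, length(r_raw))))

# Note how strongly the variance/covariance components depend on the scale.
lapply(fits, VarCorr)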

Reinhold Kliegl

On 16.10.2010, at 01:44, T. Florian Jaeger wrote:

> Hi Bruno,
> 
> good question. Here's what I'd say:
> 
> If I wanted to analyze all word-by-word RTs in my data, I would definitely do what Baayen et al did. However, in most SPR experiments, we are only analyzing the items, not the fillers. Often folks are even only analyzing specific regions in the items. At that point, I prefer to derive per-word residual RTs for the region of interest in the items from the entire data set (to decorrelate properties of the region of interest that I am interested in from properties that can be reduced to commonalities with the remainder of the data). That cannot be achieved by the procedure employed in Harald's article.
> 
> This also explains why I moved the position terms (which, btw, I model as a non-linear term in order to be maximally conservative) into the residualization (in addition to the standard terms for word length). This procedure is especially useful when -in your items- condition is confounded with position (of the word in the sentence or the sentence in the list, although the latter is usually easy to avoid). 
> 
> I know for a fact (unpleasant experience) that this type of confound (which is rarely controlled in experiments with positional confounds) does matter and that this only shows in full clarity in the procedure I proposed. I had a nicely significant result, which went away once I applied this procedure (which is why the procedure is described in a blog article rather than in a journal ;)) and when I checked further I became very certain that the analysis was doing the right thing (for the type of situation I am describing here).
> 
> So, long story short: if you're analyzing all data, then use Baayen's method. If you are analyzing only your items, I think there are reasons to use the method I propose. For examples of the analysis I proposed, see Hofmeister et al, submitted and Hofmeister in press (which also shows that the analysis doesn't always kill effects ;).
> 
> As for transformations, several researchers have looked into this for various tasks. Obviously, there is Smith and Levy's work, suggesting that raw RTs should be used. There is work by Kliegl et al (2010) comparing raw, log, and reciprocal transforms of RTs (though not for SPR), arguing -if I recall correctly- for a reciprocal link. 
> 
> Here's what I usually do (I think Baayen, Kuperman, etc. would do the same): I look at the data (qq-plots, Shapiro test for normality over the residuals of the model or -faster- over the raw data by condition if you have a factorial design) and compare the usual suspects (raw, log, reciprocal). For some examples, you might find the slides Victor Kuperman and I prepared for WOMM 2009 useful (http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/; updated versions of these slides are at: http://hlplab.wordpress.com/2010/05/10/mini-womm-montreal-slides-now-available/).
> 
> When you do that, it becomes pretty obvious that the decision will in part depend on whether you prefer to exclude outliers or not. Log transforms are useful when you do not remove outliers (as they make them less extreme, since most outliers in SPR experiments are large rather than small values). I have to say that, in my experience, the transform has hardly ever changed the results (and when it does and I notice it, I try to understand why).
> 
> HTH,
> Florian
> 
> On Sun, Aug 29, 2010 at 4:22 AM, Bruno Nicenboim <bruno.nicenboim@gmail.com> wrote:
> Hi,
> I'm analyzing the results of a SPR experiment.
> 
> I saw that in Jaeger's blog (HLP/Jaeger lab blog) and in Jaeger, Fedorenko and
> Gibson's article "Anti-locality in English: Consequences for Theories of
> Sentence Comprehension" that, in order to analyze the results, they use a linear
> model that takes as its dependent variable the residuals of a model that looks
> roughly like this (I didn't include the transformations they use):
> 
> l <- lmer(RT ~ Wordlength + positionofword + positionofstimulus +
>            (1 | SUBJ) + ...)
> 
> RTresidual <- residuals(l)
> 
> (http://hlplab.wordpress.com/2008/01/23/modeling-self-paced-reading-data-effects-of-word-length-word-position-spill-over-etc/#more-46)
> 
> Then, the final linear model looks like this:
> 
> l <- lmer(RTresidual ~ CONDITION +
>            SPILLOVER_1 + SPILLOVER_2 + SPILLOVER_3 +
>            (1 | SUBJ) + (1 | ITEM))
> 
> On the other hand, Baayen and Milin in "Analyzing Reaction Times" use a model
> that takes as its dependent variable the RT (instead of residuals), and
> includes the word length and the position of the word and line in the same
> model, roughly like:
> 
> l <- lmer(RT ~ CONDITION + Wordlength + positionofword + positionofstimulus +
>            SPILLOVER_1 + SPILLOVER_2 + SPILLOVER_3 +
>            (1 | SUBJ) + (1 | ITEM))
> 
> 
> My questions are:
> Is there any advantage or disadvantage that should persuade me to use one
> approach or the other?
> Shouldn't I get similar results? (Because I don't)
> And finally, I've noticed that each researcher (not only in these two examples)
> uses different transformations on length, positions and reading times. Is there
> any way to check which transformation is the most appropriate?
> 
> Thanks !
> 
> 
> 
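
To tie these together, here is a hedged sketch of the two-stage (residualization) route Florian describes above, with a non-linear position term in the first stage; all column names (RT, Wordlength, PosInSentence, PosInList, is_item, region, CONDITION, SUBJ, ITEM) and the region-of-interest filter are hypothetical, not code from the blog post or the papers:

library(lme4)
library(splines)

# Stage 1: fit on ALL word-by-word RTs (items and fillers), with word length
# and (non-linear) position terms, and keep the residuals (assuming no
# missing RTs, so that residuals line up with the rows of dat).
m1 <- lmer(log(RT) ~ Wordlength + ns(PosInSentence, df = 4) + PosInList +
             (1 | SUBJ),
           data = dat)
dat$RTresid <- residuals(m1)

# Stage 2: analyze the residuals only for the region of interest in the items.
roi <- subset(dat, is_item & region == "critical")
m2  <- lmer(RTresid ~ CONDITION + (1 | SUBJ) + (1 | ITEM), data = roi)
summary(m2)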

----
Reinhold Kliegl, Dept. of Psychology, University of Potsdam,
Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany
phone: +49 331 977 2868, fax: +49 331 977 2793
http://www.psych.uni-potsdam.de/people/kliegl/index-e.html





