[R-lang] SPR experiment: using lmer, transforming data, collinearity, and using a covariable

T. Florian Jaeger tiflo at csli.stanford.edu
Fri Aug 1 10:31:14 PDT 2008


Hi Claire,

A couple of comments on top of what Roger and Klinton already said:


> * I think it's more standard to calculate residual RTs by constructing
> subject-specific linear regressions rather than a mixed-effect linear
> regression pooling all the data.  Also, you usually want to use *all*
> the regions (not just the critical region/regions) in constructing this
> regression; maybe throw out the first and last regions of the sentence.
> I can't tell whether you're doing this.



   - Claire was following a suggestion I made on my blog; see
   http://hlplab.wordpress.com/2008/01/23/modeling-self-paced-reading-data-effects-of-word-length-word-position-spill-over-etc/
   and the linked posts. I have changed things around a bit since then
   (update to come soon), and I am using this approach in a paper that's
   almost finished. A mixed-effect regression with participants as a random
   effect should be better than by-subject linear regressions, for the same
   reason that subject differences are generally nicely accounted for by
   mixed-effect regressions. In a balanced design with about equal amounts
   of data for each subject and, crucially, random slopes for all predictors
   (unlike what Claire currently has), the two approaches won't differ much
   anyway. For less balanced data they will differ, and the mixed-effect
   model should be better because it recognizes that the group-internal mean
   and SE are less reliable for small groups. Gelman & Hill 2007 have a nice
   discussion of this.
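
To make the point about small groups concrete, here is a minimal sketch in plain Python of the precision-weighted average that Gelman & Hill (2007) give as an approximation to a multilevel estimate of a group mean. It is not the model fitted by lmer; the function name, the variance values, and the RTs are all made up for illustration, and the variances are treated as known.

```python
# Partial pooling of a subject's mean: a precision-weighted average of the
# subject's own mean and the grand mean (approximation from Gelman & Hill 2007).
# All numbers and names below are hypothetical.

def partially_pooled_mean(subject_mean, n_obs, grand_mean,
                          sigma_within, sigma_between):
    """Weight the subject's own mean by n / sigma_within^2 and the
    grand mean by 1 / sigma_between^2."""
    w_subject = n_obs / sigma_within ** 2
    w_grand = 1.0 / sigma_between ** 2
    return (w_subject * subject_mean + w_grand * grand_mean) / (w_subject + w_grand)

grand_mean = 400.0     # hypothetical grand mean RT in ms
sigma_within = 100.0   # hypothetical within-subject SD (assumed known)
sigma_between = 50.0   # hypothetical between-subject SD (assumed known)

# Two subjects with the same raw mean (500 ms) but different amounts of data.
small = partially_pooled_mean(500.0, 4, grand_mean, sigma_within, sigma_between)
large = partially_pooled_mean(500.0, 400, grand_mean, sigma_within, sigma_between)

print(round(small, 1), round(large, 1))
```

The subject with only 4 observations is pulled halfway toward the grand mean, while the subject with 400 observations stays very close to its own mean. Separate by-subject regressions would treat both raw means as equally trustworthy.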


>
> * The problem with HPDinterval might be specific to the current state of
> lme4.  What error do you get?  Can you replicate it with a tiny toy
> dataset that you could post to the list?
>

   - HPDinterval is part of two packages, coda and lme4. I think languageR
   loads coda, too, and in any case I ran into similar problems. You have to
   specify which HPDinterval() you mean. I assume you want
   lme4::HPDinterval().

> * This may incite controversy, but I personally would suggest being
> careful about residualizing and analyzing transformed RTs.  The reason
> for this is that the transform changes the interpretation of the linear
> regression (used to calculate residuals) and of any interactions in your
> analysis.
>
> * Is this a designed & balanced experiment?  If so, there shouldn't be
> problems with collinearity.



   - It's the covariate that introduces collinearity. Between the balanced
   variables there should be no collinearity *after centering* (which you
   seem to do).
   - Klinton is right, though: centering (and other steps to reduce
   collinearity) should be done on the data set that contains exactly
   those cases that will go into the analysis!
   - In answer to your question, Claire, I would be worried about a fixed
   effect correlation of .5. I may be overly cautious, but so far I've always
   checked whether my results hold if all fixed effect correlations are reduced
   to < 0.3, often even < 0.1 (via centering, residualization, or principal
   component analysis). Most people are *less* conservative, but I have found
   cases where even correlations of .3 can screw things up (especially in
   larger models with many small correlations).
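
As a small illustration of why centering matters for balanced predictors, the sketch below (plain Python, with a made-up balanced 2x2 design coded 0/1) compares the correlation between a main effect and the interaction term before and after centering. The helper names `pearson` and `center` are hypothetical, not part of any package mentioned here.

```python
from math import sqrt

def center(xs):
    """Subtract the mean, computed on exactly the cases being analyzed."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    cx, cy = center(xs), center(ys)
    cov = sum(a * b for a, b in zip(cx, cy))
    return cov / sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

# Hypothetical balanced 2x2 design, three observations per cell, coded 0/1.
a = [0, 0, 1, 1] * 3
b = [0, 1, 0, 1] * 3

# Interaction built from the raw 0/1 codes.
raw_interaction = [x * y for x, y in zip(a, b)]
r_raw = pearson(a, raw_interaction)

# Interaction built from the centered predictors.
a_c, b_c = center(a), center(b)
centered_interaction = [x * y for x, y in zip(a_c, b_c)]
r_centered = pearson(a_c, centered_interaction)

print(round(r_raw, 3), round(r_centered, 3))
```

With the raw codes, the main effect and the interaction are substantially correlated; after centering, the correlation is zero in this balanced design. If cases are dropped after centering, the means shift and the predictors need to be re-centered on the remaining cases.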

> * You might consider having more random effects than intercepts in your
> mixed-effects regression.  I believe this is an open issue.


   - I think Baayen et al. (in press) describe pretty exactly how to make that
   decision. It's a matter of model comparison, just as for fixed effects. I
   suggest following their recommendation.

> * I'm not sure what criteria you want to use to exclude deviant
> participants.  Could you explain in greater detail?


   - I've seen exclusion based on deviations of more than 2 to 3
   subject-internal SEs (in absolute terms) from the subject's mean. I find it
   worrisome that there is so much variance between papers. Personally, I
   think 2 SEs is often too tight. Also, after having looked at lots of
   transforms for several data sets, I use log RTs from the beginning (and I
   exclude based on deviations in the log-transformed space).
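
One possible reading of this exclusion rule, sketched in plain Python: log-transform each subject's RTs, then drop observations that lie more than a chosen number of deviations from that subject's own mean in log space. Two assumptions are mine and not from the thread: that the rule is applied to individual observations, and that the deviation unit is the subject's sample standard deviation. The function name, the cutoff of 3, and the data are hypothetical.

```python
from math import log
from statistics import mean, stdev

def trim_log_rts(rts_by_subject, n_sd=3.0):
    """Keep only RTs whose log lies within n_sd subject-internal standard
    deviations of that subject's mean log RT (assumption: SD is the unit)."""
    kept = {}
    for subject, rts in rts_by_subject.items():
        logs = [log(rt) for rt in rts]
        m, s = mean(logs), stdev(logs)
        kept[subject] = [rt for rt, l in zip(rts, logs)
                         if abs(l - m) <= n_sd * s]
    return kept

# Hypothetical subject: 19 ordinary RTs (ms) plus one very slow response.
data = {"s1": [300, 320, 340, 360, 380, 400] * 3 + [350, 5000]}

trimmed = trim_log_rts(data, n_sd=3.0)
print(len(data["s1"]), len(trimmed["s1"]))
```

Note that with very few observations per subject, no single point can ever be more than about (n - 1) / sqrt(n) sample standard deviations from the mean, so a 3-SD cutoff only has teeth once a subject contributes enough data.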



   - Finally, the last point, regression with residuals: looks good to me. I
   assume you're removing the collinearity between the interaction and the main
   effects, and that's indeed how it's done =)
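
For readers who want to see what residualizing an interaction against its main effects amounts to, here is a minimal plain-Python sketch. It computes ordinary least-squares residuals by Gram-Schmidt orthogonalization, which is just one way to do it; the function names and the small unbalanced data set are hypothetical, and this is not a reconstruction of Claire's actual code.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(target, predictors):
    """Return the OLS residuals of target regressed on an intercept plus
    the given predictors, via Gram-Schmidt orthogonalization."""
    n = len(target)
    basis = []
    for col in [[1.0] * n] + [[float(x) for x in p] for p in predictors]:
        # Remove from this column everything already spanned by the basis.
        for b in basis:
            coef = dot(col, b) / dot(b, b)
            col = [c - coef * bi for c, bi in zip(col, b)]
        if dot(col, col) > 1e-12:  # skip columns that add nothing new
            basis.append(col)
    resid = [float(t) for t in target]
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [r - coef * bi for r, bi in zip(resid, b)]
    return resid

# Hypothetical unbalanced data: two 0/1 predictors and their product.
a = [0, 0, 1, 1, 1, 0, 1, 1]
b = [0, 1, 0, 1, 1, 1, 0, 1]
interaction = [x * y for x, y in zip(a, b)]

# The residualized interaction is orthogonal to both main effects.
resid_interaction = residualize(interaction, [a, b])
print(round(dot(resid_interaction, a), 10), round(dot(resid_interaction, b), 10))
```

The residualized term has mean zero and zero cross-product with each main effect, so its correlation with them is zero. It would then replace the raw interaction term in the model.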


Florian

>
>
> Hope this helps.
>
> Best
>
> Roger
>
> --
>
> Roger Levy                      Email: rlevy at ucsd.edu
> Assistant Professor             Phone: 858-534-7219
> Department of Linguistics       Fax:   858-534-4789
> UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy