[R-lang] SPR experiment: using lmer, transforming data, collinearity, and using a covariable
Klinton Bicknell
kbicknell at ling.ucsd.edu
Fri Aug 1 09:49:14 PDT 2008
Hi Claire,
I have some experience using lmer models to analyze self-paced reading
data, and basically your script looks good! A few comments:
1) I don't really have experience in fitting the distribution of RT
values and transforming them, but your approach seems reasonable to me.
2) You probably want to center your three variables *after* doing the
subset to remove fillers, instead of before. Since in your script
right now you're centering them before doing this subset, there is a
good chance that they are no longer centered when you are doing your
actual analysis.
3) You may want to try including random participant and item slopes
for your effects of interest, since that is something essentially
built in to analyses like a traditional repeated measures Anova.
However, putting in the full (1+cRelativiser * cAttachment | item),
may make your model hard for lmer to estimate. So, you may be able to
iteratively add or iteratively remove these random effects using the
anova() function to compare lmer models that are a subset of each
other. (Harald Baayen's In Press book discusses this in chapter 7.)
4) I'm guessing that when the script says 'the highest value is 0.57',
you're meaning that 0.57 is the largest of the correlations of fixed
effects in the lmer summary? (and thus you're checking for
collinearity). I believe that as a rule of thumb (and someone please
correct me if I'm wrong), correlations of fixed effects over 0.1 are
somewhat troubling, and over 0.2 are bad. (Although I believe higher
correlations between a coefficient estimate and the intercept estimate
are inconsequential.) You may find that centering your variables after
doing the subset will fix the collinearity that you may be having.
5) I'm not quite sure I understand the 'regression with residuals'
section. Are you trying to deal with a correlation between the
estimates of the coefficients for cRelativiser/cAttachment and their
interaction? If so, I'm not sure of a good way to do that--maybe
someone else knows of one? But centering your variables will probably
go a long way to helping. (You'll want to center again after making
the SPR.relativiser subset).
6) Finally, regarding the error from HPDinterval, could you give the
error message that you're getting!
Hope this helps!
Klinton
On Jul 28, 2008, at 8:07 , Claire Delle Luche wrote:
> Dear R-lang users,
>
> I try to analyse a self paced reading experiment where I have two
> fixed variables (Relativiser, Attachment), two random variables
> (Participant, ItemNbr), and one covariable (BiasValue). The
> dependant variable is RT, reading time region by region.
>
> My main aim is to remove the variance induced by BiasValue on
> Attachment, but I did not find any code for that.
>
> My procedure is the following, elaborated from bits and tips:
> - fit the data distribution, suggesting an inverse square root
> transform, rather than the classical log transform (on all RTs)
> - exclude deviant participants
> - calculate residual RTs (quite common in SPR experiments)
> - check for collinearity
> - then run the analysis region by region, with BiasValue as a
> covariate
> - obtain HPD intervals (it fails)
>
> I am unsure about my script and would appreciate your help.
>
> Thank you very much in advance.
>
> Claire Delle Luche
> Laboratoire Dynamique du Langage
> Lyon, FRANCE
>
> Here is my script:
>
>
> library(lme4)
> library(languageR)
>
> SPR = read.table("DataWord.txt", header=TRUE) # read data
>
> # fixed factors are Relativiser (qui, lequel), Attachment (N1, N2)
> # BiasValue is the result from a pretest evaluating a bias for one
> construction over an other, used as covariable with Attachment
> # random factors are Participant and ItemNbr
> # the dependant variable is RT
> # the "Word" column distinguish between FILLERS and experimental
> words, coded for their function in the sentence
>
> ###### data transformation: classical log or a transformation
> fitting the distribution better
> # Box-Cox transform
> SPR.lm <- lm(RT ~ Relativiser * Attachment + Participant + ItemNbr +
> BiasValue, data = SPR)
> SPR.bc <- MASS:::boxcox(SPR.lm)
> SPR.bc$x[which.max(SPR.bc$y)]
> # the result suggest an inverse square root transform would be best
>
> # comparison between a log transform and an inverse square root
> transform
> SPR.lm <- lm(log(RT) ~ Relativiser * Attachment + Participant +
> ItemNbr + BiasValue, data = SPR)
> SPR.lm2 <- lm(I(RT^(-1/2)) ~ Relativiser * Attachment + Participant
> + ItemNbr + BiasValue, data = SPR)
> par(mfrow = c(2, 2))
> plot(SPR.lm, which = 1:2)
> plot(SPR.lm2, which = 1:2)
> # the 2nd transformation is better
>
> SPR$transRT <- I((SPR$RT)^(-1/2))
>
> #####
>
> # identification of deviant participants
> abs(scale(unlist(lapply(split(SPR$transRT, as.factor(as.character(SPR
> $Participant))), mean)))) < 3
> abs(scale(unlist(lapply(split(SPR$transRT, as.factor(as.character(SPR
> $Participant))), mean)))) < 2.5
> abs(scale(unlist(lapply(split(SPR$transRT, as.factor(as.character(SPR
> $Participant))), mean)))) < 2
> # exclude deviant participants
> SPR.RTcor <- subset(SPR, Participant != ("IM") , Participant !=
> ("PL"))
> SPR.RTcorr <- subset(SPR, Participant != ("AF"))
>
> #### comparison of data distributions, without deviants
> SPR.lm3 <- lm(I(RT^(-1/2)) ~ Relativiser * Attachment + Participant
> + ItemNbr + BiasValue, data = SPR.RTcorr)
> par(mfrow = c(2, 2))
> plot(SPR.lm2, which = 1:2)
> plot(SPR.lm3, which = 1:2)
> # maybe delete the deviant data points as seen in the graph
>
> # maybe add a spillover procedure
>
> # regression against Nbr of characters, position in the sentence and
> in the list
> SPR.anal = lmer(transRT ~ CharNbr + WdNbr + SentNbr + (1|
> Participant), SPR.RTcorr)
>
> # residual RTs
> SPR.RTcorr$RTResidual <- residuals(SPR.anal)
>
> # centering
> SPR.RTcorr$cRelativiser <- as.numeric(scale(ifelse(SPR.RTcorr
> $Relativiser == "Qui",1,0), scale=F))
> SPR.RTcorr$cAttachment <- as.numeric(scale(ifelse(SPR.RTcorr
> $Attachment == "N1",1,0), scale=F))
> SPR.RTcorr$cBiasValue <- as.numeric(scale(SPR.RTcorr$BiasValue,
> scale=F))
>
> # subset for experimental data only
> SPR.exp <- subset(SPR.RTcorr, Word != "FILLER")
>
> # analysis for experimental trials only and log transformed residual
> RTs
> SPR.anal = lmer(RTResidual ~ cRelativiser * cAttachment + (1|
> Participant) + (1|ItemNbr) + cBiasValue, data=SPR.exp)
> summary(SPR.anal)
> # the highest value is 0.57
>
> # regression with residuals
> SPR.exp$iRelAtt <- residuals(lm(I(cRelativiser * cAttachment) ~
> cRelativiser + cAttachment, SPR.exp))
>
> SPR.anal1 = lmer(RTResidual ~ cRelativiser + cAttachment + iRelAtt +
> (1|Participant) + (1|ItemNbr) + cBiasValue, data=SPR.exp)
> summary(SPR.anal)
> # the highest value is 0.57
>
> ##### is there collinearity problem?
>
>
> #### analysis of the data word by word, here with RELATIVISER ONLY
> #### Relativiser and Attachment are two fixed factors and BiasValue
> should be a covariate for Attachment, to remove the variance due to
> BiasValue
> SPR.RELATIVISER <- subset (SPR.exp, Word == "Rel")
> SPR.RELATIVISER = lmer(RTResidual ~Relativiser*Attachment + (1|
> Participant) + (1|ItemNbr) + BiasValue, data=SPR.RELATIVISER)
> summary(SPR.RELATIVISER) # Print results of LME analysis
> plot(fitted(SPR.RELATIVISER), residuals(SPR.RELATIVISER))
> qqnorm(residuals(SPR.RELATIVISER))
>
> SPR.RELATIVISER.mc <- mcmcsamp(SPR.RELATIVISER, 10000)
> densityplot(SPR.RELATIVISER.mc)
> qqmath(SPR.RELATIVISER.mc)
> xyplot(SPR.RELATIVISER.mc)
> HPDinterval(SPR.RELATIVISER.mc)
> # I get an error for HPDinterval
>
>
>
> _______________________________________________
> R-lang mailing list
> R-lang at ling.ucsd.edu
> http://pidgin.ucsd.edu/mailman/listinfo/r-lang
More information about the R-lang
mailing list