[R-lang] Re: Collinearity and how to decide what predictors to include
Alex Fine
afine@bcs.rochester.edu
Tue Jul 27 05:49:40 PDT 2010
Hey,

You probably didn't do anything wrong (though I don't think you included
the code you used to center the predictors; I trust you, though). LS
and AoA are just correlated. (Notice, though, that centering did eliminate
the other two previously very high correlations.)
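To make the point concrete, here's a base-R sketch with made-up toy data (nothing to do with your actual variables) showing why this happens: centering is just a shift, so it can't change the correlation between two predictors, but it does zero out the correlation between each slope and the intercept, which is what it's for:

```r
# made-up toy data, just to illustrate the arithmetic
set.seed(1)
aoa <- rnorm(100, mean = 10, sd = 3)
ls  <- 0.5 * aoa + rnorm(100, sd = 2)  # built to correlate with aoa
y   <- rnorm(100)

caoa <- aoa - mean(aoa)  # centering = subtracting the mean
cls  <- ls  - mean(ls)

# centering is just a shift, so the predictor-predictor
# correlation cannot change:
cor(aoa, ls) - cor(caoa, cls)  # essentially 0

# what centering does fix is the correlation of each slope
# estimate with the intercept:
m_raw <- lm(y ~ aoa + ls)
m_cen <- lm(y ~ caoa + cls)
cov2cor(vcov(m_raw))["(Intercept)", "aoa"]   # large in magnitude
cov2cor(vcov(m_cen))["(Intercept)", "caoa"]  # essentially 0
```

That's exactly the pattern in your two outputs: the intercept rows changed, the aoa/ls cell didn't.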
So there are a few things you can do. As you said in your first
e-mail, you can do nested model comparison. Since your original question
about model comparison was a little unclear, let me just make sure
one thing is clear: all you're doing with model comparison is
taking two models, where one is a subset of the other...
model1 <- lmer(word1 ~ compoundType + AoA + (1|subject) + (1|word),
               family = "binomial", data = data)
model2 <- lmer(word1 ~ compoundType + AoA + LS + (1|subject) + (1|word),
               family = "binomial", data = data)
...and then when you compare these models with anova(model1, model2), a
significant result means that LS contributes to the model over and
above AoA. Interactions are no different. If you want to know whether an
interaction is justified, do this:
model2 <- lmer(word1 ~ compoundType + AoA + LS + (1|subject) + (1|word),
               family = "binomial", data = data)
model3 <- lmer(word1 ~ compoundType + AoA + LS + AoA:LS + (1|subject) +
               (1|word), family = "binomial", data = data)
anova(model2, model3) tells you whether the interaction between those
two variables contributes anything to explaining the data over and above
the main effects alone. The only caveat is that model comparison
doesn't tell you anything about the direction of the effect.
(So you may be able to conclude that LS is a significant predictor of
whether they use Word1, but you won't know whether it makes Word1 more or
less likely.)
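If it helps, here is the same logic in plain lm() terms with made-up data, so it runs without your dataset: the nested comparison says whether the extra predictor earns its keep, and the coefficient's sign in the bigger model gives you the direction the comparison can't:

```r
# toy data where x2 genuinely matters
set.seed(2)
x1 <- rnorm(50)
x2 <- rnorm(50)
y  <- x1 + 0.8 * x2 + rnorm(50)

m_small <- lm(y ~ x1)        # reduced model
m_big   <- lm(y ~ x1 + x2)   # same model plus x2 (m_small is nested in m_big)

anova(m_small, m_big)  # significant => x2 adds something over and above x1
coef(m_big)["x2"]      # the sign tells you the direction of the effect
```

With lmer models the call is the same, anova(model1, model2), and you read the direction off the fixed-effect estimate in summary(model2).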
Another option would be PCA, where you reduce those two variables to a
single variable that captures both. That doesn't seem ideal in this case,
though, since you probably have theoretical reasons for thinking that AoA
and LS are not the same thing (given the whole pesky critical-period issue).
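In case you do want to try it, a minimal sketch with prcomp() (toy data again; you'd feed in your actual AoA and LS columns) looks like this:

```r
# toy stand-ins for the two correlated predictors
set.seed(3)
aoa <- rnorm(100, mean = 10, sd = 3)
ls  <- 0.5 * aoa + rnorm(100, sd = 2)

pc <- prcomp(cbind(aoa, ls), center = TRUE, scale. = TRUE)
summary(pc)        # how much of the shared variance PC1 soaks up
pc1 <- pc$x[, 1]   # one predictor standing in for both

# the component scores are uncorrelated by construction:
cor(pc$x[, 1], pc$x[, 2])  # essentially 0
```

You'd then put pc1 in the model in place of AoA and LS, at the cost of a coefficient that's harder to interpret.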
The last technique I know of is residualization, but I've never actually
done it, so I'd rather let someone else take over if you want to know
more about that.
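For completeness, the recipe I've seen described (take this with salt, per the above) is to regress one predictor on the other and keep the residuals, i.e. the part of LS that AoA can't predict (toy data again):

```r
# toy stand-ins for the two correlated predictors
set.seed(4)
aoa <- rnorm(100, mean = 10, sd = 3)
ls  <- 0.5 * aoa + rnorm(100, sd = 2)

ls_resid <- residuals(lm(ls ~ aoa))
cor(ls_resid, aoa)  # ~0 by construction

# ls_resid would then replace ls in the mixed model; note its
# coefficient now means "LS beyond what AoA explains", which
# changes the interpretation
```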
In keeping with my tradition of referring you to other people: you might
want to check out Florian Jaeger and Victor Kuperman's slides from the
CUNY 2009 workshop on multilevel models, which contain a lot of
comments and information on collinearity (including some comments on
residualization). Slides from all the presentations at the workshop are here:
http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/
The collinearity slides are in the slideshow entitled "Issues and
Solutions....", and they start around page 30.
Alex
Xiao He wrote:
> Hi Alex,
>
> Thank you for the explanations. I have one more question regarding
> centering. I attempted centering, but found that it did not remove the
> correlation between two of the predictors. Two examples are shown below.
> As you can see, the correlation between aoa and ls is the same as that
> between caoa and cls, namely 0.527. I wonder if I did something wrong.
> Thank you in advance!
>
> For example:
> before centering
> > lmer(n1~aoa + ls + (1|subject) + (1|word), cpn, family="binomial")
> Generalized linear mixed model fit by the Laplace approximation
> Formula: n1 ~ aoa + ls + (1 | subject) + (1 | word)
> Data: cpn
> AIC BIC logLik deviance
> 345.1 364 -167.6 335.1
> Random effects:
> Groups Name Variance Std.Dev.
> word (Intercept) 5.03699 2.24432
> subject (Intercept) 0.21577 0.46451
> Number of obs: 321, groups: word, 24; subject, 15
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 0.08425 0.93113 0.090 0.928
> aoa -0.02672 0.02599 -1.028 0.304
> ls 0.01771 0.03361 0.527 0.598
>
> Correlation of Fixed Effects:
> (Intr) aoa
> aoa -0.761
> ls -0.706 *0.527*
>
>
> After centering:
> > lmer(n1~caoa + cls + (1|subject) + (1|word), cpn, family="binomial")
> Generalized linear mixed model fit by the Laplace approximation
> Formula: n1 ~ caoa + cls + (1 | subject) + (1 | word)
> Data: cpn
> AIC BIC logLik deviance
> 345.1 364 -167.6 335.1
> Random effects:
> Groups Name Variance Std.Dev.
> word (Intercept) 5.03699 2.24432
> subject (Intercept) 0.21577 0.46451
> Number of obs: 321, groups: word, 24; subject, 15
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.22650 0.50370 -0.450 0.653
> caoa -0.02672 0.02599 -1.028 0.304
> cls 0.01771 0.03361 0.527 0.598
>
> Correlation of Fixed Effects:
> (Intr) caoa
> caoa 0.004
> cls 0.001 *0.527 *
>
More information about the ling-r-lang-L mailing list