[R-lang] Re: Collinearity and how to decide what predictors to include

Alex Fine afine@bcs.rochester.edu
Tue Jul 27 05:49:40 PDT 2010


hey,

you probably didn't do anything wrong (though i don't think you included 
the code you used to center the predictors; i trust you, though).  LS 
and AoA are just correlated, and centering can't fix that: centering 
removes a predictor's correlation with the intercept, not its 
correlation with another predictor.  (notice that centering did 
eliminate the other two previously very high correlations, the ones 
with the intercept, though.)
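for the record, centering is just subtracting the mean, and it leaves the correlation between two predictors untouched. a quick sketch with made-up numbers (aoa and ls here are stand-ins, not your data):

```r
## centering = subtracting the mean; the numbers below are made up
aoa  <- c(3, 5, 7, 9)
ls   <- c(2, 4, 5, 8)
caoa <- aoa - mean(aoa)   # centered version: mean(caoa) is 0
cls  <- ls - mean(ls)

## key point: shifting each variable by a constant cannot change
## their correlation, so cor(caoa, cls) equals cor(aoa, ls)
cor(aoa, ls)
cor(caoa, cls)
```

which is exactly why your 0.527 survived centering.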

so i guess there are a few things you can do.  as you said in your first 
e-mail, you can do nested model comparison.  because it's a little 
unclear what your original question about model comparison was, let me 
just make sure it's clear that all you're doing with model comparison is 
taking two models, where one is a subset of the other...

model1 <- lmer(word1 ~ compoundType + AoA + (1|subject) + (1|word),
               family = "binomial", data = data)
model2 <- lmer(word1 ~ compoundType + AoA + LS + (1|subject) + (1|word),
               family = "binomial", data = data)

...and then when you compare these models with anova(model1, model2), a 
significant result means that LS is contributing to the model over and 
above AoA.  interactions are no different.  if you want to know whether 
an interaction is justified, do this: 

model2 <- lmer(word1 ~ compoundType + AoA + LS + (1|subject) + (1|word),
               family = "binomial", data = data)
model3 <- lmer(word1 ~ compoundType + AoA + LS + AoA:LS + (1|subject) +
               (1|word), family = "binomial", data = data)

anova(model2, model3) tells you whether the interaction between those 
two variables contributes anything to explaining the data over and above 
just the main effects.  the only caveat is that the comparison itself 
doesn't tell you anything about the direction of the effect.  (so, you 
may be able to conclude that LS is a significant predictor of whether 
they use Word1, but the anova alone doesn't tell you whether it makes it 
more or less likely; for that, look at the sign of the coefficient in 
summary(model3).)
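just to make the mechanics concrete, here's the same nested-comparison-plus-sign-check logic with plain lm() on the built-in mtcars data (your lmer models work the same way, just with the random effects added):

```r
## toy nested model comparison: does hp help over and above wt?
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

anova(m1, m2)              # significant -> hp contributes beyond wt
coef(summary(m2))["hp", ]  # the sign of the estimate gives the direction
```

here the comparison is significant and the hp coefficient is negative, so hp lowers predicted mpg.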

another option would be PCA, where you reduce those two variables to one 
variable that captures both, but that doesn't seem ideal in this case, 
since you probably have theoretical reasons for thinking that AoA and LS 
are not the same thing (given the whole pesky critical period thing).
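for completeness, the PCA route would look something like this (a sketch with simulated data; aoa and ls are stand-ins for your columns):

```r
## collapse two correlated predictors into their first principal component
set.seed(1)
aoa <- rnorm(100)
ls  <- 0.5 * aoa + rnorm(100, sd = 0.8)   # correlated with aoa by construction

pc <- prcomp(cbind(aoa, ls), scale. = TRUE)
combined <- pc$x[, 1]   # use this single score in place of both predictors
summary(pc)             # proportion of variance each component captures
```

the cost, as noted, is that the combined score no longer distinguishes AoA from LS, which is a problem if your theory cares about the difference.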

the last technique i know of is residualization, but i've never actually 
done this, so i'd rather let someone else take over if you want to know 
about that. 
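(that said, the textbook recipe, offered with the caveat above that i haven't used it myself, is to regress one predictor on the other and keep the residuals; simulated data again, names are stand-ins:)

```r
## residualize LS against AoA: keep the part of LS that AoA can't explain
set.seed(2)
aoa <- rnorm(100)
ls  <- 0.5 * aoa + rnorm(100)

rls <- residuals(lm(ls ~ aoa))  # uncorrelated with aoa by construction
cor(aoa, rls)                   # ~0; use rls in place of ls in the model
```

the residualized predictor then enters the model collinearity-free, though interpreting its coefficient gets subtle, which is presumably why the slides below discuss it at length.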

in keeping with my tradition of referring you to other people, you might 
want to check out Florian Jaeger and Victor Kuperman's slides from the 
CUNY 2009 workshop on multilevel models, which contain a lot of 
comments/info concerning collinearity (including some comments on 
residualization).  slides from all presentations at the workshop are here: 

http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/

the slides on collinearity are in the slideshow entitled "Issues and 
Solutions....", and the collinearity slides start around page 30.

alex

Xiao He wrote:
> Hi Alex,
>
> Thank you for the explanations. I have one more question regarding 
> centering. I attempted centering, but found that it did not remove two 
> predictors' correlations. Two examples are shown below. As you can 
> see, the correlation between aoa and ls is the same as that between 
> caoa and cls, namely, it is 0.527. I wonder if I did something wrong. 
> Thank you in advance!
>
> For example:
> before centering
> > lmer(n1~aoa + ls + (1|subject) + (1|word), cpn, family="binomial")
> Generalized linear mixed model fit by the Laplace approximation 
> Formula: n1 ~ aoa + ls + (1 | subject) + (1 | word) 
>    Data: cpn 
>    AIC BIC logLik deviance
>  345.1 364 -167.6    335.1
> Random effects:
>  Groups  Name        Variance Std.Dev.
>  word    (Intercept) 5.03699  2.24432 
>  subject (Intercept) 0.21577  0.46451 
> Number of obs: 321, groups: word, 24; subject, 15
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  0.08425    0.93113   0.090    0.928
> aoa         -0.02672    0.02599  -1.028    0.304
> ls           0.01771    0.03361   0.527    0.598
>
> Correlation of Fixed Effects:
>     (Intr) aoa   
> aoa -0.761       
> ls  -0.706  *0.527*
>
>
> After centering:
> > lmer(n1~caoa + cls + (1|subject) + (1|word), cpn, family="binomial")
> Generalized linear mixed model fit by the Laplace approximation 
> Formula: n1 ~ caoa + cls + (1 | subject) + (1 | word) 
>    Data: cpn 
>    AIC BIC logLik deviance
>  345.1 364 -167.6    335.1
> Random effects:
>  Groups  Name        Variance Std.Dev.
>  word    (Intercept) 5.03699  2.24432 
>  subject (Intercept) 0.21577  0.46451 
> Number of obs: 321, groups: word, 24; subject, 15
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.22650    0.50370  -0.450    0.653
> caoa        -0.02672    0.02599  -1.028    0.304
> cls          0.01771    0.03361   0.527    0.598
>
> Correlation of Fixed Effects:
>      (Intr) caoa 
> caoa 0.004       
> cls  0.001  *0.527 *
>


More information about the ling-r-lang-L mailing list