[R-lang] Re: Removing collinearity from control variables

T. Florian Jaeger tiflo@csli.stanford.edu
Tue Jan 11 14:51:34 PST 2011


Hi Ariel,

> Dear Florian,
>
> After receiving the other comments on the list, I looked over your and
> Kuperman's slides again and realized that I don't have to worry about the
> collinearity of the predictors.  I guess I was just following Baayen's
> papers (which don't explicitly mention this fact) and got overly worried.
>
> The one thing I do wonder about, however, is that in your slides, you do
> say:
> "Even if a predictor is not highly correlated with a single other predictor
> in the model, it can be highly collinear with the combination of predictors
> → collinearity will affect the predictor"
>
> That seems to suggest that even if the individual correlations aren't
> significant, there could still be an issue.
>

That is true. You can assess this with, e.g., the VIF (variance inflation
factor) statistic. It isn't provided for mixed models, but you can
calculate it yourself. I'd do that within a model with the same random
effect structure as your model of interest.
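The by-hand calculation is just a series of regressions: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. A minimal sketch (not from the original exchange; shown in Python with numpy and simulated data for brevity, though the same arithmetic is a few `lm()` calls in R):

```python
# Sketch: computing variance inflation factors (VIF) by hand.
# VIF_j = 1 / (1 - R^2_j), where R^2_j is the R-squared from
# regressing predictor j on all the remaining predictors.
# Data below is simulated; on real data you'd use the columns of
# your fixed-effect design matrix.
import numpy as np

def vif(X):
    """Return the VIF for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + others
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1 - (resid @ resid) / tss
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=200)  # collinear with x1
x3 = rng.normal(size=200)                        # independent
print(vif(np.column_stack([x1, x2, x3])))
```

The collinear pair (x1, x2) gets inflated VIFs while the independent predictor stays near 1; a common rule of thumb treats VIF above roughly 5-10 as worrying.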


> On a related note, I need to residualize my variable of interest
> (phonological similarity) against a different variable (letter similarity)
> since they are correlated r=-.418.  In your slides, you say that the
> residualized variable can't be back-transformed to the original variable.
>  Am I still allowed then, to conclude that Phonological Similarity is
> significant?
>

The back-transformation issue is mentioned as part of a discussion of how to
visualize the effects. It has nothing to do with what model you run or what
you can conclude from the model.

Btw, with a correlation of abs(r) < .5, you'll often be fine without
residualization. In any case, if you care about showing that phonological
similarity matters beyond letter similarity, I'd do model comparison and
report that first.
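Both steps are short in code. A sketch (not from the original thread; Python with numpy and invented variable names like `phon_sim`, `letter_sim`, `rt` for illustration, assuming ordinary least squares rather than a mixed model): residualize by regressing the predictor of interest on the control and keeping the residuals, then compare a model with and without the residualized predictor.

```python
# Sketch: residualization plus a simple nested-model comparison.
# All names and the data-generating setup are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 300
letter_sim = rng.normal(size=n)
phon_sim = 0.4 * letter_sim + rng.normal(scale=0.9, size=n)  # correlated
rt = 500 + 20 * letter_sim + 15 * phon_sim + rng.normal(scale=30, size=n)

# Residualize: regress phon_sim on letter_sim, keep the residuals.
A = np.column_stack([np.ones(n), letter_sim])
beta, *_ = np.linalg.lstsq(A, phon_sim, rcond=None)
phon_resid = phon_sim - A @ beta  # orthogonal to letter_sim by construction

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

# Model comparison: does phon. sim. improve fit beyond letter sim.?
base = np.column_stack([np.ones(n), letter_sim])
full = np.column_stack([base, phon_resid])
print(rss(base, rt) - rss(full, rt))  # positive: the extra term helps
```

In R the analogous moves would be `resid(lm(phon_sim ~ letter_sim))` and `anova(model_base, model_full)`; the RSS drop here plays the role of the likelihood-ratio comparison.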

HTH,
Florian

>
> Thanks so much for your help!
> Ari
>
> On Jan 9, 2011, at 9:45 PM, T. Florian Jaeger wrote:
>
> > Hi Ariel,
> >
> > if you don't care about the other variables, why are you concerned about
> collinearity between those variables? Collinearity only affects the standard
> error estimates of those predictors in the model that are collinear.
> >
> > Does this help? If you're concerned about over-fitting, you can do PCA
> and use the x most influential orthogonal components that capture 100% of
> the variance, but if over-fitting is not a concern I'd leave the original
> variables in the model.
> >
> > Florian
> >
> > On Sun, Jan 9, 2011 at 7:08 PM, Ariel M. Goldberg <
> ariel.goldberg@tufts.edu> wrote:
> > Dear R-langers,
> >
> > I am working with data from the English Lexicon Project and am using the
> variables described by Baayen, Feldman & Schreuder (2006) to control for the
> basic factors that influence reading time (frequency, length, etc).  My goal
> is to determine if other variables are significant after having controlled
> for these factors.  I'd like to remove the collinearity from Baayen et al's
> variable set and I was wondering if you had any suggestions as to what might
> be the best way to do this.  I was thinking that PCA might be the best,
> particularly since I'm not concerned with the interpretation of variables at
> all.  Do you think that's a good way to go about it?
> >
> > Also, if PCA is good, I have a quick question.  Do I use all the
> principal components it creates, in order to account for 100% of the
> variance?  I think this makes sense since again, I'm not trying to interpret
> the various components.
> >
> > Thanks!
> > Ariel
> >
> >
> >
> >
>
> --
> Ariel M. Goldberg
> Assistant Professor
> Tufts University
> Department of Psychology
> 490 Boston Avenue
> Medford, MA 02155
>
> (617) 627-3525
> http://ase.tufts.edu/psychology/psycholinglab/
>
>