[R-lang] Re: Removing collinearity from control variables

Ariel M. Goldberg ariel.goldberg@tufts.edu
Sun Jan 9 19:08:49 PST 2011


Thank you all so much for your help.  Now that I don't have to worry about the collinearity of the nuisance variables, everything is much easier.  It turns out I still need to residualize my variable of interest against one of the nuisance variables, but that's all that I will need to do.  And thanks for the scale() advice!
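For the archive, here is a minimal sketch of what that residualization step looks like, on simulated data with made-up variable names (not the actual ELP variables):

```r
set.seed(1)
# Simulated stand-ins: Frequency is the nuisance variable; poi is the
# predictor of interest, deliberately constructed to be collinear with it
d <- data.frame(Frequency = rnorm(100))
d$poi <- 0.6 * d$Frequency + rnorm(100)

# Keep only the part of poi that Frequency cannot predict
d$rpoi <- resid(lm(poi ~ Frequency, data = d))

# By construction, the residualized predictor is orthogonal
# to the nuisance variable
round(cor(d$rpoi, d$Frequency), 10)
```

The residualized predictor d$rpoi can then be entered into the main model alongside the controls without contributing collinearity with Frequency.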

Best,
AG

On Jan 9, 2011, at 10:06 PM, Finlayson, Ian wrote:

> A quick tip to save a little typing: you can center a predictor using scale(data$Frequency, scale=FALSE). If you're keen on keeping the number of columns in your dataset to a minimum, scale() can also be applied directly to the predictors in the model formula.
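A sketch of both uses, on R's built-in mtcars data rather than the ELP variables:

```r
# Centering a column ahead of time (scale() returns a one-column
# matrix, so take the first column to get a plain vector)
mtcars$cwt <- scale(mtcars$wt, scale = FALSE)[, 1]

# ...or centering directly inside the model formula, no new column needed
fit <- lm(mpg ~ scale(wt, scale = FALSE) + scale(hp, scale = FALSE),
          data = mtcars)
coef(fit)
```

With scale(..., scale = FALSE) only the mean is subtracted; leaving scale = TRUE would additionally divide by the standard deviation, i.e. standardize.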
>  
> For what it's worth, I'd second the point that there's no real need to worry about collinearity if it's *just* among your controls. If it's between controls and predictors of interest, then that's more of an issue, and there are a few different approaches for trying to take care of it (I've been residualizing against the PoIs, as the direction of causality is pretty clear).
>  
> Ian
>  
>  
> 
> From: ling-r-lang-l-bounces@mailman.ucsd.edu on behalf of Alex Fine
> Sent: Mon 10/01/2011 00:52
> To: Ariel M. Goldberg
> Cc: ling-r-lang-l@mailman.ucsd.edu
> Subject: [R-lang] Re: Removing collinearity from control variables
> 
> In general, a very simple step you can take to reduce collinearity is to
> center your predictors, i.e. subtract the mean from each predictor (e.g.,
> for a continuous predictor, "data$cFrequency <- data$Frequency -
> mean(data$Frequency, na.rm=TRUE)"; the na.rm argument is there in case
> you have NAs in that variable).
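A sketch, on simulated data, of the kind of collinearity centering removes: the correlation between a predictor and terms derived from it, such as its square or an interaction. (Centering does not change the correlation between two distinct predictors.)

```r
set.seed(2)
x <- rnorm(200, mean = 10)        # all-positive predictor, far from zero
cx <- x - mean(x, na.rm = TRUE)   # centered version

# The raw predictor is nearly collinear with its own square...
cor(x, x^2)       # close to 1

# ...while the centered predictor is not
cor(cx, cx^2)     # much closer to 0
```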
> 
> That being said, it's not really clear why you need to worry about
> collinearity within your control variables in this case.  If all you
> want to do is see whether some additional variable is significant after
> controlling for x, y, and z, and that additional variable is itself not
> collinear with x, y, and z, then collinearity within x, y, and z will
> not affect the model's ability to estimate either the direction or
> significance of the new variable(s).
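A quick simulation (entirely made-up data) illustrating that point: two nearly collinear controls inflate each other's standard errors, but the estimate for an independent predictor of interest is unaffected.

```r
set.seed(42)
n <- 1000
x <- rnorm(n)
z <- x + rnorm(n, sd = 0.1)   # controls x and z: nearly collinear
w <- rnorm(n)                 # new variable, independent of x and z
y <- x + z + 0.5 * w + rnorm(n)

fit <- lm(y ~ x + z + w)

# x and z get large standard errors from their collinearity,
# but w's estimate and significance are unharmed
coef(summary(fit))["w", ]     # estimate near the true value of 0.5
```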
> 
> Given all that, there's probably also no reason to bother with PCA in
> this case.
> 
> hope that helps,
> alex
> 
> Ariel M. Goldberg wrote:
> > Dear R-langers,
> >
> > I am working with data from the English Lexicon Project and am using the variables described by Baayen, Feldman & Schreuder (2006) to control for the basic factors that influence reading time (frequency, length, etc).  My goal is to determine if other variables are significant after having controlled for these factors.  I'd like to remove the collinearity from Baayen et al's variable set and I was wondering if you had any suggestions as to what might be the best way to do this.  I was thinking that PCA might be the best, particularly since I'm not concerned with the interpretation of variables at all.  Do you think that's a good way to go about it?
> >
> > Also, if PCA is good, I have a quick question.  Do I use all the principal components it creates, in order to account for 100% of the variance?  I think this makes sense since, again, I'm not trying to interpret the various components.
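(A sketch of that approach, using mtcars columns as stand-ins for the Baayen et al. control set:)

```r
controls <- mtcars[, c("wt", "hp", "disp")]          # stand-in controls
pc <- prcomp(controls, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pc$x)                        # PC1, PC2, PC3

# The components are orthogonal, so the collinearity is gone...
round(cor(scores), 10)

# ...and keeping all of them retains 100% of the controls' variance
summary(pc)$importance["Cumulative Proportion", ]
```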
> >
> > Thanks!
> > Ariel
> >
> 

--
Ariel M. Goldberg
Assistant Professor
Tufts University
Department of Psychology
490 Boston Avenue
Medford, MA 02155

(617) 627-3525
http://ase.tufts.edu/psychology/psycholinglab/



