[R-lang] Re: Removing collinearity from control variables

Alex Fine afine@bcs.rochester.edu
Sun Jan 9 16:52:26 PST 2011


In general, a very simple step you can take to reduce collinearity is to 
center your predictors, i.e. subtract the mean from each predictor (e.g. 
for a continuous predictor "data$cFrequency <- data$Frequency - 
mean(data$Frequency, na.rm=T)"; do the little na.rm thing in case you 
have NAs for that variable).
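
For example, something like this (just a sketch -- the data frame and column names are placeholders for whatever your controls are actually called):

    # center each continuous control predictor by hand
    data$cFrequency <- data$Frequency - mean(data$Frequency, na.rm=TRUE)
    data$cLength    <- data$Length    - mean(data$Length,    na.rm=TRUE)

    # or equivalently with scale(), dropping the matrix attributes it returns
    data$cFrequency <- as.numeric(scale(data$Frequency, center=TRUE, scale=FALSE))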

That being said, it's not really clear why you need to worry about 
collinearity within your control variables in this case.  If all you 
want to do is see whether some additional variable is significant after 
controlling for x, y, and z, and that additional variable is itself not 
collinear with x, y, and z, then collinearity within x, y, and z will 
not affect the model's ability to estimate either the direction or 
significance of the new variable(s).
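
So a model along these lines should be fine (again just a sketch with made-up variable names, using lm() for simplicity; the same logic carries over to lmer()):

    # x, y, z are the (possibly collinear) controls; newVar is the
    # predictor of interest, assumed not collinear with x, y, z
    m <- lm(RT ~ x + y + z + newVar, data=data)
    summary(m)  # the row for newVar gives its estimate and significance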

Given all that, there's probably also no reason to bother with PCA in 
this case.
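
If you did want to go the PCA route anyway, the usual recipe would look something like this (column names again made up; note that prcomp() won't accept NAs in these columns), and since you're not interpreting the components you could indeed keep all of them:

    # PCA on the scaled control variables only
    pc <- prcomp(data[, c("x", "y", "z")], center=TRUE, scale.=TRUE)

    # the scores (pc$x) are orthogonal by construction; keep all of them
    data <- cbind(data, pc$x)
    m <- lm(RT ~ PC1 + PC2 + PC3 + newVar, data=data)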

hope that helps,
alex

Ariel M. Goldberg wrote:
> Dear R-langers,
>
> I am working with data from the English Lexicon Project and am using the variables described by Baayen, Feldman & Schreuder (2006) to control for the basic factors that influence reading time (frequency, length, etc.).  My goal is to determine whether other variables are significant after having controlled for these factors.  I'd like to remove the collinearity from Baayen et al.'s variable set and I was wondering if you had any suggestions as to what might be the best way to do this.  I was thinking that PCA might be the best approach, particularly since I'm not concerned with the interpretation of the variables at all.  Do you think that's a good way to go about it?
>
> Also, if PCA is good, I have a quick question.  Do I use all the principal components it creates, in order to account for 100% of the variance?  I think this makes sense since, again, I'm not trying to interpret the various components.
>
> Thanks!
> Ariel

