[R-lang] Re: Removing collinearity from control variables
Kyle Gorman
kylebgorman@gmail.com
Sun Jan 9 16:44:46 PST 2011
Actually, multicollinearity doesn't really matter if you aren't concerned with interpreting the individual variables and simply want them in the model as controls. Just include the multicollinear predictors as part of the set of main effects.
"Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors."
"Partial multicollinearity among control variables is almost entirely harmless. It does not undercut their effectiveness at eliminating omitted variable bias. It does not produce any sort of bias. It does not reduce the fit of a regression. Coefficient standard errors properly report the uncertainty attached to each estimate; there should be no opportunity to place more stock in a given coefficient than it deserves."
Kyle
On Jan 9, 2011, at 7:08 PM, Ariel M. Goldberg wrote:
> Dear R-langers,
>
> I am working with data from the English Lexicon Project and am using the variables described by Baayen, Feldman & Schreuder (2006) to control for the basic factors that influence reading time (frequency, length, etc.). My goal is to determine if other variables are significant after having controlled for these factors. I'd like to remove the collinearity from Baayen et al.'s variable set and I was wondering if you had any suggestions as to what might be the best way to do this. I was thinking that PCA might be the best approach, particularly since I'm not concerned with the interpretation of the variables at all. Do you think that's a good way to go about it?
>
> Also, if PCA is good, I have a quick question: do I use all the principal components it creates, in order to account for 100% of the variance? I think this makes sense since, again, I'm not trying to interpret the various components.
>
> Thanks!
> Ariel