[R-lang] High collinearity

T. Florian Jaeger tiflo at csli.stanford.edu
Sat Apr 11 14:47:18 PDT 2009


On Sat, Apr 11, 2009 at 8:55 AM, Marco <dutch.linguistics at gmail.com> wrote:

> Dear R-langs,
>
> I have a data set that contains highly correlated variables (> .90), all of
> which are variables that occur on the same time scale. I crucially want to
> determine whether one of these variables (End) has explanatory power on top
> of all the other ones. In this case, is it legitimate to take the residuals
> of End (from an lm model in which End is predicted from all the other,
> correlated variables) and then run an lmer model that contains only
> resid_end? When I look at the results I obtain, it seems like the other
> correlated variables result in corrupted residuals for End. Are there any
> other methods to deal with (and distinguish between) highly correlated
> variables in R?  Or could you tell me whether it is valid to use these
> residuals (and the F values obtained for these residuals), even though the
> beta coefficients are uninterpretable?


I think in your situation you should first do some model comparison,
preferably bootstrapped: resample your data randomly with replacement
(e.g. 10,000 times) and check each time which of the two predictors would
be removed from the model. That's the best thing to do if you have such
high correlations.
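
A rough sketch of that bootstrap, under several assumptions: the data are
simulated stand-ins (y, x1, x2 are hypothetical names), plain lm() is used
instead of lmer() to keep it self-contained, and "would be removed" is
read as "the predictor whose removal gives the lower AIC". With grouped
data you would probably want to resample whole subjects or items rather
than single rows.

```r
set.seed(1)

# Simulated stand-in data (hypothetical): two predictors correlated > .9,
# with only x1 actually driving the outcome y.
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
y  <- 0.5 * x1 + rnorm(n)
d  <- data.frame(y, x1, x2)

# For one bootstrap sample, report which predictor backward elimination
# would drop first: the one whose removal yields the lower AIC.
dropped_in_resample <- function(data) {
  boot <- data[sample(nrow(data), replace = TRUE), ]
  aic_without_x1 <- AIC(lm(y ~ x2, data = boot))
  aic_without_x2 <- AIC(lm(y ~ x1, data = boot))
  if (aic_without_x1 < aic_without_x2) "x1" else "x2"
}

n_boot  <- 1000  # the suggestion above was e.g. 10,000; fewer here for speed
dropped <- replicate(n_boot, dropped_in_resample(d))

# Share of resamples in which each predictor was the one dropped
drop_share <- table(factor(dropped, levels = c("x1", "x2"))) / n_boot
print(drop_share)
```

If one predictor is dropped in the large majority of resamples, that
suggests the other one carries the effect; a split near 50/50 suggests the
data cannot tell them apart.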

Residuals can be used, but you would have to residualize both ways (each
predictor against the other) and test the two resulting models.
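
Residualizing both ways could look roughly like this. Again the data are
simulated and the names (x1, x2, x1_resid, x2_resid) are hypothetical;
lm() stands in for the final lmer() model, and with more than two
correlated predictors each one would be residualized against all the
others.

```r
set.seed(2)

# Simulated stand-in data (hypothetical), same setup as a collinear pair
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
y  <- 0.5 * x1 + rnorm(n)
d  <- data.frame(y, x1, x2)

# Direction 1: x2 keeps the shared variance; x1 enters only as the part
# of x1 that x2 cannot predict.
d$x1_resid <- resid(lm(x1 ~ x2, data = d))
m1 <- lm(y ~ x2 + x1_resid, data = d)

# Direction 2: x1 keeps the shared variance; x2 enters only as the part
# of x2 that x1 cannot predict.
d$x2_resid <- resid(lm(x2 ~ x1, data = d))
m2 <- lm(y ~ x1 + x2_resid, data = d)

# Coefficient tables for both directions
print(coef(summary(m1)))
print(coef(summary(m2)))
```

The two models have identical fitted values; they differ only in which
predictor is credited with the shared variance. The test on each
residualized term asks whether that predictor adds anything beyond the
other one.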

florian

>
>
> Thanks in advance!
>
> Marco