[R-lang] High collinearity

T. Florian Jaeger tiflo at csli.stanford.edu
Wed Apr 29 06:30:13 PDT 2009


Hi Marco,

yes, you can use residuals as predictors even if the residuals are derived
from models with multiple collinear predictors. The fitted values are not
affected by collinearity (and hence neither are the residuals). Only the
SE(betas) are inflated, and the betas themselves become hard to interpret.
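
Here is a minimal sketch of that point on simulated data (the variable
names, sample size, and effect sizes are made up for illustration and are
not from Marco's data set). It fits the same model once with the raw
collinear predictors and once with x2 and x3 residualized against x1. The
fitted values should agree up to numerical precision, while the standard
error for x1 differs a lot between the two parameterizations.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # strongly correlated with x1
x3 <- x1 + rnorm(n, sd = 0.1)   # strongly correlated with x1
y  <- 1 + 0.5 * x1 + 0.5 * x2 + 0.5 * x3 + rnorm(n)

# Model with the raw, collinear predictors
m_full <- lm(y ~ x1 + x2 + x3)

# Same model space, but x2 and x3 are residualized against x1
r_x2    <- resid(lm(x2 ~ x1))
r_x3    <- resid(lm(x3 ~ x1))
m_resid <- lm(y ~ x1 + r_x2 + r_x3)

# Fitted values (and hence residuals) are the same in both models
max_fit_diff <- max(abs(fitted(m_full) - fitted(m_resid)))

# The standard error of x1 is inflated in the collinear parameterization
se_full  <- coef(summary(m_full))["x1", "Std. Error"]
se_resid <- coef(summary(m_resid))["x1", "Std. Error"]

print(max_fit_diff)
print(c(collinear = se_full, residualized = se_resid))
```

Note that the coefficient for x1 means something different in the two
models: in the second one it absorbs whatever x2 and x3 share with x1.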

With regard to your other question: if you residualize a predictor xi in
several different ways by regressing it against different combinations of
other predictors x1 ... xk, leading to different residualized versions of
xi, say r_xi1 to r_xik, and only one (or some) of these residualized
predictors results in significance, then you have to be careful in the
interpretation of the effect. You may find Victor Kuperman's and my slides at
http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/useful
(see residualization) where we talk about the interpretation of a
residualized variable.

*Just to be clear, all predictors another predictor is residualized against
should be in the final model.* Although model comparison does not always
give exactly the same result, significance of a residualized predictor r_xi
in the SE(beta)-based test (in the absence of remaining collinearity)
essentially says that the *un*residualized (i.e. original) predictor xi
improves the model significantly *beyond the predictors that xi was
residualized against.* So significance tests over different residualized
r_xi (xi residualized against different sets of other predictors x1 ... xk)
are actually testing different hypotheses.
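
A small sketch of this equivalence, again on simulated data with made-up
names (x1, x2, xi, y are illustrative only). It assumes plain lm fits, where
the t-test on the residualized predictor, the t-test on the original
predictor, and the F-test from model comparison coincide exactly; with lmer
and likelihood-based comparisons the agreement is only approximate.

```r
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
xi <- x1 + x2 + rnorm(n, sd = 0.3)   # predictor of interest, collinear with x1 and x2
y  <- 1 + x1 + x2 + 0.3 * xi + rnorm(n)

# Residualize xi against the other predictors
r_xi <- resid(lm(xi ~ x1 + x2))

# Final models: all predictors xi was residualized against are included
m_orig  <- lm(y ~ x1 + x2 + xi)
m_resid <- lm(y ~ x1 + x2 + r_xi)

t_orig  <- coef(summary(m_orig))["xi", "t value"]
t_resid <- coef(summary(m_resid))["r_xi", "t value"]

# Model comparison: does xi improve on the model with x1 and x2 only?
m_base <- lm(y ~ x1 + x2)
F_val  <- anova(m_base, m_orig)$F[2]

print(c(t_orig = t_orig, t_resid = t_resid, sqrt_F = sqrt(F_val)))
```

Residualizing xi against a different set (say, x1 only) and fitting
y ~ x1 + r_xi would instead test whether xi adds anything beyond x1 alone,
which is a different hypothesis.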

Not sure this is clear from what I am saying. Let me know,

Florian

On Wed, Apr 29, 2009 at 9:05 AM, Marco <dutch.linguistics at gmail.com> wrote:

> Dear Florian,
>
> Thank you for your comments. I have more than two correlated variables,
> though. Is it possible to use the residuals of models that contain multiple
> correlated variables? As far as I know, the residuals are not affected
> by the collinearity; only the beta estimates for the individual variables in
> the model, right? Is the equation below statistically alright?
>
> residuals_end <- resid(lm(End ~ correlatedvar1 + correlatedvar2 +
> correlatedvar3))
>
> I have run similar models for the other variables (i.e. correlatedvar1,
> correlatedvar2, etc.). Subsequently, I fitted similar models for the other
> variables. If only one of these residualised variables shows up as
> significant, does that prove its additional value? Or should I only
> residualise for variables one by one?
>
> Thanks in advance,
>
> Marco
>
>
>
>
>
> 2009/4/11 T. Florian Jaeger <tiflo at csli.stanford.edu>
>
>
>>
>> On Sat, Apr 11, 2009 at 8:55 AM, Marco <dutch.linguistics at gmail.com> wrote:
>>
>>> Dear R-langs,
>>>
>>> I have a data set that contains highly correlated variables (> .90), all
>>> of which are variables that occur on the same time scale. I crucially want
>>> to determine whether one of these variables (End) has explanatory power on
>>> top of all the other ones. In this case, is it legitimate to take the
>>> residuals of End (fitting an lm model, in which we explain End with all
>>> other, correlated variables), and then running an lmer model that only
>>> contains resid_end? When I look at the results I obtain, it seems like the
>>> other correlated variables result in corrupted residuals for End. Are there
>>> any other methods to deal with (and distinguish between) highly correlated
>>> variables in R?  Or could you tell me whether it is valid to use these
>>> residuals (and the F values obtained for these residuals), even though the
>>> beta coefficients are uninterpretable?
>>
>>
>> I think in your situation, you should first do some model comparison,
>> preferably bootstrapping over it (i.e. test e.g. 10,000 times which of the
>> two predictors would be removed from the model if you sampled from your data
>> randomly with replacement). That's the best thing to do if you have such
>> high correlations.
>>
>> Residuals can be used, but you would have to residualize both ways and
>> test the two resulting models.
>>
>> florian
>>
>>>
>>>
>>> Thanks in advance!
>>>
>>> Marco
>>>
>>> _______________________________________________
>>> R-lang mailing list
>>> R-lang at ling.ucsd.edu
>>> http://pidgin.ucsd.edu/mailman/listinfo/r-lang
>>>
>>>
>>