[R-lang] High collinearity

T. Florian Jaeger tiflo at csli.stanford.edu
Wed Apr 29 10:11:40 PDT 2009


On Wed, Apr 29, 2009 at 10:15 AM, Marco <dutch.linguistics at gmail.com> wrote:

> Dear Florian,
>
> Thank you for your response. I think my previous message was not entirely
> clear. Please let me try once more:
>
> Let's say that we have predictors x1, x2, x3, and x4, all of which are highly
> correlated. I want to investigate which of these correlated variables has
> explanatory power on top of all the other variables when explaining the
> dependent variable dep. In order to (try to) avoid multicollinearity issues,
> I fitted four models:
>
> resid_1 <- residuals(lm(x1 ~ x2 + x3 + x4))
> resid_2 <- residuals(lm(x2 ~ x1 + x3 + x4))
> resid_3 <- residuals(lm(x3 ~ x1 + x2 + x4))
> resid_4 <- residuals(lm(x4 ~ x1 + x2 + x3))
>
> Subsequently, I fitted four linear models, one for each resid-variable I
> created, thus:
>
> lm1 <- lm(dep ~ resid_1)
> lm2 <- lm(dep ~ resid_2)
> lm3 <- lm(dep ~ resid_3)
> lm4 <- lm(dep ~ resid_4)
>
> I then corrected the p-values for the multiple comparisons. Is this a
> reasonable way to test the unique contributions of these four variables?


Hi Marco,

Why would you do that rather than model comparison? If you want to assess
the direction of an effect, so that model comparison alone is insufficient, you
still have to be careful with your method, since an effect (even of a
residual) may only surface after the other partial correlations have been
accounted for. I guess that would correspond to

lm1 <- lm(residuals(lm(dep ~ x2 + x3 + x4)) ~ resid_1)
etc.

Not that I am suggesting that! I would do:

lm1 <- lm(dep ~ resid_1 + x2 + x3 + x4)
lm2 <- lm(dep ~ resid_2 + x1 + x3 + x4)
etc.

That's the way to go.
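
For concreteness, here is a minimal self-contained sketch of that approach.
All of the names below (x1 to x4, dep) and the data are made up, so treat it
as an illustration of the recipe rather than as your analysis:

# simulate four highly correlated predictors and an outcome (placeholder data)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- x1 + rnorm(n, sd = 0.2)
x4 <- x1 + rnorm(n, sd = 0.2)
dep <- 2 * x1 + rnorm(n)

# residualize x1 against the other predictors, then keep those predictors in
# the model; resid_1 is orthogonal to x2, x3, x4, so its coefficient is stable
resid_1 <- residuals(lm(x1 ~ x2 + x3 + x4))
lm1 <- lm(dep ~ resid_1 + x2 + x3 + x4)
summary(lm1)   # the resid_1 row tests x1's unique contribution beyond x2..x4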

Florian

>
>
> Thanks in advance,
>
> Marco
>
>
> 2009/4/29 T. Florian Jaeger <tiflo at csli.stanford.edu>
>
> Hi Marco,
>>
>> yes, you can use residuals as predictors even if the residuals are derived
>> from models with collinear multiple predictors. The fitted values are not
>> affected by collinearity (and hence neither the residuals). Only the
>> SE(betas) are biased, and the betas themselves become hard to interpret.
>>
>> With regard to your other question: if you residualize a predictor xi in
>> several different ways by regressing it against different combinations of
>> other predictors x1 ... xk, leading to different residualized versions of
>> xi, say r_xi1 to r_xik, and only one (or some) of these residualized
>> predictors results in significance, then you have to be careful in the
>> interpretation of the effect. You may find Victor Kuperman's and my slides at
>> http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/
>> useful (see the slides on residualization), where we talk about the interpretation of a
>> residualized variable.
>>
>> *Just to be clear, all predictors that another predictor is residualized
>> against should be in the final model.* Although model comparisons do not
>> always give quite the same result, significance of a residualized predictor
>> r_xi in the SE(beta)-based test (in the absence of remaining collinearity)
>> is essentially saying that the *un*residualized (i.e. original) predictor
>> xi improves the model significantly *beyond the predictors that xi was
>> residualized against.* So significance tests over different residualized
>> r_xi (xi residualized against different sets of other predictors x1 ... xk)
>> are actually testing different hypotheses.
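>>
>> A quick way to see this equivalence in a toy example (the names x1 ... x4
>> and dep below are simulated placeholders, not your variables):
>>
>> set.seed(1)
>> n <- 200
>> x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
>> x1 <- x2 + x3 + x4 + rnorm(n, sd = 0.3)   # x1 is collinear with the others
>> dep <- x1 + x2 + rnorm(n)
>>
>> # t-test of the residualized predictor, with the other predictors in the model
>> r_x1 <- residuals(lm(x1 ~ x2 + x3 + x4))
>> summary(lm(dep ~ r_x1 + x2 + x3 + x4))$coefficients["r_x1", ]
>>
>> # model comparison: does x1 improve the model beyond x2, x3, x4?
>> anova(lm(dep ~ x2 + x3 + x4), lm(dep ~ x1 + x2 + x3 + x4))
>> # the p-values agree (F = t^2): both test x1's contribution beyond x2..x4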
>>
>> Not sure this is clear from what I am saying. Let me know,
>>
>> Florian
>>
>>
>> On Wed, Apr 29, 2009 at 9:05 AM, Marco <dutch.linguistics at gmail.com> wrote:
>>
>>> Dear Florian,
>>>
>>> Thank you for your comments. I have more than two correlated variables,
>>> though. Is it possible to use the residuals of models that contain multiple
>>> correlated variables? As far as I know, the residuals are not affected
>>> by the collinearity; only the beta estimates for the individual variables in
>>> the model, right? Is the equation below statistically alright?
>>>
>>> residuals_end <- residuals(lm(End ~ correlatedvar1 + correlatedvar2 +
>>>     correlatedvar3))
>>>
>>> I have run similar models for the other variables (i.e. correlatedvar1,
>>> correlatedvar2, etc.), obtaining a residualised version of each. If only one
>>> of these residualised variables shows up as significant, does that prove its
>>> additional value? Or should I residualise against the other variables one by
>>> one?
>>>
>>> Thanks in advance,
>>>
>>> Marco
>>>
>>>
>>>
>>>
>>>
>>> 2009/4/11 T. Florian Jaeger <tiflo at csli.stanford.edu>
>>>
>>>
>>>>
>>>> On Sat, Apr 11, 2009 at 8:55 AM, Marco <dutch.linguistics at gmail.com> wrote:
>>>>
>>>>> Dear R-langs,
>>>>>
>>>>> I have a data set that contains highly correlated variables (> .90),
>>>>> all of which are variables that occur on the same time scale. I crucially
>>>>> want to determine whether one of these variables (End) has explanatory power
>>>>> on top of all the other ones. In this case, is it legitimate to take the
>>>>> residuals of End (fitting an lm model in which we explain End with all the
>>>>> other, correlated variables) and then to run an lmer model that only
>>>>> contains resid_end? When I look at the results I obtain, it seems like the
>>>>> other correlated variables result in corrupted residuals for End. Are there
>>>>> any other methods to deal with (and distinguish between) highly correlated
>>>>> variables in R?  Or could you tell me whether it is valid to use these
>>>>> residuals (and the F values obtained for these residuals), even though the
>>>>> beta coefficients are uninterpretable?
>>>>
>>>>
>>>> I think in your situation, you should first do some model comparison,
>>>> preferably bootstrapping over it (i.e. test e.g. 10,000 times which of the
>>>> two predictors would be removed from the model if you sampled from your data
>>>> randomly with replacement). That's the best thing to do if you have such
>>>> high correlations.
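>>>>
>>>> A rough sketch of one way to implement that (d, dep, x1, and x2 below are
>>>> made-up placeholders standing in for your data frame and variables):
>>>>
>>>> # toy stand-in for the data; replace with your own data frame
>>>> set.seed(1)
>>>> d <- data.frame(x1 = rnorm(200))
>>>> d$x2  <- d$x1 + rnorm(200, sd = 0.1)    # highly correlated with x1
>>>> d$dep <- d$x1 + rnorm(200)
>>>>
>>>> n.boot <- 10000                         # number of bootstrap resamples
>>>> drop.x1 <- drop.x2 <- 0
>>>> for (i in 1:n.boot) {
>>>>   b <- d[sample(nrow(d), replace = TRUE), ]   # resample rows with replacement
>>>>   p <- drop1(lm(dep ~ x1 + x2, data = b), test = "F")[c("x1", "x2"), "Pr(>F)"]
>>>>   # count which predictor backwards elimination would remove first
>>>>   if (p[1] > p[2]) drop.x1 <- drop.x1 + 1 else drop.x2 <- drop.x2 + 1
>>>> }
>>>> c(x1 = drop.x1, x2 = drop.x2) / n.boot  # proportion of resamples dropping each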
>>>>
>>>> Residuals can be used, but you would have to residualize both ways and
>>>> test the two resulting models.
>>>>
>>>> Florian
>>>>
>>>>>
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Marco
>>>>>
>>>>> _______________________________________________
>>>>> R-lang mailing list
>>>>> R-lang at ling.ucsd.edu
>>>>> http://pidgin.ucsd.edu/mailman/listinfo/r-lang
>>>>>
>>>>>
>>>>
>>
>
>
> --
>

