[R-lang] Re: What happens if I include a continuous variable?

David Reitter reitter@cmu.edu
Thu Sep 16 12:00:15 PDT 2010


On Sep 16, 2010, at 11:34 AM, Nathaniel Smith wrote:
> 
>> One way to address collinearity is to regress out length from frequency first, e.g. creating testdata$FreqWithoutLen from something like resid(lm(Freq ~ Len)).
>> 
>> I wonder if it would be OK to do stepwise regression, i.e. regressing out length from your response variable (lexdectime) first, and then fitting the main model.
> 
> I think you're talking about residualization, not stepwise regression?
> I would either regress out Length from *both* Freq and lexdectime, or
> not regress it out at all.

In the first sentence I talk about residualization; in the second I'm asking whether a different approach would also be feasible.

(1) In Florian's slides (I'm referring to the McGill lecture that he referenced, slides 43/44), we're taking the residuals from a model of the structure Freq ~ Len (applied to Roger's example). These residuals are then used as a predictor in the original model, as lexdectime ~ residual (lexdectime is presumably an RT).
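To make (1) concrete, here is a minimal sketch on simulated data. The variable names follow this thread, but the numbers are invented for illustration and are not Roger's dataset; the comparison model m_full is my own addition, not something taken from the slides.

```r
# Simulated data for illustration only (assumption: not the real dataset).
set.seed(1)
n    <- 200
Len  <- rnorm(n, mean = 6, sd = 2)
Freq <- -0.5 * Len + rnorm(n)              # Freq is correlated with Len
lexdectime <- 600 + 20 * Len - 30 * Freq + rnorm(n, sd = 40)
testdata <- data.frame(lexdectime, Len, Freq)

# Step 1: remove from Freq whatever Len predicts linearly.
testdata$FreqWithoutLen <- resid(lm(Freq ~ Len, data = testdata))

# Step 2: use the residualized frequency as the predictor.
m_resid <- lm(lexdectime ~ FreqWithoutLen, data = testdata)

# For comparison: the model with both raw predictors.
m_full <- lm(lexdectime ~ Len + Freq, data = testdata)

coef(m_resid)[["FreqWithoutLen"]]
coef(m_full)[["Freq"]]
```

With ordinary least squares the two slopes printed at the end coincide, because the residualized predictor is orthogonal to Len; the standard errors will generally differ, since m_resid leaves the variance due to Len in the error term.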

(2) The alternative is not really stepwise regression in the sense that you'd add main effects, and then interactions etc., but perhaps something like this:

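# Step 1: regress the response on Len alone.
m1 <- lm(lexdectime ~ Len, data = testdata)
# Step 2: regress what Len left unexplained on Freq.
m2 <- lm(resid(m1) ~ Freq, data = testdata)
summary(m2)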

... if we are interested in the effect of Freq, after Len has been accounted for.
Copying from slide 45 in the above reference, one would still say "We have granted Len the entire portion of the variance that cannot unambiguously be attributed to either Freq or Len!". Comments appreciated.
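As a starting point for such comments, here is a sketch of (2) on simulated data, compared against the model containing both raw predictors. The data are invented for illustration, and the names RTWithoutLen, m_full and r2 are hypothetical helpers of mine, not part of the original example.

```r
# Simulated data for illustration only (assumption: not the real dataset).
set.seed(1)
n    <- 200
Len  <- rnorm(n, mean = 6, sd = 2)
Freq <- -0.5 * Len + rnorm(n)              # Freq is correlated with Len
lexdectime <- 600 + 20 * Len - 30 * Freq + rnorm(n, sd = 40)
testdata <- data.frame(lexdectime, Len, Freq)

# Approach (2): residualize the response on Len, then regress on raw Freq.
m1 <- lm(lexdectime ~ Len, data = testdata)
testdata$RTWithoutLen <- resid(m1)
m2 <- lm(RTWithoutLen ~ Freq, data = testdata)

# Reference: both raw predictors in one model.
m_full <- lm(lexdectime ~ Len + Freq, data = testdata)

# Share of the variance in Freq that Len explains.
r2 <- summary(lm(Freq ~ Len, data = testdata))$r.squared

c(two_step = coef(m2)[["Freq"]],
  full     = coef(m_full)[["Freq"]],
  shrunk   = coef(m_full)[["Freq"]] * (1 - r2))
```

In this least-squares setting the two-step slope equals the full-model slope multiplied by (1 - r2), so it is pulled toward zero whenever Freq and Len are correlated. That seems consistent with Nathaniel's suggestion to regress Len out of both Freq and lexdectime, or not at all.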

