[R-lang] Re: What happens if I include a continuous variable?

Nathaniel Smith njs@pobox.com
Thu Sep 16 08:34:50 PDT 2010


Hi David,

A few added comments to your reply.

On Thu, Sep 16, 2010 at 5:17 AM, David Reitter <reitter@cmu.edu> wrote:
> Hi Roger,
>
> On Sep 16, 2010, at 6:54 AM, Roger van-Gompel wrote:
>> In other words, if I still get an effect of frequency in the second
>> model, does this mean that this effect is not due to/not confounded with
>> length?
>
> Yes, that's a good start, but have a look at the correlation matrix. If, after centering, you still find substantial correlations (perhaps >0.2) between frequency or length and their interaction, you will want to do something about it. Otherwise, significance tests of the fitted effects won't tell you much, as these variables are confounded. See also:

This is true in general, but I think it's worth emphasizing that
significance tests are still meaningful in the presence of
collinearity; they just have less power. So collinearity tends to
mean you need more data in order to find significant results, and if
you have enough data, it isn't an issue.

(And if you don't have enough data, then there may be nothing you can
do except collect more/better data -- or give up on trying to
determine which of the correlated predictors is actually driving the
effect.)
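
To make the power point concrete, here is a rough simulation sketch. It
assumes an ordinary lm() with made-up data: the response y, the effect
sizes, the sample sizes and the helper names fit_once() and sim() are all
invented for illustration, and only Freq and Len echo the variables in
this thread. It compares the average estimate and standard error for Freq
when Freq and Len are uncorrelated, when they are correlated at 0.9, and
when they are correlated at 0.9 with ten times as much data.

```r
set.seed(1)

# Simulate one data set and return the estimate and standard error
# for Freq from a model that also includes Len.
fit_once <- function(n, rho) {
  Len  <- rnorm(n)
  # Freq has unit variance and (population) correlation rho with Len
  Freq <- rho * Len + sqrt(1 - rho^2) * rnorm(n)
  # Hypothetical response: the true Freq effect is 0.3
  y    <- 0.3 * Freq + 0.3 * Len + rnorm(n)
  coef(summary(lm(y ~ Freq + Len)))["Freq", c("Estimate", "Std. Error")]
}

# Average the estimate and standard error over repeated simulations
sim <- function(n, rho, reps = 200) {
  rowMeans(replicate(reps, fit_once(n, rho)))
}

low  <- sim(n = 100,  rho = 0.0)   # uncorrelated predictors
high <- sim(n = 100,  rho = 0.9)   # strongly correlated predictors
more <- sim(n = 1000, rho = 0.9)   # same correlation, more data
print(rbind(low, high, more))
```

In a setup like this the average estimate should stay close to the true
value in all three rows, while the standard error grows with the
correlation and shrinks again as n increases.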

> One way to address collinearity is to regress out length from frequency first, e.g. creating testdata$FreqWithoutLen from something like resid(lm(Freq ~ Len)).
>
> I wonder if it would be OK to do stepwise regression, i.e. regressing out length from your response variable (lexdectime) first, and then fitting the main model.

I think you're talking about residualization, not stepwise regression?
I would either regress out Length from *both* Freq and lexdectime, or
not regress it out at all.
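
The reason is that, for ordinary least squares, regressing Length out of
both the response and Freq gives exactly the same Freq coefficient as
putting Length in the model, whereas regressing it out of the response
only does not. A small sketch with simulated data (the coefficients and
sample size are invented, and I'm assuming a plain lm() rather than
whatever model Roger is actually fitting):

```r
set.seed(2)
n    <- 500
Len  <- rnorm(n)
Freq <- 0.6 * Len + rnorm(n)                       # Freq correlated with Len
lexdectime <- 0.5 * Freq + 0.8 * Len + rnorm(n)    # hypothetical response

# Full model with both predictors
b_full <- coef(lm(lexdectime ~ Freq + Len))["Freq"]

# Length regressed out of both the response and Freq
rt_res   <- resid(lm(lexdectime ~ Len))
freq_res <- resid(lm(Freq ~ Len))
b_both   <- coef(lm(rt_res ~ freq_res))["freq_res"]

# Length regressed out of the response only
b_resp_only <- coef(lm(rt_res ~ Freq))["Freq"]

print(c(full          = unname(b_full),
        both          = unname(b_both),
        response_only = unname(b_resp_only)))
```

The first two numbers agree up to floating-point error; the third is
pulled toward zero. Note that the standard errors from the
residual-on-residual fit are slightly off, because that fit does not
count the degree of freedom already spent on Length.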

> Similarly, I would think that finding significant changes (improvements) in fit when the Frequency variable is included would demonstrate the significance of its effect (use an ANOVA to compare the models, or just apply R's ANOVA to one model, which will stepwise compare the nested models).  Opinions?

Makes sense to me. As mentioned above, the presence of Length will
tend to decrease your power to detect a Freq effect, but that's
inherent in the problem of distinguishing correlated predictors.
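
For what it's worth, here is a sketch of the model comparison, again with
simulated data and a plain lm() (the effect sizes and the model names
m_len and m_both are made up). With a single added term, the F test from
anova() on the two nested models is the same test as the t test on the
Freq coefficient in the larger model.

```r
set.seed(3)
n    <- 200
Len  <- rnorm(n)
Freq <- 0.6 * Len + rnorm(n)
lexdectime <- 0.4 * Freq + 0.8 * Len + rnorm(n)    # hypothetical response

m_len  <- lm(lexdectime ~ Len)          # Length only
m_both <- lm(lexdectime ~ Len + Freq)   # Length plus Frequency

# Does adding Freq improve the fit over Length alone?
cmp <- anova(m_len, m_both)
print(cmp)

# Sequential table for the larger model; terms are tested in the order
# given in the formula, so Freq is tested after Len here.
seq_tab <- anova(m_both)
print(seq_tab)

f_stat <- cmp$F[2]
t_stat <- coef(summary(m_both))["Freq", "t value"]
print(c(F = f_stat, t_squared = t_stat^2))
```

Because the single-model table is sequential, the order of terms in the
formula matters: put Len first if the question is whether Freq adds
anything beyond it.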

-- Nathaniel


