[R-lang] Re: What happens if I include a continuous variable?

Thu Sep 16 11:37:37 PDT 2010

Hi Roger, Garry, David, Nathaniel,

you might find the slides useful that Victor Kuperman and I prepared for
WOMM (2008 or 2009?). An updated version that I used for a mini workshop at
McGill this year is available here:
http://hlplab.wordpress.com/2010/05/10/mini-womm-montreal-slides-now-available/

These slides talk (lecture 2) about how to test for and how to remove
collinearity from your model. One way is residualization (which is what David
meant when he said stepwise regression, as Nathaniel pointed out). It's also
correct that collinearity by itself just causes a loss in power. However, as
shown in the slides I mentioned above, collinearity between two predictors
can lead to spurious significance because of secondary correlations with
variables that are not in the model (that's always a danger though). The
slides also contain small simulations that help you to understand the
increase in Type II error (i.e., the loss of power) and that there is no
increase in Type I error.
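
To give a rough idea of what such a simulation can look like, here is a
minimal sketch. It is not the simulation from the slides: the function names
(p_for_x1, rejection_rate), the sample size, the effect sizes and the two
correlation levels are all made up for illustration, and it uses plain lm()
rather than a mixed model.

```r
# Hypothetical simulation; all names and parameter values are illustrative.
set.seed(1)

# Simulate one data set with two predictors correlated at about rho and
# return the p-value for x1 in the model y ~ x1 + x2.
p_for_x1 <- function(n, rho, beta1) {
  x1 <- rnorm(n)
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)  # cor(x1, x2) is roughly rho
  y  <- beta1 * x1 + 0.3 * x2 + rnorm(n)
  coef(summary(lm(y ~ x1 + x2)))["x1", "Pr(>|t|)"]
}

# Proportion of simulated data sets in which x1 is significant at alpha.
rejection_rate <- function(rho, beta1, n = 100, nsim = 1000, alpha = 0.05) {
  mean(replicate(nsim, p_for_x1(n, rho, beta1)) < alpha)
}

results <- data.frame(
  rho        = c(0, 0.9),
  # beta1 = 0: x1 has no true effect, so rejections are Type I errors
  type1_rate = c(rejection_rate(0, 0),   rejection_rate(0.9, 0)),
  # beta1 = 0.3: x1 has a true effect, so the rejection rate is power
  power      = c(rejection_rate(0, 0.3), rejection_rate(0.9, 0.3))
)
print(results)
```

Under these assumptions the type1_rate column should stay close to the
nominal alpha at both correlation levels, while the power column should drop
for the highly correlated predictors.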

The good thing is that model comparison (described by Garry) is robust
against collinearity. So, the safe and easy first step is indeed:

1) run the full model (with interaction of frequency and length)
2) run the model w/o frequency (and w/o interaction, of course)
3) run the model w/o length (and w/o interaction, of course)
4) compare 1 against 2, using the anova() function: if significant, frequency
matters beyond length
5) compare 1 against 3: if significant: length matters beyond frequency

of course, both or neither of 4) and 5) may yield significance.

6) only if model comparison returns significance: figure out whether the
direction of the effect is as expected (using residualization, as described
in the slides).
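
In R, steps 1) to 6) could look roughly like the sketch below. Everything in
it is an assumption for illustration: the data are simulated, the variable
names (RT, Freq, Len, FreqResid) and effect sizes are invented, and I use
plain lm() so that the example runs on its own. With real lexical decision
data you would presumably fit mixed models and compare those instead.

```r
# Hypothetical data: variable names and effect sizes are made up.
set.seed(42)
n    <- 500
Len  <- rnorm(n)
Freq <- -0.6 * Len + rnorm(n, sd = 0.8)   # frequency is correlated with length
RT   <- 6.5 + 0.05 * Len - 0.08 * Freq + rnorm(n, sd = 0.3)
d    <- data.frame(RT, Len, Freq)

# 1) full model, 2) model without frequency, 3) model without length
m_full   <- lm(RT ~ Freq * Len, data = d)
m_nofreq <- lm(RT ~ Len,  data = d)
m_nolen  <- lm(RT ~ Freq, data = d)

# 4) and 5): nested model comparisons
cmp_freq <- anova(m_nofreq, m_full)  # does frequency matter beyond length?
cmp_len  <- anova(m_nolen,  m_full)  # does length matter beyond frequency?
print(cmp_freq)
print(cmp_len)

# 6) check the direction of the frequency effect via residualization:
# FreqResid is the part of Freq that Len cannot account for.
d$FreqResid <- resid(lm(Freq ~ Len, data = d))
m_resid <- lm(RT ~ Len + FreqResid, data = d)
print(coef(m_resid)["FreqResid"])
```

The p-values of the two comparisons are in the "Pr(>F)" column of the
anova() output, and the sign of the FreqResid coefficient gives the direction
of the frequency effect once length has been regressed out of frequency.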

HTH and sorry for the redundancy.

Florian

On Thu, Sep 16, 2010 at 5:34 PM, Nathaniel Smith <njs@pobox.com> wrote:

> Hi David,
>
> A few added comments to your reply.
>
> On Thu, Sep 16, 2010 at 5:17 AM, David Reitter <reitter@cmu.edu> wrote:
> > Hi Roger,
> >
> > On Sep 16, 2010, at 6:54 AM, Roger van-Gompel wrote:
> >> In other words, if I still get an effect of frequency in the second
> >> model, does this mean that this effect is not due to/not confounded with
> >> length?
> >
> > Yes, that's a good start, but have a look at the correlation matrix.
> > If, after centering, you still find substantial correlations (perhaps >0.2)
> > between frequency or length and their interaction, you will want to do
> > something about it.  Otherwise, significance tests of the fitted effects
> > won't tell you much, as these variables are confounded.  See also:
>
> This is true in general, but I think it's worth emphasizing that
> significance tests are still meaningful in the presence of
> collinearity, they just have less power. So, collinearity tends to
> mean you need more data in order to find significant results. But if
> you have enough data, this isn't an issue.
>
> (And if you don't have enough data, then there may be nothing you can
> do except collect more/better data -- or give up on trying to
> determine which of the correlated predictors is actually driving the
> effect.)
>
> > One way to address collinearity is to regress out length from frequency
> > first, e.g. creating testdata$FreqWithoutLen from something like
> > resid(lm(Freq ~ Len)).
> >
> > I wonder if it would be OK to do stepwise regression, i.e. regressing out
> > length from your response variable (lexdectime) first, and then fitting the
> > main model.
>
> I think you're talking about residualization, not stepwise regression?
> I would either regress out Length from *both* Freq and lexdectime, or
> not regress it out at all.
>
> > Similarly, I would think that finding significant changes (improvements)
> > in fit when the Frequency variable is included would demonstrate the
> > significance of its effect (use an ANOVA to compare the models, or just
> > apply R's ANOVA to one model, which will stepwise compare the nested
> > models).  Opinions?
>
> Makes sense to me. As mentioned above, the presence of Length will
> tend to decrease your power to detect a Freq effect, but that's
> inherent in the problem of distinguishing correlated predictors.
>
> -- Nathaniel
>
>