[R-lang] residualization of a three-way contrast

Nathaniel Smith njs at pobox.com
Sat Apr 11 12:47:35 PDT 2009


On Fri, Apr 10, 2009 at 4:28 PM, Kyle Gorman <kylebgorman at gmail.com> wrote:
> X1 will remain as is.
>
> r.X2 = residuals(lm(X2 ~ X1))
> r.X3 = residuals(lm(X3 ~ X1 + r.X2))
>
> then:
>
> outcome ~ X1 + r.X2 + r.X3
>
> this is the solution i vaguely recall seeing in a textbook somewhere under
> the name "partialization"
> - is this kosher?

Sure.
  r.X2 == X2 - alpha - beta*X1 (for some alpha and beta)
Which means:
  X2 == r.X2 + alpha + beta*X1
and similarly for r.X3 (in terms of X1 and r.X2). That means that if
the model with residuals wants to use, say, X3 to predict the
outcome, it can reconstruct X3 from X1, r.X2, r.X3 by choosing the
right linear coefficients. In other words,
  outcome ~ X1 + r.X2 + r.X3
and
  outcome ~ X1 + X2 + X3
end up fitting exactly the same linear models.
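
A quick check with simulated data (the data-generating coefficients
here are arbitrary, just for illustration):
  set.seed(1)
  X1 <- rnorm(100)
  X2 <- X1 + rnorm(100)             # correlated with X1
  X3 <- X1 + X2 + rnorm(100)        # correlated with both
  outcome <- X1 + 2*X2 + 3*X3 + rnorm(100)
  r.X2 <- resid(lm(X2 ~ X1))
  r.X3 <- resid(lm(X3 ~ X1 + r.X2))
  fit.resid <- lm(outcome ~ X1 + r.X2 + r.X3)
  fit.raw <- lm(outcome ~ X1 + X2 + X3)
  # identical fitted values -- only the coefficients differ:
  all.equal(fitted(fit.resid), fitted(fit.raw))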

The only differences are in numerical stability (the model with
residuals is better), and that you have to interpret the fitted
coefficients differently (and the default t-tests as well, of course,
since those test the null hypothesis that each coefficient is
zero). If you need other hypothesis tests, you can use
linear.hypothesis from library(car), which lets you test things like
"these coefficients are equal to each other", or "these coefficients
sum to 0".
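
For instance (reusing the simulated fit above -- untested, so
double-check the hypothesis strings against your coefficient names):
  library(car)
  fit <- lm(outcome ~ X1 + r.X2 + r.X3)
  # null: the r.X2 and r.X3 coefficients are equal
  linear.hypothesis(fit, "r.X2 = r.X3")
  # null: the two coefficients sum to 0
  linear.hypothesis(fit, "r.X2 + r.X3 = 0")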

It can help in interpretation to rescale X1 -- I often fit models like
  p.X1 <- predict(lm(X2 ~ X1))
  r.X2 <- resid(lm(X2 ~ X1))
  lm(outcome ~ p.X1 + r.X2)
p.X1 is just a rescaling/recentering of X1 to put it on the same scale
as X2. What's nice is that X2 is exactly the sum p.X1 + r.X2. That
means that if I see the same coefficients on p.X1 and r.X2, the model
is just reconstructing X2; if both are non-zero but different, the
model wants some mix of X1 and X2; and so on. I haven't really
thought about how to do this for more than 2 variables, but maybe
it'll give you some ideas.
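
With the simulated data from above you can check the decomposition
directly:
  p.X1 <- predict(lm(X2 ~ X1))
  r.X2 <- resid(lm(X2 ~ X1))
  # the decomposition is exact: X2 == p.X1 + r.X2
  all.equal(X2, unname(p.X1 + r.X2))
  summary(lm(outcome ~ p.X1 + r.X2))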

> - should the form of r.X3 be the naive residuals(lm(X3 ~ X1 + X2))?

It makes no difference: since r.X2 is itself just a linear
combination of X1, X2, and an intercept, the two regressions use the
same predictor space, so whichever way you calculate them, you will
get exactly the same values for r.X3 (except that lm(X3 ~ X1 + X2)
might be less numerically stable, as above).
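
Easy to verify (again with the simulated data from above):
  r.X3.a <- resid(lm(X3 ~ X1 + r.X2))
  r.X3.b <- resid(lm(X3 ~ X1 + X2))
  all.equal(r.X3.a, r.X3.b)   # TRUE, up to floating point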

> - should the form of r.X2 be the less-naive residuals(lm(X2 ~ X1 + X3))?

I wouldn't, since it breaks the logic above that you're fitting "the
same linear model".

> ps: yes, i didn't say anything about language here. but it's a language
> study

Doesn't bother me! These issues are endemic in language studies, and
most stats books are completely unhelpful. ("Don't put correlated
predictors in!" is fine advice if your goal is just prediction, but
when the whole point of your study is to compare the two predictors,
well...)

-- Nathaniel