[R-lang] residualization of a three-way contrast

Roger Levy rlevy at ling.ucsd.edu
Sat Apr 11 12:45:46 PDT 2009


On Apr 10, 2009, at 4:28 PM, Kyle Gorman wrote:

> i have three positively-correlated predictors that i'd like to  
> include in a model. any traditional measure suggests that to include  
> them as is would introduce a good deal of collinearity. really,  
> these are a great candidate for either taking the sum of the three,  
> or for PCA, but hypothetically, let's say i wanted to use a  
> residualization trick for this three-way interaction.
>
> (they are all on a 15 point scale and I predict they will all have  
> similar positive betas)
>
> X1 will remain as is.
>
> r.X2 = residuals(lm(X2 ~ X1))
> r.X3 = residuals(lm(X3 ~ X1 + r.X2))
>
> then:
>
> outcome ~ X1 + r.X2 + r.X3
>
> this is the solution i vaguely recall seeing in a textbook somewhere  
> under the name "partialization"

Hi Kyle,

> - is this kosher?

Yes, it's kosher, even during Passover :-) Just keep in mind what the
outcome of your regression will be. The coefficient assigned to r.X3
captures "that portion of the variability in your outcome that is
associated with X3 but cannot be expressed as a linear combination of
X1 and X2".  Likewise (more simply) for r.X2, with respect to X1 alone.
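
A minimal sketch of what that means in practice, on simulated data. The
sample size, the coefficients used to generate X1, X2, X3 and outcome, and
the object names fit.raw and fit.resid are all assumptions made up for
illustration; nothing here comes from Kyle's data. It checks three things:
the residualized model gives the same fitted values as the raw one, the
residualized predictors are uncorrelated in the sample, and the coefficient
on r.X3 matches the coefficient on X3 in the raw model.

```r
# Simulated, positively correlated predictors (illustrative values only)
set.seed(1)
n  <- 200
X1 <- rnorm(n)
X2 <- 0.7 * X1 + rnorm(n)
X3 <- 0.5 * X1 + 0.5 * X2 + rnorm(n)
outcome <- X1 + X2 + X3 + rnorm(n)

# Sequential residualization, as in the question
r.X2 <- residuals(lm(X2 ~ X1))
r.X3 <- residuals(lm(X3 ~ X1 + r.X2))

fit.raw   <- lm(outcome ~ X1 + X2 + X3)
fit.resid <- lm(outcome ~ X1 + r.X2 + r.X3)

# Both designs span the same column space, so the fitted values agree
all.equal(unname(fitted(fit.raw)), unname(fitted(fit.resid)))

# Sample correlations among the residualized predictors (off-diagonals ~ 0)
round(cor(cbind(X1, r.X2, r.X3)), 10)

# The last-entered predictor keeps its coefficient; the coefficients on
# X1 and r.X2 differ from the raw model because they absorb shared variance
coef(fit.raw)["X3"]
coef(fit.resid)["r.X3"]
```

Note that the order of residualization matters for interpretation: whichever
predictor is left as is (here X1) gets credit for everything it shares with
the others.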

> - should the form of r.X3 be the naive residuals(lm(X3 ~ X1 + X2))?

It won't make a difference. r.X3 will be the same in either case
(modulo numerical error).
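
The reason is that r.X2 is just X2 minus a linear function of X1 and the
intercept, so (Intercept, X1, r.X2) and (Intercept, X1, X2) span the same
space and leave the same residuals. A quick check on simulated data (the
data-generating values and the names r.X3.a and r.X3.b are assumptions for
illustration):

```r
# Simulated predictors (illustrative values only)
set.seed(2)
n  <- 200
X1 <- rnorm(n)
X2 <- 0.7 * X1 + rnorm(n)
X3 <- 0.5 * X1 + 0.5 * X2 + rnorm(n)

r.X2 <- residuals(lm(X2 ~ X1))

# Two ways of residualizing X3
r.X3.a <- residuals(lm(X3 ~ X1 + r.X2))  # against the residualized X2
r.X3.b <- residuals(lm(X3 ~ X1 + X2))    # against the raw X2

# Largest absolute difference; should be at floating-point noise level
max(abs(r.X3.a - r.X3.b))
```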

> - should the form of r.X2 be the less-naive residuals(lm(X2 ~ X1 +  
> X3))?

That would be bad.  If you did this and then used your original formula

   outcome ~ X1 + r.X2 + r.X3

you would be in a more restricted subspace than for

   outcome ~ X1 + X2 + X3

which you don't want. Imagine the extreme case where X2 == X3 always.   
Then with your proposal, r.X2 and r.X3 would always both be 0.
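
A sketch of that extreme case only, on simulated data. It assumes r.X3 is
computed as residuals(lm(X3 ~ X1 + X2)); the names r.X2.bad and r.X2.seq and
the data-generating values are made up for illustration. With X3 identical
to X2, residualizing each on the other leaves nothing, whereas the
sequential version keeps the part of X2 not explained by X1.

```r
# Extreme case: X3 is an exact copy of X2 (illustrative values only)
set.seed(3)
n  <- 200
X1 <- rnorm(n)
X2 <- 0.7 * X1 + rnorm(n)
X3 <- X2

# Proposal under discussion: residualize X2 on X1 and X3
r.X2.bad <- residuals(lm(X2 ~ X1 + X3))
r.X3     <- residuals(lm(X3 ~ X1 + X2))

# Both are zero up to floating-point error, so neither carries information
max(abs(r.X2.bad))
max(abs(r.X3))

# Original sequential version: r.X2 still carries what X1 does not explain
r.X2.seq <- residuals(lm(X2 ~ X1))
sd(r.X2.seq)
```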

Roger


--

Roger Levy                      Email: rlevy at ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
