[R-lang] poly() and polynomials

Roger Levy rlevy at ucsd.edu
Wed Sep 3 17:51:42 PDT 2008


Hi Sharon,

Sharon Goldwater wrote:
> I'm trying to build a mixed logit model using lmer, and I have some
> questions about poly() and the use of quadratic terms in general.  My
> understanding is that, by default, poly() creates orthogonal
> polynomials, so the coefficients are not easily interpretable.  On the
> other hand, using m ~ poly(x, raw=T) should be equivalent to m ~ x +
> xsq, where xsq is precomputed as x^2.  I have verified the second
> statement by building one model using raw poly() and another one by
> precomputing all quadratic terms, and the coefficients are indeed
> virtually identical.  Now I have a couple of questions:
> 
> 1.  When I try to build a model using raw=F, I get the following error:
> 
>> m22 <- lmer(is_err ~ poly(pmean_sex,2)  +(1|speaker)+(1|ref), x=T,y=T,family="binomial")
> Error in x * raw^2 : non-conformable arrays
> 
> but
>>  m22 <- lmer(is_err ~ poly(pmean_sex,2,raw=T) +(1|speaker)+(1|ref), x=T,y=T,family="binomial")
> works fine.
> 
> Normally my model has far more predictors, but this is the only one
> that seems to cause the problem.  This doesn't seem to be a problem
> with lmer specifically, since it's reproducible using lrm and lm as
> well.  Does anyone know what this means?   

Well, the "non-conformable arrays" error means that you're trying to
multiply or add two arrays (e.g., two matrices) with different
dimensions.  But that doesn't give me any clue as to why this would
happen for you!  Sorry...
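
Just to illustrate what the message itself means, here's a made-up
example (nothing to do with your data or with poly()):

A <- matrix(1:6, nrow = 2)   # 2 x 3
B <- matrix(1:6, nrow = 3)   # 3 x 2
A * B                        # Error in A * B : non-conformable arrays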

> I can't seem to find an
> answer in R help archives that makes any sense.  More generally,
> should I care about trying to fix it?  That is, is there a good reason
> to prefer orthogonal polynomials rather than raw ones, aside from
> reducing collinearity slightly? Since I would like to be able to
> interpret the coefficients (i.e., determine the relative importance of
> each variable as a predictor of error rates), I would tend towards
> using raw polynomials anyway.

So: as long as you're fitting by maximum likelihood, the particular
polynomial basis functions you choose (orthogonal or raw) won't affect
what model is fitted, though as you point out it will affect the
correlations between the parameter estimates.  If you use any other
model-fitting criterion, though, such as penalized ML or Bayesian
methods, the choice of basis functions will matter.  For lmer(), fitting
by REML will also make the choice of basis functions affect the
resulting model fit, because REML is effectively Bayesian model fitting
with a uniform prior over the fixed effects (see Pinheiro & Bates 2000,
pp. 75-76).
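
For instance, with an ordinary lm() fit (just a toy check of my own, not
your data), the two bases give literally identical log-likelihoods,
because they span the same column space:

set.seed(1)
x <- runif(100)
y <- 2 * x + 0.5 * x^2 + rnorm(100)
logLik(lm(y ~ poly(x, 2)))              # orthogonal basis
logLik(lm(y ~ poly(x, 2, raw = TRUE)))  # raw basis: identical value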

With lmer()'s default REML fit, by contrast, you get different
log-likelihoods from the orthogonal & raw bases in the following
simulation:

library(lme4)
set.seed(1)   # for reproducibility
x <- runif(1000)
subj <- rep(1:10, 100)
r <- rnorm(10)   # by-subject random intercepts
dat <- data.frame(y = 2 * x + 0.5 * x^2 + r[subj] + rnorm(1000, 0, 5),
                  x = x, subj = subj)
lmer(y ~ poly(x, 2) + (1 | subj), dat)               # orthogonal basis
lmer(y ~ poly(x, 2, raw = TRUE) + (1 | subj), dat)   # raw basis

> 
> 2.  I always hear people say that if you have both a linear and
> quadratic term for a particular predictor, you shouldn't get rid of
> the linear term (even if it shows up as not significant) while
> retaining the quadratic term.  In fact, if you were using poly(), it
> wouldn't even be possible to do that.  But suppose you instead
> precompute all the quadratic terms and treat them as separate
> variables.  It seems to me that retaining a quadratic term for
> variable x while deleting the linear term is essentially just
> performing a transformation on the variable, no different from taking
> the log of x as the predictor, which we do all the time for linguistic
> variables.  So is there really any reason not to do it?

Gee, that's an interesting question.  My take would be that it's
analogous to the question of when you'd want to remove an intercept from
a regression model.  If you have a strong theoretical reason why the
intercept must be 0 (e.g., modeling distance traveled as a function of
time), then your data should be fit well with a 0-intercept model, and
you have good reason to remove the term from your regression.  But
without a good theoretical basis for removing the intercept, you
wouldn't want to take it out of the model even if it is "insignificantly
different from zero".  Likewise with the lower-order terms in a
polynomial regression.
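
To make that concrete (a toy example of my own): dropping the linear
term constrains the fitted parabola's extremum to sit at x = 0, just as
dropping the intercept forces a fitted line through the origin.

set.seed(2)
x <- runif(200, 0, 2)
y <- 1 + 3 * x - x^2 + rnorm(200, 0, 0.5)
coef(lm(y ~ x + I(x^2)))   # full quadratic
coef(lm(y ~ I(x^2)))       # quadratic-only: extremum forced to x = 0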

FWIW, though, spline-based methods tend to behave better than
polynomial regression, especially if you're doing even mild extrapolation.
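E.g., something along these lines (natural cubic splines from the
splines package; the df = 3 is just a placeholder, not a recommendation):

library(splines)
lmer(y ~ ns(x, 3) + (1 | subj), dat)   # dat as in the simulation above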

Best

Roger



-- 

Roger Levy                      Email: rlevy at ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy


