[R-lang] poly() and polynomials

Sharon Goldwater sgwater at inf.ed.ac.uk
Thu Sep 4 11:22:56 PDT 2008


Hi Roger,

> So: as long as you're using maximum likelihood, the precise polynomial
> basis functions you choose (=orthogonal or non-orthogonal) won't affect
> what model is fitted, though as you point out it will affect the
> correlations between the parameter estimates.  If you use any other
> model-fitting criterion, though, such as penalized ML or Bayesian
> methods, the basis functions will matter.  For lmer(), using REML to fit
> a model will also cause the basis functions chosen to affect the
> resulting model fit, because REML is effectively Bayesian model fitting
> with a uniform prior over the fixed effects (see Pinheiro & Bates 2000,
> pp. 75-76).
>
> As an example, you get different log-likelihoods from the orthogonal &
> raw bases in the following simulation:
>
> library(lme4)   # needed for lmer()
> x <- runif(1000)
> subj <- rep(1:10,100)
> r <- rnorm(10)
> dat <- data.frame(y=2 * x + 0.5 * x * x + r[subj] +
> rnorm(1000,0,5),x=x,subj=subj)
> lmer(y ~ poly(x,2) + (1 | subj), dat)
> lmer(y ~ poly(x,2,raw=T) + (1 | subj), dat)

OK, thanks for pointing that out!  However, I don't think it applies
in this case because according to the documentation for lmer(),
logistic models (which is what I'm using) are always fit using
maximum likelihood.  (I verified this by adding method = "REML" to my
model specification, and nothing changed.)
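
To make this concrete, a binomial version of your simulation along
these lines (just a sketch; glmer() is the interface for binomial
mixed models in current lme4, and the names are illustrative) should
give identical log-likelihoods for the orthogonal and raw bases, since
both fits use maximum likelihood:

library(lme4)

set.seed(1)
x <- runif(1000)
subj <- rep(1:10, 100)
r <- rnorm(10)
p <- plogis(-1 + 2 * x + 0.5 * x^2 + r[subj])
dat <- data.frame(y = rbinom(1000, 1, p), x = x, subj = subj)

# No REML option here: both models are fit by (approximate) ML, so the
# choice of polynomial basis should not change the fitted model.
m.orth <- glmer(y ~ poly(x, 2) + (1 | subj), dat, family = binomial)
m.raw  <- glmer(y ~ poly(x, 2, raw = TRUE) + (1 | subj), dat,
                family = binomial)
logLik(m.orth)
logLik(m.raw)   # should match m.orth up to numerical precision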

> > 2.  I always hear people say that if you have both a linear and
> > quadratic term for a particular predictor, you shouldn't get rid of
> > the linear term (even if it shows up as not significant) while
> > retaining the quadratic term.  In fact, if you were using poly(), it
> > wouldn't even be possible to do that.  But suppose you instead
> > precompute all the quadratic terms and treat them as separate
> > variables.  It seems to me that retaining a quadratic term for
> > variable x while deleting the linear term is essentially just
> > performing a transformation on the variable, no different from taking
> > the log of x as the predictor, which we do all the time for linguistic
> > variables.  So is there really any reason not to do it?
>
> Gee, that's an interesting question.  My take would be that it's
> analogous to the question of when you'd want to remove an intercept from
> a regression model.  If you have a strong theoretical reason why the
> intercept must be 0 (e.g., modeling distance traveled as a function of
> time), then your data should be fit well with a 0-intercept model, and
> you have good reason to remove the term from your regression.  But
> without a good theoretical basis for removing the intercept, you
> wouldn't want to take it out of the model even if it is "insignificantly
> different from zero".  Likewise with the lower-order terms in a
> polynomial regression.

That seems reasonable.
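
For concreteness, the alternatives I had in mind in my question above
look like this when the quadratic is precomputed as a separate
variable (re-using a simulation like yours):

library(lme4)

set.seed(2)
x <- runif(1000)
subj <- rep(1:10, 100)
r <- rnorm(10)
dat <- data.frame(y = 2 * x + 0.5 * x^2 + r[subj] + rnorm(1000, 0, 5),
                  x = x, subj = subj)

lmer(y ~ poly(x, 2) + (1 | subj), dat)  # orthogonal basis: can't drop just the linear term
lmer(y ~ x + I(x^2) + (1 | subj), dat)  # raw linear + quadratic as separate variables
lmer(y ~ I(x^2) + (1 | subj), dat)      # quadratic only -- effectively a transformed predictor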

> FWIW, though, spline-based methods usually tend to behave better than
> polynomial regression, especially if you're doing even mild extrapolation.

Yes, it is certainly true that the quadratic fits I have make fairly
extreme predictions for points whose values are slightly beyond those
observed in the data.  Also, using splines does seem to give a slightly
better fit.  However, there are a couple of reasons I've been using
polynomials rather than splines:

1. I want to be able to produce plots showing the predicted effect of
each fixed effect on the outcome variable.  I've been doing this by
extracting the fixed-effect coefficients from the fitted model and
then plotting the predicted curves, as in this figure:
http://homepages.inf.ed.ac.uk/sgwater/tmp/sri_num_coeff.pdf
where the outcome variable is whether a word was correctly recognized
by a speech recognition system, and all variables are centered and
rescaled.  (A rough sketch of this approach appears after point 2
below.)  Is there some way to do this type of plot if I fit using
rcs()?  I don't know what the formula would be to reconstruct the
curves from the rcs coefficients, and I don't know of any predict-type
function for lmer...

2.  The fit of this model to the data is fairly poor -- there is
simply a large amount of randomness that isn't captured.  So
prediction would be really bad anyway.  Mainly what I'm trying to do
is (as implied by point 1) figure out how the factors that I'm
examining influence the results.  So I'm more interested in
qualitative statements like "words with extreme values of certain
prosodic features, such as duration, jitter, and speech rate, are more
likely to be misrecognized than words with more moderate values" than
in worrying about exactly how much worse it is to have a duration of
1 sec rather than 0.5 sec.  I would be happy to use rcs() if I could
figure out how to make the plots that let me see these qualitative
patterns easily.  Any thoughts?
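
To illustrate the plotting approach from point 1, here is a rough
sketch on a small simulated logistic fit (the data and variable names
are made up; the real model has more predictors, all centered and
rescaled):

library(lme4)

set.seed(3)
x <- runif(1000)
subj <- rep(1:10, 100)
r <- rnorm(10)
p <- plogis(-1 + 2 * x + 0.5 * x^2 + r[subj])
dat <- data.frame(y = rbinom(1000, 1, p), x = x, subj = subj)
m <- glmer(y ~ x + I(x^2) + (1 | subj), dat, family = binomial)

# Reconstruct the predicted curve for x from the fixed-effect
# coefficients, on the probability scale.
b <- fixef(m)
xs <- seq(0, 1, length.out = 100)
eta <- b["(Intercept)"] + b["x"] * xs + b["I(x^2)"] * xs^2
plot(xs, plogis(eta), type = "l",
     xlab = "x", ylab = "predicted P(y = 1)")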

One more question, mostly unrelated: if I am using collin.fnc to
compute the condition number, am I correct that I should include only
the fixed effects when checking for collinearity? I've got a condition
number of around 18, which I think is borderline but OK, when I
include all the fixed effects including the (non-orthogonal) quadratic
terms.  But it goes up to 29 if I include the speaker and word factors
of my data (which I'm modeling as random).  That's because I also have
a factor for corpus, which is predictable given speaker.  But I guess
that high condition number is based on the assumption that you're
going to try to fit separate coefficients for each speaker, which in
fact I'm not.  So I should exclude the random effects from the
computation, right?
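
For concreteness, the check I have in mind looks roughly like this
(the data frame and column names here are invented; in my case the
excluded columns are speaker, word, and corpus):

library(languageR)

set.seed(4)
mydat <- data.frame(duration = rnorm(200), jitter = rnorm(200),
                    speech.rate = rnorm(200))
mydat$duration.sq <- mydat$duration^2   # precomputed quadratic terms
mydat$jitter.sq   <- mydat$jitter^2
fixed.cols <- c("duration", "jitter", "speech.rate",
                "duration.sq", "jitter.sq")
# Condition number over the fixed-effect predictors only
collin.fnc(mydat, which(colnames(mydat) %in% fixed.cols))$cnumber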

-- AuH2O



