[R-lang] Generalized linear mixed models
rlevy at ucsd.edu
Thu Jun 7 13:31:46 PDT 2007
Kathryn Campbell-Kibler wrote:
> Hi Roger!
>> Good question. I think this is an empirical issue: including random
>> effects of recording choice could cost you a little inferential power
>> due to the requirement of estimating the variance and correlations of
>> the random effects, but it could also improve your model. At least for
>> your linear mixed-effects model, you can treat it as an empirical issue:
>> you can use a likelihood ratio test to compare models with and without
>> random effects of recording choice, and if including the random effect
>> doesn't lead to a significant improvement in the model, you could be
>> justified in dropping it.
> That makes sense, and thanks for the reference, I will look at it.
> Off-list, the issue was raised that since recording predicts speaker
> (i.e. if the recording is about volleyball, you know it's Bonnie) that
> there is a problem including both in the model. Are nested variables
> like this a problem? (Unless that's explained in the paper, at which
> point I will hopefully soon know the answer!)
Good question! The answer is that for random effects, nesting variables
is not problematic -- in fact, this kind of nesting is one of the reasons
that the terms "mixed-effects model" and "multilevel" are often used
interchangeably. The way to think about this is that you have the
following nesting structure:
speaker -> recording -> expected_mean_observation
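As a rough sketch of how this nesting might be written with a recent version of lme4, together with the likelihood ratio test mentioned earlier: the data frame `d`, the column names (`rating`, `condition`, `speaker`, `recording`), and all the numbers are hypothetical, and the data are simulated purely for illustration.

```r
library(lme4)

set.seed(1)
# Hypothetical design: 6 speakers, 4 recordings per speaker, 20 ratings each.
n_speakers <- 6; n_rec <- 4; n_obs <- 20
d <- expand.grid(obs = 1:n_obs, rec = 1:n_rec, speaker = 1:n_speakers)
d$speaker <- factor(d$speaker)
# Give every recording a unique label, so that recording predicts speaker.
d$recording <- interaction(d$speaker, d$rec, drop = TRUE)
d$condition <- factor(rep(c("a", "b"), length.out = nrow(d)))

# Simulated response: speaker and recording each contribute a random offset.
speaker_eff <- rnorm(n_speakers, sd = 0.5)
rec_eff <- rnorm(nlevels(d$recording), sd = 0.3)
d$rating <- 3.5 + 0.4 * (d$condition == "b") +
  speaker_eff[as.integer(d$speaker)] +
  rec_eff[as.integer(d$recording)] +
  rnorm(nrow(d), sd = 1)

# Fit by maximum likelihood so the two models can be compared directly.
m_full <- lmer(rating ~ condition + (1 | speaker) + (1 | recording),
               data = d, REML = FALSE)
m_reduced <- lmer(rating ~ condition + (1 | speaker),
                  data = d, REML = FALSE)

# Likelihood ratio test for the random effect of recording.
lrt <- anova(m_reduced, m_full)
print(lrt)
```

Because the null hypothesis puts the recording variance on the boundary of its parameter space (zero), the p-value from this test tends to be conservative, so a non-significant result is only weak evidence for dropping the term.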
If recording were a fixed effect, then it would indeed be the case that
the parameters estimated for recording would absorb any effect of speaker
(whether speaker is a random or fixed effect), since each recording
belongs to exactly one speaker. However, since recording is a random
effect, you are estimating only its variance rather than a separate
parameter for each recording,
and the combination of speaker+recording can be thought of as the
mean+variance of the normally distributed overall effect on your
>> One other point I neglected to mention. Technically it is not really
>> correct to treat data on a 6-point scale with a linear model, because
>> the error in your data cannot be normally distributed. This problem
>> will probably be worst in cases where the predicted response rate is
>> close to the extreme values, where the distribution is likely to be
>> skewed. Ordinal regression would probably be the most natural approach,
>> but the bad news is that I believe there is no current means within R to
>> include mixed effects in an ordinal regression model.
> You're right, of course. I've been avoiding this issue, on the
> grounds that everyone in the field treats these responses as linear.
> But I shouldn't. Do you have any ideas? My sense is that dealing
> with random effects is very important, likely more so than the ordinal
> issues, but that may be just because that's what I know more about.
> Do you (or anyone else) know of another platform that could cope with
> ordinal and mixed effects?
It could be that I'm wrong and R does have a facility for mixed-effects
modeling with ordinal regression -- anyone know? From the following
page it looks like SAS might be able to do it:
Otherwise...within R, you might do the analysis both (1) with ordinal
regression and without random effects, and (2) with linear regression
and with random effects, and see what the differences are. Also, the
results from a linear regression are less likely to be horribly wrong if
the response means are never near the extremes of the scale. You might
plot a histogram of the responses for various combinations of your fixed
effects and see whether they're roughly normally distributed.
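A minimal sketch of that histogram check in base R, assuming a data frame with a 1-6 `rating` column and two fixed-effect factors; the names `gender` and `style` and the simulated values are hypothetical stand-ins for your real predictors and data.

```r
set.seed(2)
# Hypothetical data: 50 ratings on a 1-6 scale in each of 2 x 2 cells.
d <- expand.grid(rep = 1:50,
                 gender = c("f", "m"),
                 style = c("casual", "formal"))
latent <- 3.5 + 1.5 * (d$style == "formal") + rnorm(nrow(d), sd = 1.2)
d$rating <- pmin(6, pmax(1, round(latent)))  # round and clip to the scale

# One vector of ratings per combination of the fixed effects.
cells <- split(d$rating, list(d$gender, d$style))

# One histogram per cell, with one bin per scale point.
op <- par(mfrow = c(2, 2))
for (nm in names(cells)) {
  hist(cells[[nm]], breaks = seq(0.5, 6.5, by = 1),
       main = nm, xlab = "rating")
}
par(op)

# Cell means: cells with means near 1 or 6 are where skew is most likely.
cell_means <- sapply(cells, mean)
print(cell_means)
```

Cells whose histograms pile up against 1 or 6 are the ones where the linear model's normality assumption is most doubtful.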
> Thanks for all the advice!
Sure -- hope it's useful!
Roger Levy Email: rlevy at ucsd.edu
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://ling.ucsd.edu/~rlevy