[R-lang] Generalized linear mixed models

Thu Jun 7 13:31:46 PDT 2007

Kathryn Campbell-Kibler wrote:
> Hi Roger!
> 
>> Good question.  I think this is an empirical issue: including random
>> effects of recording choice could cost you a little inferential power
>> due to the requirement of estimating the variance and correlations of
>> the random effects, but it could also improve your model.  At least for
>> your linear mixed-effects model, you can treat it as an empirical issue:
>> you can use a likelihood ratio test to compare models with and without
>> random effects of recording choice, and if including the random effect
>> doesn't lead to a significant improvement in the model, you could be
>> justified in dropping it.
> 
> That makes sense, and thanks for the reference, I will look at it.
> Off-list, the issue was raised that since recording predicts speaker
> (i.e. if the recording is about volleyball, you know it's Bonnie) that
> there is a problem including both in the model. Are nested variables
> like this a problem?  (Unless that's explained in the paper, at which
> point I will hopefully soon know the answer!)

Hi Kathryn,

Good question!  The answer is that for random effects, nesting variables 
is not problematic -- in fact, this kind of nesting is one of the reasons 
that the terms "mixed-effects model" and "multilevel model" are often used 
interchangeably.  The way to think about this is that you have the 
following nesting structure:

   speaker -> recording -> expected_mean_observation

If recording were a fixed effect, then estimating a separate parameter 
for each recording would indeed leave nothing for speaker to explain 
(whether speaker is a random or a fixed effect), because recording 
determines speaker.  However, since recording is a random effect, you 
are estimating only the variance of the recording effects rather than 
one parameter per recording.  You can think of the speaker effect as 
the mean, and the recording variance as the spread, of a normal 
distribution from which each recording's overall effect on your 
observations is drawn.
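
As a rough illustration of the nesting structure above, here is a 
minimal R sketch on simulated data.  Everything in it is an assumption 
rather than part of the original analysis: the variable names (rating, 
variant, speaker, recording), the simulated effect sizes, and the choice 
of lme() from the nlme package.  It fits one model with a random 
intercept for speaker only and one with recording nested within speaker, 
then compares the two with a likelihood ratio test.

```r
library(nlme)  # recommended package, shipped with standard R installs

set.seed(1)

# Hypothetical design: 8 speakers, 3 recordings each, 10 ratings each
n_speakers <- 8; n_rec <- 3; n_obs <- 10
d <- expand.grid(obs = seq_len(n_obs), rec = seq_len(n_rec),
                 speaker = seq_len(n_speakers))
d$speaker <- factor(paste0("s", d$speaker))
# Label recordings uniquely, so each recording belongs to one speaker
d$recording <- factor(paste0(d$speaker, "_r", d$rec))
# Hypothetical two-level fixed effect, alternating within each recording
d$variant <- factor(rep(c("a", "b"), length.out = nrow(d)))

# Simulated speaker-level and recording-level random intercepts
spk_eff <- rnorm(n_speakers, sd = 0.5)
rec_eff <- rnorm(nlevels(d$recording), sd = 0.7)
d$rating <- 3.5 + 0.4 * (d$variant == "b") +
  spk_eff[as.integer(d$speaker)] +
  rec_eff[as.integer(d$recording)] +
  rnorm(nrow(d), sd = 1)

# Random intercept for speaker only
m_speaker <- lme(rating ~ variant, random = ~ 1 | speaker,
                 data = d, method = "ML")
# Random intercepts for speaker and for recording nested within speaker
m_nested <- lme(rating ~ variant, random = ~ 1 | speaker/recording,
                data = d, method = "ML")

# Estimated variance components at each level of the nesting
print(VarCorr(m_nested))

# Likelihood ratio test: does the recording-level variance improve the fit?
lrt <- anova(m_speaker, m_nested)
print(lrt)
```

Because the null hypothesis places the recording variance at zero, the 
boundary of its allowed range, the p-value from this comparison tends to 
be conservative.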

>> One other point I neglected to mention.  Technically it is not really
>> correct to treat data on a 6-point scale with a linear model, because
>> the error in your data cannot be normally distributed.  This problem
>> will probably be worst in cases where the predicted response rate is
>> close to the extreme values, where the distribution is likely to be
>> skewed.  Ordinal regression would probably be the most natural approach,
>> but the bad news is that I believe there is no current means within R to
>> include mixed effects in an ordinal regression model.
> 
> You're right, of course.  I've been avoiding this issue, on the
> grounds that everyone in the field treats these responses as linear.
> But I shouldn't.  Do you have any ideas?  My sense is that dealing
> with random effects is very important, likely more so than the ordinal
> issues, but that may be just because that's what I know more about.
> Do you (or anyone else) know of another platform that could cope with
> ordinal and mixed effects?

It could be that I'm wrong and R does have a facility for mixed-effects 
modeling with ordinal regression -- anyone know?  From the following 
page it looks like SAS might be able to do it:

  http://tigger.uic.edu/~hedeker/long.html

Otherwise...within R, you might do the analysis both (1) with ordinal 
regression and without random effects, and (2) with linear regression 
and with random effects, and see what the differences are.  Also, the 
results from a linear regression are less likely to be horribly wrong if 
the response means are never near the extremes of the scale. You might 
plot a histogram of the responses for various combinations of your fixed 
effects and see whether they're roughly normally distributed.
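
The two-analysis comparison and the distribution check might look 
something like the following sketch.  The data are simulated and the 
names (rating, variant, speaker) are hypothetical; polr() from MASS and 
lme() from nlme are just one possible choice of functions.  Note that 
the two coefficients are on different scales (log-odds vs. rating 
units), so compare their sign and t statistics, not their raw values.

```r
library(MASS)  # polr(): proportional-odds ordinal regression
library(nlme)  # lme(): linear mixed-effects model

set.seed(2)

# Hypothetical data: 8 speakers, 30 ratings each, on a 6-point scale
n_speakers <- 8; n_obs <- 30
d <- expand.grid(obs = seq_len(n_obs), speaker = seq_len(n_speakers))
d$speaker <- factor(paste0("s", d$speaker))
d$variant <- factor(rep(c("a", "b"), length.out = nrow(d)))
latent <- 3.5 + 0.8 * (d$variant == "b") +
  rnorm(n_speakers, sd = 0.5)[as.integer(d$speaker)] +
  rnorm(nrow(d), sd = 1.2)
d$rating <- pmin(pmax(round(latent), 1), 6)   # clip to the 1..6 scale
d$rating_ord <- factor(d$rating, ordered = TRUE)

# (1) Ordinal regression, no random effects
m_ord <- polr(rating_ord ~ variant, data = d, Hess = TRUE)
# (2) Linear regression with a random intercept for speaker
m_lin <- lme(rating ~ variant, random = ~ 1 | speaker, data = d)

# Fixed-effect estimates with standard errors and t statistics
print(coef(summary(m_ord)))
print(summary(m_lin)$tTable)

# Response counts and means for each level of the fixed effect; means
# near 1 or 6 are where the linear model is most likely to mislead
counts <- table(d$variant, d$rating)
print(counts)
print(tapply(d$rating, d$variant, mean))

# Histograms per fixed-effect level (drawn only in interactive sessions)
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  for (v in levels(d$variant)) {
    hist(d$rating[d$variant == v], breaks = seq(0.5, 6.5, by = 1),
         main = paste("variant", v), xlab = "rating")
  }
  par(op)
}
```

With more than one fixed effect, the same tabulation can be done over 
their combinations, e.g. with table() on several factors at once.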

> Thanks for all the advice!

Sure -- hope it's useful!

Best

Roger

-- 

Roger Levy                      Email: rlevy at ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy