[R-lang] Generalized linear mixed models

Thu Jun 7 12:14:50 PDT 2007

Hi Roger!

> Good question.  I think this is an empirical issue: including random
> effects of recording choice could cost you a little inferential power
> due to the requirement of estimating the variance and correlations of
> the random effects, but it could also improve your model.  At least for
> your linear mixed-effects model, you can treat it as an empirical issue:
> you can use a likelihood ratio test to compare models with and without
> random effects of recording choice, and if including the random effect
> doesn't lead to a significant improvement in the model, you could be
> justified in dropping it.
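The likelihood ratio test described above takes twice the difference in log-likelihood between the model with the extra random effect and the model without it. It compares that value against a chi-square distribution whose degrees of freedom equal the number of extra variance/correlation parameters. Below is a minimal Python sketch, assuming you already have the two log-likelihoods. The values are invented, and `chi2_sf` and `likelihood_ratio_test` are hypothetical helper names. A variance cannot be negative, so the plain chi-square p-value tends to be conservative when testing whether a variance is zero. The optional `boundary` flag applies the commonly used 50:50 mixture correction, which is only appropriate for a single variance component (df = 1).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Start from df = 1 (erfc form) or df = 2 (exponential form) ...
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # ... then step up two df at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, df, boundary=False):
    """LR statistic 2 * (logLik_full - logLik_reduced) and its p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if not boundary:
        return stat, chi2_sf(stat, df)
    if df != 1:
        raise ValueError("the 50:50 boundary correction here only covers df = 1")
    # Null distribution: 50:50 mixture of chi2_0 (point mass at 0) and chi2_1.
    p = 0.5 * chi2_sf(stat, 1) if stat > 0.0 else 1.0
    return stat, p

# Invented log-likelihoods for models without and with a recording random intercept.
stat, p = likelihood_ratio_test(loglik_reduced=-102.0, loglik_full=-100.0, df=1)
print(stat, round(p, 4))
stat, p_adj = likelihood_ratio_test(-102.0, -100.0, df=1, boundary=True)
print(round(p_adj, 4))
```

With these made-up values the statistic is 4.0. The uncorrected p-value is about 0.046, and about 0.023 with the boundary correction.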

That makes sense, and thanks for the reference, I will look at it.
Off-list, someone raised the issue that, since recording predicts speaker
(i.e., if the recording is about volleyball, you know it's Bonnie),
there is a problem including both in the model. Are nested variables
like this a problem?  (Unless that's explained in the paper, in which
case I will hopefully soon know the answer!)
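You can check directly from the data whether recording is nested within speaker. It is nested if each recording occurs with exactly one speaker. Below is a minimal Python sketch. The function name `is_nested` is hypothetical, and the trial lists are made up: only the volleyball/Bonnie pairing comes from the message.

```python
from collections import defaultdict

def is_nested(inner, outer):
    """True if every level of `inner` co-occurs with exactly one level of `outer`."""
    partners = defaultdict(set)  # inner level -> set of outer levels it appears with
    for i, o in zip(inner, outer):
        partners[i].add(o)
    return all(len(levels) == 1 for levels in partners.values())

# Hypothetical trial-level data: one entry per observation.
recording = ["volleyball", "volleyball", "gardening", "gardening", "cooking", "cooking"]
speaker = ["Bonnie", "Bonnie", "Bonnie", "Bonnie", "speaker2", "speaker2"]

print(is_nested(recording, speaker))  # True: each recording has a single speaker
print(is_nested(speaker, recording))  # False: Bonnie appears with two recordings
```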

> Just a note on this -- choosing to treat speaker as a random effect
> doesn't really commit you strongly to the particular speaker being
> representative of the population at large.  Rather, it says that the
> effects individual speakers have on the outcome of the response
> variables of interest come from a normal distribution, and that the
> effects of your individual speakers come from a random sampling of that
> distribution.  But any given speaker could well be an outlier within
> this distribution.
>
> This might simply be an issue of how you worded your point.  If you
> chose your speakers with the idea that some of them might inherently be
> perceived as more or less intelligent, that would certainly justify
> treating speaker as a fixed effect.

I didn't exactly hand-select them for particular qualities, but
without getting too complicated here, I think that, as you say above,
they are a fixed effect in that sense.

> > 2) When doing an analysis of the binary variables, how can I tell
> > whether overdispersion and/or zero-inflation is an issue for me?
>
> I'm not so familiar with the issue of zero-inflation but I thought that
> was a concern for count data rather than binomial data, no?

I'm not so familiar with it either, which is why I'm not sure, but some
of the things I've found in the archives of the main R help lists
suggest it could be an issue.

> With respect to overdispersion: you can only talk about overdispersion
> in binomial data with respect to a potential variable that you think
> might have some effect on the outcomes.  When you have such a variable
> in mind, you can look at whether there are large differences in the
> proportion of positive outcomes for different values of that variable.
>
> Finding that there are indeed large differences could justify adding the
> variable to your model as a random (or fixed) effect.
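A Pearson chi-square statistic on the group-by-outcome table is one way to quantify "large differences in the proportion of positive outcomes" across values of a variable. Dividing it by its degrees of freedom gives a rough dispersion ratio. That ratio should be near 1 if the variable adds nothing beyond binomial noise, and well above 1 if it does. Below is a minimal Python sketch. The function name and example data are hypothetical, and this is only a descriptive check, not a substitute for fitting the variable as a random or fixed effect.

```python
def dispersion_ratio(groups):
    """Pearson chi-square for equal proportions across groups, divided by its df.

    `groups` maps each level of the candidate variable to a list of 0/1 outcomes.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    total_n = sum(len(ys) for ys in groups.values())
    p_hat = sum(sum(ys) for ys in groups.values()) / total_n  # pooled proportion
    if p_hat in (0.0, 1.0):
        raise ValueError("all outcomes are identical")
    x2 = 0.0
    for ys in groups.values():
        n, y = len(ys), sum(ys)
        # (observed - expected)^2 / binomial variance, summed over groups
        x2 += (y - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
    df = len(groups) - 1
    return x2, df, x2 / df

# Hypothetical 0/1 responses grouped by recording.
by_recording = {
    "rec1": [1, 1, 1, 1, 0, 1],
    "rec2": [0, 0, 1, 0, 0, 0],
    "rec3": [1, 0, 1, 0, 1, 0],
}
print(dispersion_ratio(by_recording))
```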

Ok, I think this makes sense to me.

> This looks right -- you also might consider adding in random interaction
> terms between subject/recording and your fixed-effect variables.
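A random interaction between subject and a fixed-effect predictor amounts to giving each subject its own slope for that predictor. Each slope is drawn from a normal distribution centred on the fixed-effect slope. The hypothetical Python simulation below uses invented parameter values and draws intercepts and slopes independently, whereas a mixed model would usually also estimate their correlation. It generates such data, then fits a separate least-squares slope per subject to make the between-subject spread visible. It only illustrates the data-generating assumption. It is not a mixed-model fit, which estimates the slope variance jointly rather than subject by subject.

```python
import random
from collections import defaultdict
from statistics import pstdev

def simulate(n_subjects=20, n_trials=50, beta0=3.0, beta1=0.5,
             sd_intercept=0.8, sd_slope=0.4, sd_resid=1.0, seed=1):
    """Simulate y = (beta0 + b0_s) + (beta1 + b1_s) * x + noise for each subject s."""
    rng = random.Random(seed)
    data = []  # list of (subject, x, y)
    for s in range(n_subjects):
        b0 = rng.gauss(0.0, sd_intercept)  # subject's random intercept
        b1 = rng.gauss(0.0, sd_slope)      # subject's random slope, independent of b0 here
        for _ in range(n_trials):
            x = rng.uniform(-1.0, 1.0)
            y = (beta0 + b0) + (beta1 + b1) * x + rng.gauss(0.0, sd_resid)
            data.append((s, x, y))
    return data

def per_subject_slopes(data):
    """Ordinary least-squares slope of y on x, fitted separately for each subject."""
    by_subject = defaultdict(list)
    for s, x, y in data:
        by_subject[s].append((x, y))
    slopes = {}
    for s, pts in by_subject.items():
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        slopes[s] = sxy / sxx
    return slopes

slopes = per_subject_slopes(simulate())
print("spread of per-subject slopes:", round(pstdev(list(slopes.values())), 3))
```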

Oh, right. That's a very good idea.

> Out of curiosity, why are you looking at all possible interactions
> between fixed-effect variables except for those between pleasant_mood
> and mood_arousal?

I'm suspicious of how they would interact, since they are calculated
from an overlapping set of measures.

And from your second mail:

> One other point I neglected to mention.  Technically it is not really
> correct to treat data on a 6-point scale with a linear model, because
> the error in your data cannot be normally distributed.  This problem
> will probably be worst in cases where the predicted response rate is
> close to the extreme values, where the distribution is likely to be
> skewed.  Ordinal regression would probably be the most natural approach,
> but the bad news is that I believe there is no current means within R to
> include mixed effects in an ordinal regression model.
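Ordinal regression handles the skew near the ends of the scale. Below is a sketch of how a cumulative-logit (proportional-odds) model turns a linear predictor into probabilities for the six categories. It assumes P(Y <= k) = logistic(theta_k - eta), with increasing cut points theta_1..theta_5. In a mixed version, a speaker's or subject's random intercept would simply be one more term added into eta. The threshold values and helper names are hypothetical. This shows how the model's probabilities are formed and is not a fitting routine.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(eta, thresholds):
    """P(Y = k), k = 1..K, under P(Y <= k) = logistic(thresholds[k] - eta).

    `thresholds` must be increasing; eta is the linear predictor
    (fixed effects, plus e.g. a random speaker intercept in a mixed model).
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs

# Hypothetical cut points for a 6-point scale (5 thresholds between 6 categories).
thresholds = [-2.5, -1.2, 0.0, 1.2, 2.5]
for eta in (0.0, 3.0):
    probs = category_probs(eta, thresholds)
    mean = sum(k * p for k, p in enumerate(probs, start=1))
    print(eta, [round(p, 3) for p in probs], round(mean, 2))
```

At eta = 0 the probabilities are symmetric around the middle of the scale. At eta = 3 about 0.62 of the mass piles up on the top category. A normal-error linear model cannot represent that kind of lopsided distribution at the end of the scale.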

You're right, of course.  I've been avoiding this issue, on the
grounds that everyone in the field treats these responses as linear.
But I shouldn't.  Do you have any ideas?  My sense is that dealing
with random effects is very important, likely more so than the ordinal
issues, but that may be just because that's what I know more about.
Do you (or anyone else) know of another platform that could cope with
ordinal and mixed effects?

Thanks for all the advice!

Kathryn