[R-lang] Generalized linear mixed models
Roger Levy
rlevy at ucsd.edu
Wed Jun 6 16:06:14 PDT 2007
Kathryn Campbell-Kibler wrote:
> Hi all,
>
> I've recently been exploring beyond my established comfort zone with
> mixed models, and am looking for some correction or reassurance. I am
> working with experimental data on social perceptions of linguistic
> variation. I've got two types of dependent variables: ratings on a
> 6-point scale (e.g. not at all intelligent-very intelligent), which
> I've been treating as linear variables, and binary variables, based on
> whether a given term was selected as a good description of a speaker
> (e.g. hardworking).
>
> The independent variables (well, some of them) were:
> speaker (8)
> recording (nested, 4 for each speaker) -- which recording was being responded to
> (ING) (3) -- crossed with recording, indicates which guise of the
> variable (ING) was used (e.g. working or workin')
> two measures of listener mood: pleasant and arousal
>
> The structure of the experiment was such that every subject heard one
> recording (which represented also one (ING) guise) from each speaker.
>
> In the past with similar data, I have been using nlme for linear mixed
> models, and using subject id as a random effect. (ING) effects and the
> interaction of (ING) with the other variables, such as speaker, is the
> main point of interest. I have two questions.
>
> 1) Is it more appropriate to build in both subject id and the
> recording choice as random effects, rather than including only the
> subject id?
Hi Kathryn,

Good question. Including random effects of recording choice could cost
you a little inferential power, since the model then has to estimate an
extra variance component (plus correlations, if you add random slopes),
but it could also improve your model. At least for your linear
mixed-effects model, you can treat this as an empirical issue: use a
likelihood ratio test to compare models with and without random effects
of recording choice, and if including the random effect doesn't
significantly improve the model, you would be justified in dropping it.
There's a nice discussion of this issue in the following paper:
Baayen, R.H., Davidson, D.J. and Bates, D.M. (submitted). Mixed-effects
modeling with crossed random effects for subjects and items.
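As a minimal sketch of the likelihood ratio comparison described above,
assuming a current release of the lme4 package: the data below are
simulated purely for illustration, and the data frame d, the effect
sizes and the number of subjects are all made up. Only the column names
follow the original post.

```r
library(lme4)  # assumed to be installed; not part of base R

set.seed(1)
# Simulated stand-in for the real data (hypothetical sizes):
# 40 subjects, 8 speakers, one of 4 recordings per speaker per subject.
n_subj <- 40
d <- expand.grid(subject_id = factor(1:n_subj), speaker = factor(1:8))
d$recording <- factor(paste(d$speaker,
                            sample(1:4, nrow(d), replace = TRUE),
                            sep = "_"))
subj_eff <- rnorm(n_subj, sd = 0.7)
rec_eff  <- rnorm(nlevels(d$recording), sd = 0.5)
d$intellect <- 3.5 +
  subj_eff[as.integer(d$subject_id)] +
  rec_eff[as.integer(d$recording)] +
  rnorm(nrow(d))

# Same fixed effects in both models; only the random effects differ.
# Both are fitted by maximum likelihood (REML = FALSE).
m0 <- lmer(intellect ~ speaker + (1 | subject_id),
           data = d, REML = FALSE)
m1 <- lmer(intellect ~ speaker + (1 | subject_id) + (1 | recording),
           data = d, REML = FALSE)

# Likelihood ratio test of the recording variance component
lrt <- anova(m0, m1)
print(lrt)
```

One caveat: the null hypothesis here puts a variance on the boundary of
its parameter space (zero), so the chi-square p-value from this
comparison is generally regarded as conservative.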
> I am treating speakers as fixed effects,
> deliberately-- I have no expectation that these particular speakers
> are representative of anyone except themselves. But the recordings
> within each speaker were randomly assigned to listeners.
Just a note on this -- choosing to treat speaker as a random effect
doesn't really commit you to your particular speakers being
representative of the population at large. Rather, it says that the
effects individual speakers have on the response variables of interest
come from a normal distribution, and that the effects of your
individual speakers are a random sample from that distribution. But any
given speaker could well be an outlier within this distribution.
This might simply be an issue of how you worded your point. If you
chose your speakers with the idea that some of them might inherently be
perceived as more or less intelligent, that would certainly justify
treating speaker as a fixed effect.
> 2) When doing an analysis of the binary variables, how can I tell
> whether overdispersion and/or zero-inflation is an issue for me?
I'm not so familiar with the issue of zero-inflation but I thought that
was a concern for count data rather than binomial data, no?
With respect to overdispersion: with binary (0/1) outcomes, you can only
talk about overdispersion with respect to a potential grouping variable
that you think might have some effect on the outcomes. When you have
such a variable in mind, you can look at whether there are large
differences in the proportion of positive outcomes for different values
of that variable. Finding that there are indeed large differences could
justify adding the variable to your model as a random (or fixed) effect.
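A rough sketch of that check in base R, using recording as the candidate
variable. The data are simulated for illustration only (32 recordings
with 30 responses each, both numbers made up), and the variance ratio is
just an informal diagnostic, not a formal test.

```r
set.seed(3)
# Simulated stand-in data: 32 recordings, 30 binary responses each.
recording <- factor(rep(1:32, each = 30))
rec_eff <- rnorm(32, sd = 1)  # recording differences on the logit scale
hardworking <- rbinom(length(recording), 1,
                      plogis(-0.3 + rec_eff[as.integer(recording)]))

# Proportion of positive outcomes for each recording
prop_by_rec <- tapply(hardworking, recording, mean)
print(round(sort(prop_by_rec), 2))

# Informal comparison: variance of the observed proportions versus the
# variance expected if every recording shared one underlying probability.
p_hat <- mean(hardworking)
n_per <- as.numeric(table(recording))
expected_var <- p_hat * (1 - p_hat) / mean(n_per)
observed_var <- var(prop_by_rec)
var_ratio <- observed_var / expected_var
cat("observed / expected variance of proportions:", var_ratio, "\n")

# Chi-square test of homogeneity of the proportions across recordings
print(chisq.test(table(recording, hardworking)))
```

A ratio well above 1, or a clearly significant homogeneity test, would
point toward including the variable in the model.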
>
> Bringing these two questions together, I have been looking at using
> lmer for both the "linear" and the binary variables, with something
> like these:
>
> lmer(intellect~speaker*ining*(pleasant_mood+mood_arousal)+(1|subject_id)+(1|recording),
> data=whitenoise)
>
> lmer(hardworking~speaker*ining*(pleasant_mood+mood_arousal)+(1|subject_id)+(1|recording),
> family = binomial, data=whitenoise, method="AGQ")
>
> Does this make sense, do I need the "recording" term?
This looks right -- you might also consider adding random slopes, i.e.
random interaction terms between subject/recording and your
fixed-effect variables.
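For illustration, here is a sketch of what such a term could look like,
written against a current lme4 release, where binomial models are fitted
with glmer() rather than lmer(). Everything about the data is an
assumption: they are simulated, the guise labels g1/g2/g3 are
hypothetical, and the fixed effects are cut down to speaker + ining to
keep the example small.

```r
library(lme4)  # assumed to be installed; not part of base R

set.seed(2)
n_subj <- 60
d <- expand.grid(subject_id = factor(1:n_subj), speaker = factor(1:8))
d$recording <- factor(paste(d$speaker,
                            sample(1:4, nrow(d), replace = TRUE),
                            sep = "_"))
# Hypothetical labels for the three (ING) guises
d$ining <- factor(sample(c("g1", "g2", "g3"), nrow(d), replace = TRUE))

s <- as.integer(d$subject_id)
r <- as.integer(d$recording)
subj_int   <- rnorm(n_subj, sd = 0.8)
subj_slope <- rnorm(n_subj, sd = 0.5)  # subjects differ in guise effect
rec_int    <- rnorm(nlevels(d$recording), sd = 0.4)
eta <- subj_int[s] + rec_int[r] +
  (0.5 + subj_slope[s]) * as.numeric(d$ining == "g2")
d$intellect   <- 3.5 + eta + rnorm(nrow(d))
d$hardworking <- rbinom(nrow(d), 1, plogis(-0.5 + eta))

# Linear model: random intercepts for subject and recording, plus a
# by-subject random slope for guise (the "random interaction").
m_slope <- lmer(intellect ~ speaker + ining +
                  (1 + ining | subject_id) + (1 | recording),
                data = d)
print(VarCorr(m_slope))

# Binary model with random intercepts only, using the default Laplace
# approximation (nAGQ = 1).
m_bin <- glmer(hardworking ~ speaker + ining +
                 (1 | subject_id) + (1 | recording),
               family = binomial, data = d)
print(fixef(m_bin))
```

With only a handful of observations per subject, a random-slope model
like this may produce singular-fit or convergence warnings, so whether
the extra terms are supportable is again an empirical question.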
Out of curiosity, why are you looking at all possible interactions
between fixed-effect variables except for those between pleasant_mood
and mood_arousal?
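To make the question concrete, base R can list the terms a formula
expands to. The sketch below compares the fixed-effect part of the
formula quoted above with the fully crossed version; no data are needed
and only the variable names from the original post are used.

```r
# Fixed-effect part of the formula from the original post
f_asked <- ~ speaker * ining * (pleasant_mood + mood_arousal)
# Fully crossed alternative, for comparison
f_full  <- ~ speaker * ining * pleasant_mood * mood_arousal

t_asked <- attr(terms(f_asked), "term.labels")
t_full  <- attr(terms(f_full), "term.labels")

print(t_asked)

# Terms that the fully crossed formula has and the original lacks
missing_terms <- setdiff(t_full, t_asked)
print(missing_terms)
```

The terms that differ are exactly those involving both pleasant_mood and
mood_arousal.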
> And how can I
> determine if I need to be concerned about zero-inflation and if so, is
> glmmADMB my only option for the binary variables (a pain, since I
> mostly use Macs)?
Not sure -- see my above question about zero-inflation!
Best
Roger
--
Roger Levy Email: rlevy at ucsd.edu
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://ling.ucsd.edu/~rlevy