[R-lang] Generalized linear mixed models

Wed Jun 6 16:06:14 PDT 2007

Kathryn Campbell-Kibler wrote:
> Hi all,
> 
> I've recently been exploring beyond my established comfort zone with
> mixed models, and am looking for some correction or reassurance.  I am
> working with experimental data on social perceptions of linguistic
> variation.  I've got two types of dependent variables: ratings on a 6
> point scale (e.g. not at all intelligent-very intelligent), which I've
> been treating as linear variables, and binary variables, based on
> whether a given term was selected as a good description of a speaker
> (e.g. hardworking).
> 
> The independent variables (well, some of them) were:
> speaker (8)
> recording (nested, 4 for each speaker) -- which recording was being responded to
> (ING) (3) -- crossed with recording, indicates which guise of the
> variable (ING) was used (e.g. working or workin')
> two measures of listener mood: pleasant and arousal
> 
> The structure of the experiment was such that every subject heard one
> recording (which represented also one (ING) guise) from each speaker.
> 
> In the past with similar data, I have been using nlme for linear mixed
> models, and using subject id as a random effect. (ING) effects and the
> interaction of (ING) with the other variables, such as speaker, is the
> main point of interest. I have two questions.
> 
> 1) Is it more appropriate to build in both subject id and the
> recording choice as random effects, rather than including just
> the subject id?  

Hi Kathryn,

Good question.  I think this is an empirical issue: including a random 
effect of recording could cost you a little inferential power, since 
its variance (and, if you add random slopes, their correlations) has to 
be estimated, but it could also improve your model.  At least for your 
linear mixed-effects model, you can test this directly: use a 
likelihood ratio test to compare models with and without the random 
effect of recording, and if including it doesn't lead to a significant 
improvement in the model, you could be justified in dropping it.
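
Here is a minimal sketch of that comparison, assuming a current release 
of lme4.  The data are simulated stand-ins rather than your real data, 
the fixed effects are simplified, and the (ING) level labels "a", "b", 
"c" are placeholders I made up; only the column names follow your 
formulas.

```r
library(lme4)

set.seed(1)
# Simulated stand-in data: 40 subjects, each hearing one recording
# (out of 4) from each of 8 speakers.
n_subj <- 40
d <- expand.grid(subject_id = factor(1:n_subj), speaker = factor(1:8))
d$recording <- factor(paste(d$speaker,
                            sample(1:4, nrow(d), replace = TRUE),
                            sep = "_"))
d$ining <- factor(sample(c("a", "b", "c"), nrow(d), replace = TRUE))

# Response = grand mean + subject effect + recording effect + noise.
subj_eff <- rnorm(n_subj, sd = 0.5)
rec_eff  <- rnorm(nlevels(d$recording), sd = 0.4)
d$intellect <- 3.5 +
  subj_eff[as.integer(d$subject_id)] +
  rec_eff[as.integer(d$recording)] +
  rnorm(nrow(d), sd = 1)

# Same fixed effects in both models; only the random effects differ.
# Both are fit by maximum likelihood (REML = FALSE).
m0 <- lmer(intellect ~ speaker + ining + (1 | subject_id),
           data = d, REML = FALSE)
m1 <- lmer(intellect ~ speaker + ining + (1 | subject_id) + (1 | recording),
           data = d, REML = FALSE)

# Likelihood ratio test of the recording random effect.
lrt <- anova(m0, m1)
print(lrt)
p_value <- lrt[["Pr(>Chisq)"]][2]
```

One caveat: the null hypothesis here puts the recording variance at 
zero, on the boundary of its parameter space, so the p-value from this 
test tends to be conservative.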

There's a nice discussion of this issue in the following paper:

Baayen, R.H., Davidson, D.J. and Bates, D.M. (submitted). Mixed-effects 
modeling with crossed random effects for subjects and items.

> I am treating speakers as fixed effects,
> deliberately-- I have no expectation that these particular speakers
> are representative of anyone except themselves.  But the recordings
> within each speaker were randomly assigned to listeners.

Just a note on this -- choosing to treat speaker as a random effect 
doesn't commit you to any particular speaker being representative of 
the population at large.  Rather, it says that the effects individual 
speakers have on the response variables of interest come from a normal 
distribution, and that your speakers' effects are a random sample from 
that distribution.  Any given speaker could still be an outlier within 
this distribution.

This might simply be an issue of how you worded your point.  If you 
chose your speakers with the idea that some of them might inherently be 
perceived as more or less intelligent, that would certainly justify 
treating speaker as a fixed effect.

> 2) When doing an analysis of the binary variables, how can I tell
> whether overdispersion and/or zero-inflation is an issue for me?

I'm not so familiar with the issue of zero-inflation but I thought that 
was a concern for count data rather than binomial data, no?

With respect to overdispersion: with binary (0/1) responses, you can 
only talk about overdispersion with respect to some grouping variable 
that you think might affect the outcomes.  When you have such a 
variable in mind, you can look at whether the proportion of positive 
outcomes differs widely across the values of that variable.

Finding that there are indeed large differences could justify adding the 
variable to your model as a random (or fixed) effect.
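
As a rough sketch of that check in base R, here is one way to compare 
the per-group proportions against what pure binomial sampling would 
give.  The data are simulated and recording is only an example of a 
grouping variable; the Pearson chi-square ratio is my choice of summary, 
not the only possible one.

```r
set.seed(2)
# Simulated stand-in: 32 recordings, 10 binary responses each, with a
# recording-specific probability of a positive response.
d <- data.frame(recording = factor(rep(1:32, each = 10)))
p_rec <- plogis(rnorm(32, mean = 0, sd = 1))
d$hardworking <- rbinom(nrow(d), size = 1,
                        prob = p_rec[as.integer(d$recording)])

# Proportion of positive outcomes and number of responses per recording.
prop_by_rec <- tapply(d$hardworking, d$recording, mean)
n_by_rec    <- tapply(d$hardworking, d$recording, length)
p_all       <- mean(d$hardworking)

# Pearson chi-square statistic for homogeneity of proportions across
# recordings, divided by its degrees of freedom.
x2    <- sum(n_by_rec * (prop_by_rec - p_all)^2 / (p_all * (1 - p_all)))
df    <- length(prop_by_rec) - 1
ratio <- x2 / df

print(round(sort(prop_by_rec), 2))
print(ratio)
```

A ratio well above 1 means the proportions vary more across recordings 
than binomial sampling alone would predict, which is the situation in 
which adding the grouping variable to the model is worth considering.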

> 
> Bringing these two questions together, I have been looking at using
> lmer for both the "linear" and the binary variables, with something
> like these:
> 
> lmer(intellect~speaker*ining*(pleasant_mood+mood_arousal)+(1|subject_id)+(1|recording),
> data=whitenoise)
> 
> lmer(hardworking~speaker*ining*(pleasant_mood+mood_arousal)+(1|subject_id)+(1|recording),
>  family = binomial, data=whitenoise, method="AGQ")
> 
> Does this make sense, do I need the "recording" term?  

This looks right -- you might also consider adding random slopes, that 
is, by-subject or by-recording random interactions with your 
fixed-effect variables.
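
To make the syntax concrete, here is a sketch with a by-subject random 
slope for (ING), again on simulated stand-in data with simplified fixed 
effects and made-up (ING) labels.  It assumes a current release of 
lme4, where binary responses are fit with glmer() rather than by 
passing family to lmer().

```r
library(lme4)

set.seed(3)
# Simulated stand-in data: 40 subjects x 8 speakers, one of 4
# recordings per speaker, and a 3-level (ING) guise.
n_subj <- 40
d <- expand.grid(subject_id = factor(1:n_subj), speaker = factor(1:8))
d$recording <- factor(paste(d$speaker,
                            sample(1:4, nrow(d), replace = TRUE),
                            sep = "_"))
d$ining <- factor(sample(c("a", "b", "c"), nrow(d), replace = TRUE))
s <- as.integer(d$subject_id)
r <- as.integer(d$recording)

subj_int   <- rnorm(n_subj, sd = 0.5)
subj_slope <- rnorm(n_subj, sd = 0.5)  # subject-specific shift for guise "b"
rec_int    <- rnorm(nlevels(d$recording), sd = 0.4)
eta <- subj_int[s] + rec_int[r] + (0.3 + subj_slope[s]) * (d$ining == "b")

d$intellect   <- 3.5 + eta + rnorm(nrow(d))
d$hardworking <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Rating model: the (ING) effect is allowed to vary by subject.
m_slope <- lmer(intellect ~ speaker + ining +
                  (1 + ining | subject_id) + (1 | recording),
                data = d)

# Binary model: random intercepts for subject and recording.
m_bin <- glmer(hardworking ~ speaker + ining +
                 (1 | subject_id) + (1 | recording),
               data = d, family = binomial)

print(VarCorr(m_slope))
print(VarCorr(m_bin))
```

With only eight responses per subject, the slope variances may be 
estimated at or near zero (a singular fit), in which case the simpler 
random-intercept model is probably all the data can support.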

Out of curiosity, why are you looking at all possible interactions 
between fixed-effect variables except for those between pleasant_mood 
and mood_arousal?

> And how can I
> determine if I need to be concerned about zero-inflation and if so, is
> glmmADMB my only option for the binary variables (a pain, since I
> mostly use Macs)?

Not sure -- see my above question about zero-inflation!

Best

Roger

-- 

Roger Levy                      Email: rlevy at ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy