[R-lang] How to compare mixed logit models with crossed random effects

Roger Levy rlevy at ling.ucsd.edu
Sun May 24 10:52:48 PDT 2009

Dear Linda,

On May 20, 2009, at 3:34 AM, Linda Mortensen wrote:

> Dear LanguageR users,
> I'm trying to fit a mixed logit model using the lmer function in the  
> lme4 package. My question concerns the random effects part of this  
> model (i.e., the random effects for my subjects and items) and how I  
> decide between models that differ in the number of random effect  
> terms that are estimated.

First of all, in my assessment the problem of which random effects  
terms to include in your model when the primary target of inference is  
the fixed effects is still open.

> So far, I have used two procedures:
> 1. For a given model, I remove a random effect term if it correlates  
> very strongly with either the intercept or any of the other random  
> effect terms. Eventually, I end up with a model in which all  
> correlations are modest.

This is an interesting idea, but I would emphasize two things:

1) It's important to distinguish between positive and negative  
correlations.  A strong negative correlation is telling you something  
very important about your dataset.  Imagine a word recognition task  
where the response variable is whether the answer is correct and the  
covariate x1 is word frequency.  A strong negative correlation between  
intercept and x1 is telling you that participants who answer more  
correctly overall are less sensitive to word frequency, and vice versa,  
and that this is a very reliable generalization.  You can see this in  
model log-likelihoods too: compare the two model fits below.

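library(mvtnorm)  # provides rmvnorm()
library(lme4)     # glmer(); lmer(..., family=) is deprecated in current lme4
k <- 10                                  # number of clusters (e.g. participants)
n <- 1000                                # total number of observations
cl <- gl(k,1,n)                          # cluster label for each observation
x1 <- runif(n)                           # covariate
sigma <- matrix(c(1,-1.8,-1.8,4),2,2)    # intercept/slope covariance; correlation = -0.9
b <- rmvnorm(k,mean=c(0,0),sigma)        # per-cluster (intercept, slope) pairs
eta <- b[cl,1] + b[cl,2]*x1              # linear predictor on the logit scale
y <- rbinom(n,1,plogis(eta))             # plogis(eta) = exp(eta)/(1+exp(eta))
m0 <- glmer(y ~ 1 + (1 | cl),family="binomial")   # random intercept only
m1 <- glmer(y ~ 1 + (x1 | cl),family="binomial")  # random intercept and slope
logLik(m0); logLik(m1)                   # compare the two log-likelihoods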

2) When you say "remove a term", what would really be justified is  
this: if the random parameters for covariates x1 and x2 are correlated  
at >0.99, create a third, "proxy" covariate x12=x1+x2, add x12 to the  
random-effects structure, and drop x1 and x2.  This would save you two  
parameters at basically no modeling cost.
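
A minimal sketch of the proxy idea, on simulated data where the  
by-cluster slopes for x1 and x2 are perfectly correlated by  
construction.  The data frame d, the names m_full and m_proxy, and all  
simulation settings are my own assumptions for illustration, and I use  
glmer from current lme4 rather than lmer(..., family=):

```r
library(lme4)

set.seed(1)
k <- 20                      # number of clusters (hypothetical)
n <- 2000                    # total number of observations (hypothetical)
cl <- gl(k, 1, n)            # cluster label for each observation
x1 <- runif(n)
x2 <- runif(n)
a <- rnorm(k)                # per-cluster intercepts
s <- rnorm(k)                # ONE shared per-cluster slope, used for both x1 and x2
eta <- a[cl] + s[cl] * (x1 + x2)
d <- data.frame(y = rbinom(n, 1, plogis(eta)), x1 = x1, x2 = x2, cl = cl)

# Proxy covariate: replaces separate random slopes for x1 and x2
d$x12 <- d$x1 + d$x2

# Separate random slopes (may be reported as a singular fit on data like this)
m_full  <- glmer(y ~ 1 + (1 + x1 + x2 | cl), data = d, family = binomial)
# One random slope for the proxy covariate
m_proxy <- glmer(y ~ 1 + (1 + x12 | cl), data = d, family = binomial)

# Log-likelihoods and parameter counts of the two fits
logLik(m_full)
logLik(m_proxy)
```

How many parameters this saves depends on what else is in the  
random-effects structure; the "df" attribute of logLik(fit) reports the  
count for each fit.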

> 2. I compare the quasi-log likelihood (logLik) values of a model  
> with a given random effect term (e.g. an interaction term, ... (1 +  
> a * b | sub) and of a model without that term (... (1 + a + b |  
> sub). If the logLik values are very similar (i.e., if the value is  
> not, or at least not much, smaller for the model without the term  
> than for the model with the term), I go for the former model.

This is OK, and closer to recommended practice (see Baayen et al.,  
2008, for discussion with respect to linear mixed-effect models).  You  
can actually do a likelihood-ratio test, though with the dual caveats  
that (a) the Laplace-approximated log-likelihood is not the true  
log-likelihood; and (b) the test is conservative, since under the null  
hypothesis a variance parameter sits on the boundary of its parameter  
space.
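
A sketch of what such a likelihood-ratio test could look like, again  
on simulated data.  The data frame d, the model names m0 and m1, and  
the simulation settings are assumptions of mine; the chi-square  
reference distribution is the usual approximation, so both caveats  
above apply to the resulting p-value:

```r
library(lme4)

set.seed(1)
k <- 20                      # number of clusters (hypothetical)
n <- 2000                    # total number of observations (hypothetical)
cl <- gl(k, 1, n)
x1 <- runif(n)
a <- rnorm(k)                # per-cluster intercepts
s <- rnorm(k, sd = 2)        # per-cluster slopes for x1
d <- data.frame(y = rbinom(n, 1, plogis(a[cl] + s[cl] * x1)), x1 = x1, cl = cl)

m0 <- glmer(y ~ 1 + (1 | cl),      data = d, family = binomial)  # simpler model
m1 <- glmer(y ~ 1 + (1 + x1 | cl), data = d, family = binomial)  # adds random slope

# Likelihood-ratio statistic: twice the difference in log-likelihoods
lr_stat <- as.numeric(2 * (logLik(m1) - logLik(m0)))
# Extra parameters in m1: one slope variance, one intercept-slope correlation
df_diff <- attr(logLik(m1), "df") - attr(logLik(m0), "df")
# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
c(statistic = lr_stat, df = df_diff, p = p_value)

# lme4 does the same comparison for you
anova(m0, m1)
```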

> Is it acceptable to select a model on the basis of this comparison?  
> Or, when the logLik values are similar (which they usually are for  
> my models), should I instead look at the measures of likelihood that  
> take into account the number of parameters in a model when  
> evaluating its fit (i.e., AIC, BIC, deviance)? According to these  
> other measures, a simple model seems always to be better than a more  
> complex one, but if I want to rule out that my fixed effects can be  
> explained, in part, by random effects for subjects and items, then a  
> simple model (with few random effects) is not necessarily better  
> than a complex one, I would think.

Well, first of all the deviance is just -2*logLik.  The AIC and BIC  
are still dominated by log-likelihood too. And it's not always going  
to be the case that the logLik will not be appreciably better for more  
complex models -- see my example above.  Finally, I'd agree with you  
that it's better to be cautious and include the extra, more complex  
terms if you want to be sure that you have a "real" fixed effect.
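
To make the relationship between these measures concrete, here is a  
small sketch of the arithmetic.  The helper ic_from_loglik is a  
hypothetical name of mine, and I check it against a plain glm fit  
(not a mixed model) only so that it runs with base R; the same  
formulas apply to the logLik of a mixed-model fit:

```r
# Deviance (as -2*logLik), AIC and BIC from a log-likelihood,
# the number of estimated parameters, and the number of observations
ic_from_loglik <- function(ll, n_par, n_obs) {
  c(deviance = -2 * ll,
    AIC      = -2 * ll + 2 * n_par,
    BIC      = -2 * ll + log(n_obs) * n_par)
}

set.seed(1)
n <- 500                               # hypothetical sample size
x1 <- runif(n)
y <- rbinom(n, 1, plogis(-0.5 + x1))   # binary response
fit <- glm(y ~ x1, family = binomial)

ll <- logLik(fit)
ic <- ic_from_loglik(as.numeric(ll), attr(ll, "df"), nobs(fit))
ic

# Same values from R's built-in extractors
c(deviance = deviance(fit), AIC = AIC(fit), BIC = BIC(fit))
```

The penalty terms (2 per parameter for AIC, log(n) per parameter for  
BIC) are all that distinguish these measures from -2*logLik.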

Hope this helps.




Roger Levy                      Email: rlevy at ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
