[R-lang] p-values for factors using lmer & mcmc

Andy Fugard a.fugard at ed.ac.uk
Mon Sep 3 08:00:37 PDT 2007


T. Florian Jaeger wrote:

> [...] in lmer mixed logit models are fitted using penalized 
> quasi-likelihood maximization, which in small non-technical words means 
> that when you compare those likelihoods (measures of model fit) for two 
> logit models (one with and one without a parameter/predictor of 
> interest) you could even end up finding that the bigger model is less 
> likely (which cannot happen with maximum likelihood fits). that makes 
> comparing two mixed logit models by means of the anova() function ( i.e. 
> by means of likelihood ratios) problematic. but mixed logit models 
> should have p-values for the coefficients (based on the wald statistic). 

Dunno much about the problems with penalized quasi-likelihood 
maximization, but I do keep reading that Wald tests should be avoided 
where possible, with Hauck and Donner (1977) usually cited as the 
reason.  I'd quite like to get to the bottom of this.  If I understand 
correctly (and please do correct me if I'm wrong!), the problem is that 
as the size of the effect increases, the Wald statistic increases at 
first, but beyond a certain point it begins to decrease again (i.e. 
it's nonmonotonic).  They illustrate this by reanalysing a dataset 
collected to discover what predicts the presence of the T. vaginalis 
organism in women.  All the predictors (they're all qualitative) were 
significant at the 0.05 level using likelihood ratio tests.  Using the 
Wald test, however, two were far from significant (in case you're 
interested, one related to sexual experience and the other to a history 
of gonorrhea).  This kind of inconsistency can "leave the user in a 
quandary," say Hauck and Donner.  Indeed!
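To make the nonmonotonicity concrete, here's a toy sketch in R (made-up 
data, not their T. vaginalis example): two groups of 20 observations, 
with k "successes" in group 1 and 20 - k in group 0.  As k moves 
towards complete separation (k = 20), the likelihood-ratio statistic 
keeps climbing, but the Wald z statistic peaks and then falls back:

```r
## Toy illustration of the Hauck-Donner effect.
hd_demo <- function(k) {
  y <- c(rep(1, 20 - k), rep(0, k),    # group 0: 20 - k successes
         rep(1, k), rep(0, 20 - k))    # group 1: k successes
  x <- rep(0:1, each = 20)
  full    <- glm(y ~ x, family = binomial)
  reduced <- glm(y ~ 1, family = binomial)
  c(wald.z = coef(summary(full))["x", "z value"],
    lr     = reduced$deviance - full$deviance)
}
t(sapply(c(15, 17, 18, 19), hd_demo))
## The lr column only rises with k; the wald.z column rises, then dips.
```

(The effect is much more dramatic at k = 20, where the Wald z collapses 
towards zero while the LR statistic is at its largest.)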

> 
> as for your the fact that you're interested in entire factors rather 
> than parameters - why? if a factor with 4 levels is significant 
> according to model comparison, but none of the parameters/coefficients 
> associated with that factor reaches significance in the model that isn't 
> good anyway 

I'm also confused by this.  Faraway (2006, p. 13) mentions in passing 
that "We would normally avoid using the t-tests for the levels of 
qualitative predictors with more than two levels."  His example is for 
Gaussian multiple regression; perhaps that's important?
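The practical alternative, as I understand it, is to test the factor as 
a whole by dropping it and comparing nested models, rather than reading 
the per-level t-tests (which only compare each level to the baseline).  
A minimal simulated sketch with plain lm() -- the same anova() logic 
carries over to lmer fits with ML estimation:

```r
## Simulated data: a four-level factor g with a small graded effect on y.
set.seed(2)
d <- data.frame(g = factor(rep(letters[1:4], each = 25)))
d$y <- rnorm(100) + c(a = 0, b = 0.3, c = 0.6, d = 0.9)[d$g]

full    <- lm(y ~ g, data = d)
reduced <- lm(y ~ 1, data = d)

summary(full)$coefficients   # per-level tests, each against baseline "a" only
anova(reduced, full)         # one omnibus test for the whole factor
```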

On the p-values for likelihood ratio tests for random effects models he 
says (p. 158) that they tend to be too small.  He goes on to recommend 
parametric bootstrap methods.
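The parametric bootstrap recipe itself is simple enough to sketch.  
Here it is in base R with lm() so it runs without extra packages (with 
lmer you'd simulate responses from the fitted null model in the same 
way and refit both models each time); the data are simulated:

```r
## Parametric bootstrap of a likelihood-ratio test, instead of trusting
## the chi-squared reference distribution.
set.seed(3)
d <- data.frame(x = runif(60))
d$y <- 1 + 0.2 * d$x + rnorm(60)

full    <- lm(y ~ x, data = d)
reduced <- lm(y ~ 1, data = d)
obs.lr  <- as.numeric(2 * (logLik(full) - logLik(reduced)))

boot.lr <- replicate(500, {
  y.star <- unlist(simulate(reduced))   # simulate data under the null fit
  f <- lm(y.star ~ d$x)                 # refit both models to the new data
  r <- lm(y.star ~ 1)
  as.numeric(2 * (logLik(f) - logLik(r)))
})

mean(boot.lr >= obs.lr)   # bootstrap p-value for the LR statistic
```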

Andy



@BOOK{Faraway2006,
   title = {Extending the Linear Model with R},
   publisher = {Chapman \& Hall/CRC},
   year = {2006},
   author = {Julian J. Faraway},
}

@ARTICLE{HauckDonner1977,
   author = {Hauck, Jr., Walter W. and Donner, Allan},
   title = {Wald's Test as Applied to Hypotheses in Logit Analysis},
   journal = {Journal of the American Statistical Association},
   year = {1977},
   volume = {72},
   pages = {851--853},
   number = {360},
}

-- 
Andy Fugard, Postgraduate Research Student
Psychology (Room F15), The University of Edinburgh,
   7 George Square, Edinburgh EH8 9JZ, UK
Mobile: +44 (0)78 123 87190   http://www.possibly.me.uk
