[R-lang] Comparing PQL fits for logistic regression models

T. Florian Jaeger tiflo at csli.stanford.edu
Wed Oct 10 14:30:01 PDT 2007


On 10/10/07, David Reitter <dreitter at inf.ed.ac.uk> wrote:
>
> What are the options to compare fits of logistic regression models?
>
> My models are fitted using 'glmmPQL' from the MASS and nlme
> libraries. The models to be compared differ in their use of
> covariates, e.g.,
>
> model.1 <- glmmPQL(dep ~ pred1a * pred2, random = ~ 1 | subj, family = binomial)
> model.2 <- glmmPQL(dep ~ pred1b * pred2, random = ~ 1 | subj, family = binomial)
>
> pred1a and pred1b are correlated, and I'd like to estimate their
> relative predictive power.


Predictive power? You can, of course, use predict() [though you may have to
write your own version if you want the predictions to include the random
effects], and then use cross-validation to get a measure of performance on
unseen data.
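
For example, something like the following (a minimal sketch only; the data
frame d and its column names are placeholders for your own data, and it
assumes dep is coded 0/1 and that predict() on a glmmPQL fit with level = 0
returns population-level predictions from the fixed effects alone):

library(MASS)
library(nlme)

## 10-fold cross-validation of held-out classification accuracy (sketch)
set.seed(1)
k <- 10
fold <- sample(rep(1:k, length.out = nrow(d)))

acc <- numeric(k)
for (i in 1:k) {
  train <- d[fold != i, ]
  test  <- d[fold == i, ]
  fit <- glmmPQL(dep ~ pred1a * pred2, random = ~ 1 | subj,
                 family = binomial, data = train, verbose = FALSE)
  ## level = 0: predictions from the fixed effects only (population level)
  p <- predict(fit, newdata = test, type = "response", level = 0)
  acc[i] <- mean((p > 0.5) == test$dep)
}
mean(acc)   # compare this across competing models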

> Both models use the same datasets, but the covariate structure is not
> nested.


You should create a superset model containing all factors from the subset
models and then compare each subset model against that superset model. That
holds regardless of whether the model is fit by maximizing the likelihood or
a quasi-likelihood (this approach ensures that you are comparing nested models).
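
Schematically (a sketch only, reusing the placeholder data frame d from above):

m.super <- glmmPQL(dep ~ (pred1a + pred1b) * pred2, random = ~ 1 | subj,
                   family = binomial, data = d)
m.1 <- glmmPQL(dep ~ pred1a * pred2, random = ~ 1 | subj,
               family = binomial, data = d)
m.2 <- glmmPQL(dep ~ pred1b * pred2, random = ~ 1 | subj,
               family = binomial, data = d)
## m.1 and m.2 are not nested in each other, but each is nested within
## m.super, so each can be compared against the superset model.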

I think some people also use the DIC (deviance information criterion) to
compare non-nested models; Shravan Vasishth was talking about this, so maybe
you can ask him. I have yet to be convinced that this is OK.

> (Adjusted) R^2 are not available, which I think has to do with the
> PQL fitting algorithm (it is not a maximum likelihood fit - do I
> remember correctly?).


It's a penalized quasi-likelihood fit, though you can now also use a Laplace
approximation of the likelihood. In either case an analytic approximation of
the likelihood function is optimized, since no closed-form solution to the
marginal likelihood is known. (The other alternative is Monte Carlo simulation,
but that becomes infeasible with as few as 3 parameters, according to
Agresti 2002.)
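
If you want to try the Laplace route, the lme4 package provides it (a sketch;
with a current lme4, glmer() uses the Laplace approximation by default, and
the random intercept is written into the formula):

library(lme4)
m1.lap <- glmer(dep ~ pred1a * pred2 + (1 | subj),
                data = d, family = binomial)
summary(m1.lap)   # unlike the PQL fit, this reports AIC, BIC, and logLik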

> AIC doesn't seem to be useful either ([1]). AIC, BIC and Log-
> Likelihood are only output as "NA" using the "summary" function. If
> there is no valid Log-Likelihood available for a PQL model, we can't
> compute Nagelkerke's (Pseudo) R^2, can we?


The quasi-likelihood is an approximation to the (penalized) likelihood. As
such, it can lead to odd comparisons (a smaller model may turn out to be more
"quasi-likely" than a larger model).

> The only values I am getting for my model are adjusted R^2 for "lm"
> fits. But because I'm using repeated-measures data (sampled from
> dialogue corpora), fits here seem to violate not only distribution
> assumptions of the response variable, but also independence (IID)
> of the data points.


You can at least use quasi-likelihood comparisons to argue your case (using
the same logic as for ML comparisons). Have a look at the manuscript on mixed
logit models on my website; it's not 100% correct, but with Laplace
approximations I get reasonable results.
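
Concretely, with Laplace-approximated fits (as in the glmer sketch above) the
comparison of each subset model against the superset model looks like an
ordinary likelihood-ratio test:

m.super.lap <- glmer(dep ~ (pred1a + pred1b) * pred2 + (1 | subj),
                     data = d, family = binomial)
m1.lap <- glmer(dep ~ pred1a * pred2 + (1 | subj),
                data = d, family = binomial)
anova(m1.lap, m.super.lap)   # chi-square test on the change in deviance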

There is a correct way of doing this, but it's complicated. I think it may be
covered in Quené & van den Bergh (2003), but I am not sure.

hope some of this helps.

florian

> Any comments would be appreciated - be it to clear up some
> misconceptions on my part, or be it to solve my problem at hand.
>
> [1] http://tolstoy.newcastle.edu.au/R/e2/help/07/06/18477.html
>
>
> --
> David Reitter
> ICCS/HCRC, Informatics, University of Edinburgh
> http://www.david-reitter.com