[R-lang] Re: Determining whether a model fits a second data set

Amanda J. Owen ajowen@gmail.com
Thu Oct 14 05:18:11 PDT 2010


I haven't tried any of these things yet - thanks so much.  Both the
idea of simulations as a way to come up with confidence intervals and
the idea of predicting outcomes is helpful.  I'll have to play with
the commands and see how things go and then I may come back with more
questions. I remembered earlier conversations/readings that said that
pseudo-R2s were not ideal, but I didn't know enough to know what the
other options were.  It's nice to be able to try a few things out.

I see the challenge in changing to a completely new data set with new
speakers.  One of the datasets is the same speakers but on a different
task (spontaneous language samples rather than elicited production).
I'll be curious to see if that one works better.  Maybe I'll leave
aside the third dataset with new speakers because of the complications
that it introduces.

Amanda

Amanda J. Owen
ajowen@gmail.com


amanda-owen@uiowa.edu
Dept of Speech Pathology & Audiology
University of Iowa
Iowa City, IA



On Wed, Oct 13, 2010 at 11:48 AM, T. Florian Jaeger
<tiflo@csli.stanford.edu> wrote:
> Hi Amanda,
> a couple of things come to mind that might be useful. Sorry if you've
> considered some of them and they didn't work.
> First of all, most model functions in R allow you to use your model to make
> predictions (usually that function is simply called predict()).
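For instance, with a mixed logit model fit by lme4, applying the model to a new dataset might look like the sketch below. (The data frames, predictors, and grouping factors here are hypothetical, and predict() methods for glmer fits assume a reasonably current version of lme4.)

```r
library(lme4)

## Fit the model on the elicited-production data
## (hypothetical predictor and grouping-factor names).
m.elicited <- glmer(correct ~ freq + phonotactics +
                      (1 | child) + (1 | item),
                    data = elicited, family = binomial)

## Predicted probabilities for the new (spontaneous) data;
## type = "response" returns probabilities rather than log-odds.
p.hat <- predict(m.elicited, newdata = spontaneous, type = "response")
```

How well those predictions match the observed outcomes in the new data is then a direct measure of how the old model transfers.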
> There are also nice functions in the arm package by Gelman, which allow
> you to run simulations using your model (this might be useful in response to
> some of the responses to your question). So you can derive 'confidence
> intervals' for your parameters based on novel data (see sim() in the arm
> package).
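A minimal sketch of that approach (m.elicited is the hypothetical fitted glmer model from above; the @fixef slot is where current versions of arm store the fixed-effect draws for mixed-model fits):

```r
library(arm)   # Gelman & Hill's package; provides sim()

## Draw 1000 simulations of the fitted parameters.
s <- sim(m.elicited, n.sims = 1000)

## For a glmer fit, the fixed-effect draws are an
## n.sims x n.predictors matrix; 95% intervals per predictor:
apply(s@fixef, 2, quantile, probs = c(0.025, 0.975))
```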
> The biggest problem with applying mixed models to unseen data is that you
> are likely to encounter unseen levels of your random variables in the new
> data (e.g. new speakers). Theoretically, it should be possible to estimate
> the BLUPs for the new data based on the fixed-effect parameters and random
> variances fit on the old data (BLUPs are not parameters of the model, but
> estimated based on the data, the fixed-effect parameters, and the random
> variances). But I am not sure whether there already is a function
> implemented for that.
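In current versions of lme4 (which postdate this exchange) this case is handled through arguments to predict() rather than a separate function; a sketch, again with hypothetical names:

```r
## Predict for data containing unseen children using only the
## fixed effects (random effects dropped from the prediction):
p.fixed <- predict(m.elicited, newdata = newkids,
                   re.form = NA, type = "response")

## Or keep the random-effect structure but set the BLUPs of any
## unseen levels to zero (their expected value):
p.new <- predict(m.elicited, newdata = newkids,
                 allow.new.levels = TRUE, type = "response")
```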
> As for measures of quality comparable to an R2, you could consider doing a
> Nagelkerke R2, which is theoretically defined for mixed models, too. See
> Jaeger, 2010:37, where I mention this measure for mixed models and provide a
> link to some R code that implements it. Here's an attempt to paste that
> section.
> ===================== paste ============================
> While several pseudo R2 measures have also been proposed for logit models as
> measures of overall model quality, they lack the intuitive and reliable
> interpretation of the R2 available for linear models. One of the most
> commonly used pseudo R2 measures is the Nagelkerke R2. The Nagelkerke R2
> assesses the quality of a model with regard to a baseline model (the model
> with only the intercept). While this measure is usually employed for
> ordinary logit models, its definition extends to multilevel logit
> models. For the current model, Nagelkerke R2 = 0.34 compared against an
> ordinary logit model with only
> the intercept as baseline.\footnote{Despite my own skepticism about
> Nagelkerke R2s, I have chosen to present them since they may be familiar to
> some readers.
> All Nagelkerke R2s are calculated against an ordinary logit model with only
> the intercept as baseline. The R code used to calculate
> the Nagelkerke R2s presented here is available at
> http://hlplab.wordpress.com/2009/08/29/nagelkerke-and-coxsnellpseudo-r2-for-mixed-logit-models/.}
> ===================== end paste ============================
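The measure itself is easy to compute from the two log-likelihoods. The sketch below is not the code linked above, just the textbook formula, with hypothetical model names:

```r
## Nagelkerke R2 for a fitted model against an intercept-only
## ordinary logit baseline.
## Cox & Snell: R2_CS = 1 - exp(-(2/n) * (logLik1 - logLik0));
## Nagelkerke rescales by its maximum value, 1 - exp((2/n) * logLik0).
nagelkerke.R2 <- function(fit, null, n) {
  ll1 <- as.numeric(logLik(fit))
  ll0 <- as.numeric(logLik(null))
  cs  <- 1 - exp(-(2 / n) * (ll1 - ll0))   # Cox & Snell R2
  cs / (1 - exp((2 / n) * ll0))            # Nagelkerke rescaling
}

## e.g.:
## null <- glm(correct ~ 1, data = elicited, family = binomial)
## nagelkerke.R2(m.elicited, null, n = nrow(elicited))
```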
> Many other comparable measures can be extended to mixed models - none as
> conceptually appealing as an R-square. The blog page linked in my paper
> contains links to more information about these other measures.
> HTH,
> Florian
> On Wed, Oct 13, 2010 at 5:59 AM, Amanda J. Owen <ajowen@gmail.com> wrote:
>>
>> Hi,
>>   I think this is a more general statistics question rather than a
>> particular question about the use of lmer, but I would appreciate the
>> advice from this group since the question is specific to the types of
>> models fit under lmer/languageR.
>>
>> I have fit a (rather complex) model to a dataset based on elicited
>> data. The model is a mixed model logistic regression with 2 random
>> factors and 6 predictors.  I have both a 'real' model using true
>> values for the predictors and a standardized model which has
>> standardized all predictors so that their relative contributions to
>> predicting the outcome (past tense accuracy) can be compared more
>> directly.  Some factors are significant but make a relatively small
>> contribution.  The data set is fairly rich with around 8000 datapoints
>> (216 past tense opportunities per child).
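The standardization described here is commonly done by centering and scaling each predictor before fitting; a sketch with hypothetical variable names:

```r
## Center and scale the continuous predictors so that their
## coefficients are comparable in magnitude:
elicited$freq.z <- as.numeric(scale(elicited$freq))
elicited$phon.z <- as.numeric(scale(elicited$phonotactics))

m.std <- glmer(correct ~ freq.z + phon.z + (1 | child) + (1 | item),
               data = elicited, family = binomial)
```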
>>
>>    I have a similar but smaller data set from spontaneous data from
>> the same children and a third data set from spontaneous data from a
>> new group of children that is even smaller.  I can quantify the
>> predictor variables (lexical frequency, phonotactics, etc) in such a
>> way that I should be able to fit the same model to this second
>> dataset.  I'm especially interested in whether we could actually use
>> the same beta values to have a reasonably good fit because I'd like to
>> be able to comment on whether the relative contributions of the
>> predictors were an artifact of the data set, the particular children
>> involved, or hold more generally.
>>
>>   I'm not sure how to assess model fit without simply comparing a
>> reduced/full model on the same dataset.  I think if I were using linear
>> regression I would compare R-squared values across the two datasets
>> and have some comment about the fact that the models explain similar
>> amounts of variance.  But I'm stumped when it comes to logistic
>> regression/the introduction of random effects.
>>
>> Thanks so much for any advice or help you can provide.
>> Amanda
>>
>>
>> Amanda J. Owen
>> ajowen@gmail.com
>>
>>
>> amanda-owen@uiowa.edu
>> Dept of Speech Pathology & Audiology
>> University of Iowa
>> Iowa City, IA
>
>



More information about the ling-r-lang-L mailing list