[R-lang] Re: Positive and negative logLik and BIC in model comparison (lmer)
T. Florian Jaeger
tiflo@csli.stanford.edu
Mon Aug 1 11:38:53 PDT 2011
Hi Anja,
I wanted to add one thing to Steve's answer. In many cases, when you compare
two models, you can try to fit a superset model that contains all predictors
from both models. In those cases, you can use model comparison to test each of
the two original models against the superset model and draw conclusions from
those comparisons.
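For instance, a minimal sketch along these lines (the data frame d, the
response rt, and the predictors frequency and length are made up for
illustration; substitute your own):

  library(lme4)

  # Two non-nested models with the same random-effect structure, fit by
  # ML (REML = FALSE) so their likelihoods are comparable
  m1  <- lmer(rt ~ frequency + (1 | subject), data = d, REML = FALSE)
  m2  <- lmer(rt ~ length    + (1 | subject), data = d, REML = FALSE)

  # Superset model containing the predictors of both
  m12 <- lmer(rt ~ frequency + length + (1 | subject), data = d, REML = FALSE)

  # Each original model is nested in the superset, so likelihood-ratio
  # tests via anova() are valid
  anova(m1, m12)  # does adding length improve on m1?
  anova(m2, m12)  # does adding frequency improve on m2?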
When this doesn't work (e.g., because the two models aren't built on the same
data or the superset model does not converge), there is no clear convention.
I have seen the BIC applied more often than the AIC in the cognitive sciences,
but that might just be a perception bias on my side.
Florian
On Mon, Aug 1, 2011 at 7:45 AM, piantado <piantado@mit.edu> wrote:
> Hi,
>
> > I have frequently encountered positive logLik values and now heard
> > that this might be due to a bug in the lmer function. However, I also
> > recently found Douglas Bates stating that "a positive log-likelihood
> > is acceptable in a model for a continuous response" in an S-list.
> > Positive logLiks appear in Baayen's 2008 introductory book, always
> > together with negative AIC and BIC. He does not seem to treat them as
> > erroneous.
>
> It is possible to get positive log likelihoods in general (I don't know
> anything about bugs in lmer). Imagine a uniform distribution on the real
> line from 0 to 1/10. Since this distribution must integrate to 1, it
> must have a density ("probability") value of 10 at each point in 0 to
> 1/10. This means that if you saw a point like 0.023, it would have a log
> likelihood of log 10, which is positive. The same thing also occurs
> with, for instance, very low-variance Gaussians, whose densities also
> must integrate to 1 and so must exceed 1 somewhere.
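> For instance, a quick check in R of exactly these two cases:
>
>   log(dunif(0.023, min = 0, max = 0.1))  # log(10) =~ 2.303, positive
>   log(dnorm(0, mean = 0, sd = 0.01))     # =~ 3.686, also positive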
>
> (There is a technical point about what these "probabilities" greater
> than 1 really mean, but it ends up not being an issue if you think of
> only, say, intervals as having a defined probability---in that case, any
> interval for these distributions will have a probability in [0,1]. After
> all, an individual point has zero probability under all of these
> continuous distributions anyway.)
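> In R terms: even though the density exceeds 1, interval probabilities
> stay in [0,1], e.g. for the uniform above:
>
>   punif(0.05, min = 0, max = 0.1) - punif(0.02, min = 0, max = 0.1)  # 0.3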
>
> > Instead, if I understood correctly, he chooses the model
> > with more negative AIC/BIC (smaller value) and more positive logLik
> > (larger value) as the better model in these comparisons.
> > So did I get it right and is this the way to go or is there a bug
> > that inverts the polarity of the numbers?
>
> In general you want to choose AIC and BIC to be closest to negative
> infinity. The formulas for these are helpful here. AIC is
>
> 2k - 2 log L
>
> where L is (non-logged) likelihood and k is the number of free
> parameters. BIC is
>
> k log(n) - 2 log L
>
> where n is the number of data points. Here, you can see that there is a
> negative in front of the log likelihood, meaning that since you prefer
> (log) likelihoods closer to positive infinity, you prefer AIC/BIC closer
> to negative infinity.
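> You can verify these formulas against R's own AIC() and BIC() for any
> fitted model (here fm just stands in for a model you have already fit):
>
>   ll <- logLik(fm)
>   k  <- attr(ll, "df")              # number of free parameters
>   n  <- attr(ll, "nobs")            # number of data points
>   2 * k - 2 * as.numeric(ll)        # equals AIC(fm)
>   k * log(n) - 2 * as.numeric(ll)   # equals BIC(fm)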
>
> > As a second question: Is there a general rule of thumb for cases when
> > AIC and BIC point in different directions? Does it depend on the
> > data set?
> >
> Yup, you can look at the formulas above and see that in some cases they
> will go in different directions. It looks like you can only get
> different signs for AIC vs. BIC when the log likelihood is positive
> (otherwise both AIC and BIC are positive). You can fiddle with k, n, and
> L in these formulas to get any pair of signs for AIC and BIC.
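> For instance, plugging in some numbers (chosen only to illustrate):
>
>   k <- 2; n <- 100; logL <- 3
>   2 * k - 2 * logL        # AIC = -2    (negative)
>   k * log(n) - 2 * logL   # BIC =~ 3.21 (positive)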
>
> No matter what the sign is, numbers closer to negative infinity are
> better for AIC and BIC (which is why it's confusing to state what you
> prefer as "larger" versus "smaller" values).
>
> > Or is it a matter of taste how much one wants to avoid
> > overfitting? Should one trust the value that agrees with the logLik?
>
> Well, I think which one you choose mainly depends on your theoretical
> outlook (although people on the list may disagree?), since they are
> motivated by information-theoretic (AIC) vs. Bayesian (BIC) ideals. For
> non-tiny data sets, BIC penalizes free parameters more harshly (a factor
> of log(n) rather than 2 on k, so BIC penalizes more whenever log(n) > 2,
> i.e., roughly n > 7).
>
> ++Steve