[R-lang] Re: Positive and negative logLik and BIC in model comparison (lmer)

piantado piantado@mit.edu
Mon Aug 1 06:45:29 PDT 2011


Hi,

>  I have frequently encountered positive logLik values and now heard
>  that this might be due to bug in the lmer function. However, I also
>  recently found Douglas Bates stating that "a positive log-likelihood
>  is acceptable in a model for a continuous response" in an S-list.
>  Positive logLiks appear in Baayen's 2008 introductory book, always
>  together with negative AIC and BIC. He does not seem to treat them as
>  erroneous.

It is possible to get positive log likelihoods in general (I don't know
anything about bugs in lmer). Imagine a uniform distribution on the
interval from 0 to 1/10. Since this distribution must integrate to 1,
its ("probability") value must be 10 at every point between 0 and 1/10.
This means that if you saw a point like 0.023, it would have a log
likelihood of log 10, which is positive. The same thing occurs with,
for instance, very low-variance Gaussians, which must also integrate
to 1 and so must rise above 1.

(There is a technical point about what these "probabilities" greater
than 1 really mean, but it ends up not being an issue if you think of
only, say, intervals as having a defined probability---in that case, any
interval for these distributions will have a probability in [0,1]. After
all, an individual point has zero probability under all of these
continuous distributions anyway.)
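To make this concrete, here is a small numeric sketch (Python is used here just for the arithmetic; R's dunif and dnorm report the same densities):

```python
import math

# Density of Uniform(0, 1/10): constant 1/width on its support
width = 0.1
density = 1.0 / width          # 10.0 -- greater than 1
loglik = math.log(density)     # log 10 ~ 2.303, a positive log likelihood

# A narrow Gaussian behaves the same way: its peak density exceeds 1
sd = 0.01
peak = 1.0 / (sd * math.sqrt(2 * math.pi))   # ~39.9 at the mean
log_peak = math.log(peak)                    # ~3.69, also positive

print(density, round(loglik, 3))
print(round(peak, 1), round(log_peak, 2))
```

Summing such positive per-observation log densities is how a whole data set can end up with a positive logLik.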

> Instead, if I understood correctly, he chooses the model
>  with more negative AIC/BIC (smaller value) and more positive logLik
>  (larger value) as the better model in these comparisons.
>  So did I get it right and is this the way to go or is there a bug
>  that inverts the polarity of the numbers?

In general you want to choose the model whose AIC and BIC are closest
to negative infinity. The formulas for these are helpful here. AIC is

2k - 2 log L

where L is (non-logged) likelihood and k is the number of free
parameters. BIC is 

k log(n) - 2 log L  

where n is the number of data points. Here, you can see that there is a
negative sign in front of the log likelihood: since you prefer (log)
likelihoods closer to positive infinity, you prefer AIC/BIC closer to
negative infinity.
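In code (a Python sketch of the two formulas; in R you would read these values off of AIC(), BIC(), and logLik() directly, and the variable names here are mine):

```python
import math

def aic(loglik, k):
    """AIC = 2k - 2 log L, where loglik is already on the log scale."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """BIC = k log(n) - 2 log L."""
    return k * math.log(n) - 2 * loglik

# Because of the minus sign on log L, a larger log likelihood
# pushes both criteria toward negative infinity (better):
print(aic(100.0, 5))        # 2*5 - 200 = -190.0
print(aic(120.0, 5))        # 2*5 - 240 = -230.0
```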

>  As a second question: Is there a general rule of thumb for cases when
>  AIC and BIC point into different directions? Does it depend on the
>  data set?
> 
Yup, you can look at the formulas above and see that in some cases they
will go in different directions. Note that you can only get different
signs for AIC vs. BIC when the log likelihood is positive (otherwise
both AIC and BIC are positive, since 2k and k log(n) are themselves
positive). You can fiddle with k, n, and L in these formulas to get any
pair of signs for AIC and BIC.
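Fiddling with the formulas directly bears this out; a Python sketch (the particular k, n, and log L values here are made up for illustration):

```python
import math

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Negative log likelihood: -2*logL > 0, so both criteria are positive.
print(aic(-50.0, 3), bic(-50.0, 3, 100))     # 106.0  113.8...

# Positive log likelihood, large n: AIC negative but BIC positive.
print(aic(10.0, 3), bic(10.0, 3, 10000))     # -14.0  7.63...

# Positive log likelihood, tiny n: AIC positive but BIC negative.
print(aic(2.0, 3), bic(2.0, 3, 2))           # 2.0  -1.92...
```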

No matter what the signs are, numbers closer to negative infinity are
better for both AIC and BIC (which is why it is confusing to state what
you prefer as "larger" versus "smaller" values).

>  Or is it a matter of taste how much one wants to avoid
>  overfitting? Should one trust the value that agrees with the logLik?

Well, I think which one you choose mainly depends on your theoretical
outlook (although people on the list may disagree?), since they are
motivated by information-theoretic (AIC) vs. Bayesian (BIC) ideals. For
non-tiny data sets, BIC penalizes free parameters more harshly (a factor
of log(n) rather than 2 on k).
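The "non-tiny" cutoff is easy to locate: each free parameter costs 2 under AIC and log(n) under BIC, and these penalties cross at n = e^2, roughly 7.4. A quick check in Python:

```python
import math

# Per-parameter penalty: 2 under AIC, log(n) under BIC.
# They are equal when log(n) = 2, i.e. n = e^2.
crossover = math.exp(2)
print(round(crossover, 2))      # 7.39

print(math.log(7) > 2)          # False: AIC penalizes more at n = 7
print(math.log(8) > 2)          # True:  BIC penalizes more for n >= 8
```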

++Steve




More information about the ling-r-lang-L mailing list