[R-lang] Re: "Zero" problem with lmer

Levy, Roger rlevy@ucsd.edu
Sun Oct 10 14:32:37 PDT 2010


On Oct 9, 2010, at 11:04 PM, Marie Coppola wrote:

> Dear R-users and -experts,
> 
> I am performing a rather simple analysis on a small data set (pasted below this email) and am having trouble detecting any effects because of invariant responses in some conditions. I understand from a previous query from Laura de Ruiter that this is because the model cannot estimate the standard error when there are no responses for a given level of a factor. Roger suggested adding an extra data point to make those cells non-zero. This fixed the wildly high standard errors I was getting, and resulted in a significant interaction.

Let me just clarify an important point here.  In my response to Laura de Ruiter's query (this was 22 December 2008), I was *not* advocating adding extra data points to make cells non-zero in order to analyze the data!  I was just illustrating the point that z statistic-based p-values for the parameters of logit models are unreliable when the maximum-likelihood-fitted value of a parameter is very large.  The purpose of adding an artificial data point was simply to underscore the unreliability of the z statistic, by showing that if one imagined a slightly different dataset where the trend of interest was clearly weaker but as a result didn't have zero-count cells, the significance of the effect actually increased according to the z statistic, which is clearly counter-intuitive behavior.
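For anyone who hasn't run into this before, here is a minimal sketch with artificial data (nothing to do with Marie's actual dataset) of why the z statistic misbehaves.  When the response is invariant at one level of a predictor, the maximum-likelihood coefficient estimate and its standard error blow up, the Wald z statistic collapses toward zero, and a likelihood-ratio test is the more trustworthy option:

set.seed(1)
x <- rep(c(0, 1), each = 20)
y <- c(rbinom(20, 1, 0.5), rep(1, 20))  # response invariant where x == 1
m1 <- glm(y ~ x, family = binomial)     # warns: fitted probabilities 0 or 1
summary(m1)$coefficients                # enormous estimate and SE for x; z near 0
m0 <- glm(y ~ 1, family = binomial)
anova(m0, m1, test = "Chisq")           # the LRT still detects the effect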

> But I would like to analyze the actual dataset ;) . I considered following Roger's advice to Laura, in which he showed how to use a likelihood-ratio test to compare the full model with a simpler one in which two levels of one of the factors are collapsed. I do not see how I can do that here, because I only have two levels of the factor in question. I would appreciate any suggestions. (I am facing this problem in several analyses.)
> 
> I am testing whether the factor "Agent" (which has two levels: "agent" and "no_agent") is a significant predictor for the binary variable "Handshape". The random factor is "Subject". The distribution of the data is as follows, and the real data are the same except for zeroes in two of the cells that currently display ones (annotated):

Regarding how to analyze the dataset (which I'm just calling "dat" below; and I tossed the two artificial data points, so the annotated cells are zero again), I agree with Daniel and Florian in general, and it seems pretty clear that you don't need a detailed analysis of the dataset to pull out the glaring observation that "handling" happens essentially only in the agent+classifier case.  As Daniel pointed out, you should first question whether you need random effects; it's worth asking whether there's sufficient inter-subject variation that the apparent effect might generalize less reliably across individuals than one might think from the cell counts in the 2x2 representation.  Since you only have one "handling" observation outside of the agent+classifier case, you might as well just tabulate subject by handshape:

> with(dat, xtabs(~ Subject + Handshape))
       Handshape
Subject handling object
      1        5     30
      2        8     36
      3        3     13
      4        5     43
      5        5     21
      6       11     48

No evidence of major cross-subject variation.  A simple lmer fit confirms this:

> lmer(Handshape ~ 1 + (1 | Subject), data = dat, family = "binomial")
[...]
   Data: dat 
   AIC   BIC logLik deviance
 206.2 213.1 -101.1    202.2
Random effects:
 Groups  Name        Variance Std.Dev.
 Subject (Intercept)  0        0      
[...]

No need for random effects.  Comparing model log-likelihoods indicates the same thing:

> m.null <- glm(Handshape ~ 1,data=dat,family="binomial")
> logLik(m.null)
'log Lik.' -101.1026 (df=1)

The mixed-effects analysis didn't do any better: its log-likelihood of -101.1 is identical to that of the intercept-only glm, so the random effect buys you nothing (and you can try this comparison with other model specifications as well).
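If you want that comparison as a formal test, here's a quick sketch (re-fitting and saving the mixed model as m.mixed; strictly speaking the variance is tested on the boundary of its parameter space, which makes the chi-squared reference conservative, but that hardly matters when the statistic is essentially zero):

m.mixed <- lmer(Handshape ~ 1 + (1 | Subject), data = dat, family = "binomial")
dev.diff <- max(0, 2 * as.numeric(logLik(m.mixed) - logLik(m.null)))
pchisq(dev.diff, df = 1, lower.tail = FALSE)  # ~ 1: no evidence for subject variance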

So you might as well just ignore subject identity and analyze this as a 2x2x2 data table.  Similar to what Florian suggested, you could simply lump everything but the agent+classifier data together and then do Fisher's exact test to show that agent+classifier behaves significantly differently in terms of handshape from all the other conditions, which is pretty clear from just looking at your data anyway.  You don't really have enough additional structure going on to meaningfully ask whether there is an overall main effect of Agent.
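A sketch of that lumping approach, assuming your second factor is stored in a column named "Classifier" with a level "classifier" (hypothetical names; substitute whatever your actual column and level names are):

dat$group <- with(dat, ifelse(Agent == "agent" & Classifier == "classifier",
                              "agent+classifier", "everything else"))
tab <- with(dat, xtabs(~ group + Handshape))
tab               # eyeball the collapsed 2x2 table first
fisher.test(tab)  # exact test of handshape by agent+classifier vs. the rest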

Hope this helps,

Roger


--

Roger Levy                      Email: rlevy@ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
