[R-lang] Weird outcome logit mixed-effect model

Mon Dec 22 13:33:10 PST 2008

Roger Levy wrote:
> Laura de Ruiter wrote:
>> Dear R-users and -experts,
>>
>> I am performing a rather simple analysis on a small data set (pasted 
>> below this email) and keep getting a result that I cannot explain. 
>> Perhaps I am missing something here - it would be great if someone 
>> could point out to me what I am doing wrong.
>>
>> I want to test whether the factor "Info" (which has three levels: 
>> "new", "given", "accessible") is a significant predictor for the 
>> binary variable "DeaccYN". The random factor is "Subject". The 
>> distribution of the data looks as follows:
>>
>> ----------------------------------------------------------------------------- 
>>
>>  
>>           Info
>> DeaccYN   given   new   accessible
>> no           25    42           21
>> yes          11     0            1
>> ------------------------------------------------------------------------------ 
>>
>>
>> This is the model:
>>
>> ---------------------------------------------------------------------------------------------------------- 
>>
>> deacc.lmer = lmer (DeaccYN ~ Info + (1|Subject), data = dat, family = 
>> "binomial")
>> ----------------------------------------------------------------------------------------------------------------- 
>>
>>
>> However, given the distribution above, this outcome seems rather weird 
>> to me:
>>
>> --------------------------------------------------------------------------------------------------------- 
>>
>> summary (deacc.lmer)
>> Generalized linear mixed model fit using Laplace
>> Formula: DeaccYN ~ Info + (1 | Subject)
>> Data: dat
>> Family: binomial(logit link)
>>   AIC   BIC logLik deviance
>>  60.4 70.82  -26.2     52.4
>> Random effects:
>>  Groups  Name        Variance Std.Dev.
>>  Subject (Intercept) 0.18797  0.43356
>> number of obs: 100, groups: Subject, 21
>>
>> Estimated scale (compare to 1 ) 0.7316067
>>
>> Fixed effects:
>>                Estimate Std. Error z value Pr(>|z|)
>> (Intercept)     -0.8635     0.3795 -2.2754   0.0229 *
>> Infonew        -18.7451  2764.2445 -0.0068   0.9946
>> Infoaccessible  -2.2496     1.1186 -2.0110   0.0443 *
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>  > [...]
>>
>> ---------------------------------------------------------------------------------------------------- 
>>
>>
>> Why should the difference between 25/11 and 21/1 be significant, but 
>> the difference between 25/11 and 42/0 not? The standard error of 2764 
>> seems very odd to me!
>>
>  > [...]
> 
>> I was wondering: Is it perhaps a problem for the model that there are 
>> no cases in the DeaccYN == "yes" category for Info == "given"? And if 
>> this 
>                                                       ^^^^^
>                                          I believe you mean "new" here.
>> is the case, why?
>> Am I overlooking something here?
> 
> Dear Laura,
> 
> Independently of the issue that Florian is raising...you are right that 
> the lack of is a problem for the model (to be precise, it's a problem 
> for estimating the significance of the parameter estimate using the z 
> value). 

Whoops, this should have read

   "you are right that the lack of observations in the "yes/new" 
category is a problem for the model..."
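
To make the z-value issue concrete, here is a minimal sketch in base R. It is an illustration under stated assumptions, not the analysis from the thread: it uses only the cell counts from the table quoted above, it drops the Subject random effect (so a plain glm() stands in for the lmer() call), and the object names counts, full and null are made up for this example. With no "yes" responses in the "new" cell, the estimate for Infonew drifts toward minus infinity and its standard error becomes very large, so the Wald z value is close to zero. A likelihood-ratio comparison against an intercept-only model is one common alternative that does not depend on that standard error.

```r
# Cell counts copied from the table in the quoted message.
# Assumption: the Subject random effect is ignored here, so this is a
# plain logistic regression, not the mixed model from the thread.
counts <- data.frame(
  Info = factor(c("given", "new", "accessible"),
                levels = c("given", "new", "accessible")),
  yes  = c(11, 0, 1),
  no   = c(25, 42, 21)
)

# Model with Info as predictor; "given" is the reference level.
# R may warn that fitted probabilities of 0 or 1 occurred, which is
# itself a symptom of the empty yes/new cell.
full <- glm(cbind(yes, no) ~ Info, data = counts, family = binomial)

# Intercept-only model for comparison.
null <- glm(cbind(yes, no) ~ 1, data = counts, family = binomial)

# Wald table: look at the standard error and z value for Infonew.
print(summary(full)$coefficients)

# Likelihood-ratio test of Info as a whole (2 degrees of freedom).
lrt <- anova(null, full, test = "Chisq")
print(lrt)
```

The same comparison could in principle be made between two mixed models that both keep (1 | Subject), one with and one without Info; that variant is not shown here.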


-- 

Roger Levy                      Email: rlevy at ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy