[R-lang] Re: ling-r-lang-L Digest, Vol 1, Issue 54

Marie Coppola marie.coppola@gmail.com
Sun Oct 17 11:21:09 PDT 2010


Dear Roger and Florian (and others),

Thanks very much for your attention to my query. I have two follow-up
questions:

1) I'm pursuing your suggestion of applying a Fisher's exact test or chi-square
analysis to this comparison, and I want to make sure that I'm not violating the
assumption of independence by ignoring subject identity (each subject
contributes multiple data points). I understand Roger's demonstration that
there was no meaningful cross-subject variation (pasted below). Is
demonstrating a lack of variation generally sufficient to justify ignoring the
independence assumption of the Fisher and chi-square tests? Are there
conventions for determining an "acceptable" level of cross-subject variation
that allows one to assume independence for such tests? (Note: the n is too
large for the Fisher test, so I would be using chi-square in this case. I'm
pasting the relevant portions of the previous responses below.)

[I have looked into this problem a bit and have found a short paper by
Cervone that addresses the issue of including multiple responses from the
same subject in a 2x2 frequency table analyzed with chi-square; he advocates
a randomization procedure to avoid violating the assumption of independence.
In his example there is a high degree of cross-subject variation, with no
subject showing the pattern indicated by the pooled data; I realize that is
not the case in the data set I am analyzing.
Cervone, Daniel. (1987). Chi-square analyses of self-efficacy data: A
cautionary note. Cognitive Therapy and Research, 11(6), 709-714.
http://dx.doi.org/10.1007/BF01176007]
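For concreteness, the sort of subject-respecting randomization I have in mind
would be something like the following (this is only my sketch, not necessarily
Cervone's exact procedure; I'm assuming a data frame dat with the Subject and
Handshape columns from Roger's tabulation, plus a placeholder two-level
variable Group for whatever comparison is being tested):

# Shuffle the Handshape labels within each subject, so each subject's own
# response rates are preserved while any association between Group and
# Handshape is broken under the null hypothesis.
perm.stat <- function(d) {
  d$Handshape <- ave(as.character(d$Handshape), d$Subject,
                     FUN = function(x) sample(x))
  suppressWarnings(chisq.test(table(d$Group, d$Handshape))$statistic)
}

set.seed(1)
obs.stat  <- suppressWarnings(chisq.test(table(dat$Group, dat$Handshape))$statistic)
null.dist <- replicate(2000, perm.stat(dat))
mean(null.dist >= obs.stat)   # randomization p-value

Does something along these lines seem reasonable, or is there a more standard
way of respecting the subject clustering in a contingency-table test?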

> with(dat, xtabs(~ Subject + Handshape))
       Handshape
Subject handling object
      1        5     30
      2        8     36
      3        3     13
      4        5     43
      5        5     21
      6       11     48

No evidence of major cross-subject variation.  A simple lmer fit confirms
this:

> lmer(Handshape ~ 1 + (1 | Subject), data=dat, family="binomial")
[...]
  Data: dat
  AIC   BIC logLik deviance
 206.2 213.1 -101.1    202.2
Random effects:
 Groups  Name        Variance Std.Dev.
 Subject (Intercept)  0        0
[...]

No need for random effects.  Comparing model log-likelihoods indicates the
same thing:

> m.null <- glm(Handshape ~ 1, data=dat, family="binomial")
> logLik(m.null)
'log Lik.' -101.1026 (df=1)

The mixed-effects analysis didn't do any better (and you can try this
comparison with other model specifications as well).

*So you might as well just ignore subject identity and analyze this as a
2x2x2 data table*.  Similar to what Florian suggested, you could simply
lump everything but the agent+classifier data together and then do Fisher's
exact test to show that agent+classifier behaves significantly differently
in terms of handshape than all the other conditions.


2) Also, just to make sure I'm following: in this scenario, wouldn't the
resulting data table be 2x2 (Handshape Type (Object vs. Handling) x
Construction Type (Agent+Classifier vs. the other three types))?
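If so, for what it's worth, this is how I was planning to set it up (assuming
the column names Construction, Agent, and Handshape, with the level names shown
in the tables from the earlier thread; chi-square rather than Fisher's exact
test because of the n):

# Collapse to agent+classifier vs. everything else, then test the 2x2 table.
dat$Condition <- ifelse(dat$Construction == "Classifier" & dat$Agent == "agent",
                        "agent+classifier", "other")
tab <- xtabs(~ Condition + Handshape, data = dat)
tab
chisq.test(tab)      # chi-square test on the collapsed 2x2 table
# fisher.test(tab)   # the exact alternative, if it turns out to be feasible

Does that match what you had in mind?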

Thanks very much in advance for any comments,
Marie


On Mon, Oct 11, 2010 at 3:00 PM, <ling-r-lang-l-request@mailman.ucsd.edu> wrote:

>
> Today's Topics:
>
>   1. Re: "Zero" problem with lmer (T. Florian Jaeger)
>   2. Re: "Zero" problem with lmer (Marie Coppola)
>   3. Re: "Zero" problem with lmer (Levy, Roger)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 10 Oct 2010 12:25:22 -0700
> From: "T. Florian Jaeger" <tiflo@csli.stanford.edu>
> Subject: [R-lang] Re: "Zero" problem with lmer
> To: Daniel Ezra Johnson <danielezrajohnson@gmail.com>
> Cc: r-lang <r-lang@ling.ucsd.edu>, Marie Coppola
>        <marie.coppola@gmail.com>
>
> I agree with Daniel. You could just do a Fisher exact test, for example. But
> you don't really need a test for the classifier data set. For the other data
> set, you cannot say anything about an interaction, because you don't have
> enough "handling" cases to detect one (with or without adjustment for
> subject clusters).
>
> Florian
>
> On Sun, Oct 10, 2010 at 4:01 AM, Daniel Ezra Johnson <
> danielezrajohnson@gmail.com> wrote:
>
> > I think sometimes we get carried away with our wonderful tools like
> lmer().
> >
> > > , , Construction = Classifier
> > >
> > >           Agent
> > > Handshape  agent no_agent
> > >   handling    36        1 (added)
> > >   object      27       65
> > >
> > > , , Construction = Lexical_item
> > >
> > >           Agent
> > > Handshape  agent no_agent
> > >   handling     1        1 (added)
> > >   object      45       54
> >
> > For this data:
> >
> > a) in the Classifier construction, it is very clear that Agent favors
> > Handling, compared to No Agent
> >
> > b) in the Lexical Item construction, there is no evidence of an effect
> > of Agent/No Agent on Handshape (because Handshape is largely or
> > totally invariant*)
> >
> > You can call this an interaction between Agent and Classifier, if you
> > wanted...
> >
> > *Are all three 1's "added" or is one of them real?
> >
> > Either way, you're very close to a situation where Handling only
> > occurs in one of the four cells, only in Classifier/Agent.
> >
> > I don't think an extreme distribution like that lends itself to (aka
> > needs) ordinary quantitative regression analysis (and it's also not
> > clear how there _could be_ a Subject effect).
> >
> > Just my two cents,
> > Daniel
> >
> >
> ------------------------------
>
> Message: 2
> Date: Sun, 10 Oct 2010 16:59:45 -0400
> From: Marie Coppola <marie.coppola@gmail.com>
> Subject: [R-lang] Re: "Zero" problem with lmer
> To: Daniel Ezra Johnson <danielezrajohnson@gmail.com>
> Cc: r-lang@ling.ucsd.edu
>
> Dear Daniel,
>
> Thanks for your response. To clarify, one of the three handling responses
> was real (the one in the Agent condition for the Lexical Items).
>
> Yes, I can totally see this as getting a bit carried away with tools.... To
> give you a bit of context, this is a subset of a longitudinal data set, and
> it was just the clearest example of the zero problem. There are other
> groups
> I am trying to compare this group to, where there is some variation, and I
> haven't been able to include these invariant conditions in the model, so
> it's been hard to know whether I have a basis for claiming that the other
> groups are similar or different. So that's my motivation for trying to
> resolve this issue. Because we have such a small sample size (n=3 subjects
> for the other two groups), I haven't pursued the non-parametric route,
> because it appeared to me that the appropriate tests did not have
> significance values defined for n's that small. Hence my plunge into the
> world of lmer()....
>
> I've written the results in a hybrid fashion, appealing to qualitative
> arguments (as you did in your response) and applying the quantitative tools
> where there is variation. It seems to have worked out all right in the end.
>
> Thanks again,
> Marie
> ------------------------------
>
> Message: 3
> Date: Sun, 10 Oct 2010 14:32:37 -0700
> From: "Levy, Roger" <rlevy@ucsd.edu>
> Subject: [R-lang] Re: "Zero" problem with lmer
> To: Marie Coppola <marie.coppola@gmail.com>
> Cc: "r-lang@ling.ucsd.edu" <r-lang@ling.ucsd.edu>
>
>
> On Oct 9, 2010, at 11:04 PM, Marie Coppola wrote:
>
> > Dear R-users and -experts,
> >
> > I am performing a rather simple analysis on a small data set (pasted
> below this email) and am having trouble detecting any effects because of invariant
> responses in some conditions. I understand from a previous query from Laura
> de Ruiter that this is because the model cannot estimate the standard error
> when there are no responses for a given level of a factor. Roger suggested
> adding an extra data point to make those cells non-zero. This fixed the
> wildly high standard errors I was getting, and resulted in a significant
> interaction.
>
> Let me just clarify an important point here.  In my response to Laura de
> Ruiter's query (this was 22 December 2008), I was *not* advocating adding
> extra data points to make cells non-zero in order to analyze the data!  I
> was just illustrating the point that z statistic-based p-values for a
> parameter of a logit model are unreliable when the maximum-likelihood-fitted
> value of the parameter is very large.  The purpose of adding an artificial
> data point was simply to underscore the unreliability of the z statistic,
> by showing that if one imagined a slightly different dataset where the trend
> of interest was clearly weaker but as a result didn't have zero-count cells,
> the significance of the effect actually increased according to the z
> statistic, which is clearly counter-intuitive behavior.
>
> > But I would like to analyze the actual dataset ;). I considered
> > following Roger's advice to Laura, in which he showed how to use a
> > likelihood-ratio test to compare the full model with a simpler one (in
> > which one collapses two levels of one of the factors). I do not see how
> > I can do that here because I only have two levels of the factor in
> > question. I would appreciate any suggestions. (I am facing this problem
> > in several analyses.)
> >
> > I am testing whether the factor "Agent" (which has two levels: "agent"
> > and "no_agent") is a significant predictor for the binary variable
> > "Handshape". The random factor is "Subject". The distribution of the
> > data is as follows, and the real data are the same except for zeroes in
> > two of the cells that currently display ones (annotated):
>
> Regarding how to analyze the dataset (which I'm just calling "dat" below,
> with the artificial data points removed), I agree with Daniel and Florian
> in general, and it seems
> pretty clear that you don't need to do detailed analysis of the dataset to
> pull out the glaring observation that "handling" happens only in the
> agent+classifier case.  As Daniel pointed out, you should first question
> whether you need random effects; it's worth asking whether there's
> sufficient inter-subject variation that the apparent effect might generalize
> less reliably across individuals than one might think from the cell counts
> in the 2x2 representation.  Since you only have one "handling" observation
> outside of the agent+classifier case, you might as well just tabulate
> subject by handshape:
>
> > with(dat, xtabs(~ Subject + Handshape))
>       Handshape
> Subject handling object
>      1        5     30
>      2        8     36
>      3        3     13
>      4        5     43
>      5        5     21
>      6       11     48
>
> No evidence of major cross-subject variation.  A simple lmer fit confirms
> this:
>
> > lmer(Handshape ~ 1 + (1 | Subject),data=dat,family="binomial")
> [...]
>   Data: dat
>   AIC   BIC logLik deviance
>  206.2 213.1 -101.1    202.2
> Random effects:
>  Groups  Name        Variance Std.Dev.
>  Subject (Intercept)  0        0
> [...]
>
> No need for random effects.  Comparing model log-likelihoods indicates the
> same thing:
>
> > m.null <- glm(Handshape ~ 1,data=dat,family="binomial")
> > logLik(m.null)
> 'log Lik.' -101.1026 (df=1)
>
> The mixed-effects analysis didn't do any better (and you can try this
> comparison with other model specifications as well).
>
> So you might as well just ignore subject identity and analyze this as a
> 2x2x2 data table.  Similar to what Florian suggested, you could simply lump
> everything but the agent+classifier data together and then do Fisher's exact
> test to show that agent+classifier behaves significantly differently in
> terms of handshape than all the other conditions.  Which is pretty clear
> from just looking at your data.  You don't really have enough additional
> structure going on in your dataset to meaningfully ask whether there is an
> overall main effect of Agent in your dataset.
>
> Hope this helps,
>
> Roger
>
>
> --
>
> Roger Levy                      Email: rlevy@ling.ucsd.edu
> Assistant Professor             Phone: 858-534-7219
> Department of Linguistics       Fax:   858-534-4789
> UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy