[R-lang] Re: ling-r-lang-L Digest, Vol 1, Issue 54

T. Florian Jaeger tiflo@csli.stanford.edu
Sun Oct 17 12:13:37 PDT 2010


Hi Marie,

thanks for forwarding the paper/reference. I think that in your case,
readers/reviewers should accept that you have a stable pattern if you just
give them the by-subject table. But you might as well also report that a
model with a random intercept does not yield significantly lower deviance
than a model without one. At that point, I'd definitely be fine with it.
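
For concreteness, a minimal sketch of that deviance comparison (assuming your
data frame is called dat, with columns Handshape and Subject as in Roger's
message quoted below; note that current versions of lme4 fit the binomial
model with glmer() rather than lmer(..., family = "binomial")):

library(lme4)

# Plain logistic regression, no random effect.
m0 <- glm(Handshape ~ 1, data = dat, family = binomial)

# Same model plus a by-subject random intercept.
m1 <- glmer(Handshape ~ 1 + (1 | Subject), data = dat, family = binomial)

# Likelihood-ratio (deviance) comparison: a non-significant result means
# the random intercept is not buying you anything.
lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))
pchisq(lrt, df = 1, lower.tail = FALSE)

Because the random-intercept variance is tested on the boundary of its
parameter space, this p-value is conservative, which is fine here since you
are arguing for the *absence* of subject variation.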

Of course, you could do a bootstrap over your chisq, but it seems overkill
to me.
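
In case you do want it anyway, here is one way such a bootstrap could look: a
by-subject (cluster) bootstrap that resamples subjects with replacement and
recomputes Pearson's chi-square for the Handshape x Agent table each time, so
you can check that the association is not carried by just one or two subjects.
The names dat, Subject, Handshape, and Agent are the same assumptions as above.

set.seed(1)
subjects <- unique(dat$Subject)
n.boot <- 2000

boot.chisq <- replicate(n.boot, {
  # Resample whole subjects with replacement and stack their rows.
  samp <- sample(subjects, length(subjects), replace = TRUE)
  d <- do.call(rbind, lapply(samp, function(s) dat[dat$Subject == s, ]))
  # Pearson chi-square for the resampled table; return NA if a resample
  # happens to be degenerate (only one level of a factor present).
  tryCatch(suppressWarnings(chisq.test(table(d$Handshape, d$Agent))$statistic),
           error = function(e) NA)
})

# If even the lower end of this interval is large, the association is stable
# across subject clusters.
quantile(boot.chisq, c(0.025, 0.5, 0.975), na.rm = TRUE)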

*So you might as well just ignore subject identity and analyze this as a
>> 2x2x2 data table*.  Similar to what Florian suggested, you could simply
>> lump everything but the agent+classifier data together and then do Fisher's
>> exact test to show that agent+classifier behaves significantly differently
>> in terms of handshape than all the other conditions.
>>
>
>
> 2) Also, just to make sure I'm following, in this scenario, wouldn't the
> resulting data table be 2x2 (Handshape Type (Object vs. Handling) x
> Construction Type (Agent+Classifier vs. the other 3 types))?
>

given what you wrote in your first email, I thought you would construct two
2x2 tables, one for Construction = Classifier and one for Construction =
Lexical. Each of them would have Agent vs. no agent crossed with object vs.
handling, no?
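
Concretely, using the counts from your table (with the two artificially added
1's set back to the zeros you said are in the real data), that would look
something like this; fisher.test() is the natural choice given the zero cell:

# Classifier constructions: Handshape x Agent
classifier <- matrix(c(36,  0,
                       27, 65),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Handshape = c("handling", "object"),
                                     Agent = c("agent", "no_agent")))

# Lexical items: Handshape x Agent
lexical <- matrix(c( 1,  0,
                    45, 54),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Handshape = c("handling", "object"),
                                  Agent = c("agent", "no_agent")))

fisher.test(classifier)  # clear Handshape x Agent association
fisher.test(lexical)     # no evidence of an association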

Florian

>
> Thanks very much in advance for any comments,
> Marie
>
>
> On Mon, Oct 11, 2010 at 3:00 PM, <ling-r-lang-l-request@mailman.ucsd.edu> wrote:
>
>>
>> Today's Topics:
>>
>>   1. Re: "Zero" problem with lmer (T. Florian Jaeger)
>>   2. Re: "Zero" problem with lmer (Marie Coppola)
>>   3. Re: "Zero" problem with lmer (Levy, Roger)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 10 Oct 2010 12:25:22 -0700
>> From: "T. Florian Jaeger" <tiflo@csli.stanford.edu>
>> Subject: [R-lang] Re: "Zero" problem with lmer
>> To: Daniel Ezra Johnson <danielezrajohnson@gmail.com>
>> Cc: r-lang <r-lang@ling.ucsd.edu>, Marie Coppola
>>        <marie.coppola@gmail.com>
>> Message-ID:
>>        <AANLkTi=Ck=6cYoDST_oh0dp7--DsQ8jeZEHOXrQ-dsdH@mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> I agree with Daniel. You could just do a Fisher exact, for example. But
>> you
>> don't really need a test for the classifier data set. For the other data
>> set, you cannot say anything about an interaction, because you don't have
>> enough "handling" cases to detect an interaction (with or without
>> adjustment
>> for subject clusters).
>>
>> Florian
>>
>> On Sun, Oct 10, 2010 at 4:01 AM, Daniel Ezra Johnson <
>> danielezrajohnson@gmail.com> wrote:
>>
>> > I think sometimes we get carried away with our wonderful tools like
>> lmer().
>> >
>> > > , , Construction = Classifier
>> > >
>> > >           Agent
>> > > Handshape  agent no_agent
>> > >   handling    36        1 (added)
>> > >   object      27       65
>> > >
>> > > , , Construction = Lexical_item
>> > >
>> > >           Agent
>> > > Handshape  agent no_agent
>> > >   handling     1        1 (added)
>> > >   object      45       54
>> >
>> > For this data:
>> >
>> > a) in the Classifier construction, it is very clear that Agent favors
>> > Handling, compared to No Agent
>> >
>> > b) in the Lexical Item construction, there is no evidence of an effect
>> > of Agent/No Agent on Handshape (because Handshape is largely or
>> > totally invariant*)
>> >
>> > You can call this an interaction between Agent and Classifier, if you
>> > wanted...
>> >
>> > *Are all three 1's "added" or is one of them real?
>> >
>> > Either way, you're very close to a situation where Handling only
>> > occurs in one of the four cells, only in Classifier/Agent.
>> >
>> > I don't think an extreme distribution like that lends itself to (aka
>> > needs) ordinary quantitative regression analysis (and it's also not
>> > clear how there _could be_ a Subject effect).
>> >
>> > Just my two cents,
>> > Daniel
>> >
>> >
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sun, 10 Oct 2010 16:59:45 -0400
>> From: Marie Coppola <marie.coppola@gmail.com>
>> Subject: [R-lang] Re: "Zero" problem with lmer
>> To: Daniel Ezra Johnson <danielezrajohnson@gmail.com>
>> Cc: r-lang@ling.ucsd.edu
>> Message-ID:
>>        <AANLkTint53zTGM6GUoWZ4bb61_Ajgg=Xhwpy=5BqG_uP@mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Dear Daniel,
>>
>> Thanks for your response. To clarify, one of the three handling responses
>> was real (the one in the Agent condition for the Lexical Items).
>>
>> Yes, I can totally see this as getting a bit carried away with tools....
>> To
>> give you a bit of context, this is a subset of a longitudinal data set,
>> and
>> it was just the clearest example of the zero problem. There are other
>> groups
>> I am trying to compare this group to, where there is some variation, and I
>> haven't been able to include these invariant conditions in the model, so
>> it's been hard to know whether I have a basis for claiming that the other
>> groups are similar or different. So that's my motivation for trying to
>> resolve this issue. Because we have such a small sample size (n=3 subjects
>> for the other two groups), I haven't pursued the non-parametric route,
>> because it appeared to me that the appropriate tests did not have
>> significance values defined for n's that small. Hence my plunge into the
>> world of lmer()....
>>
>> I've written the results in a hybrid fashion, appealing to qualitative
>> arguments (as you did in your response) and applying the quantitative
>> tools
>> where there is variation. It seems to have worked out all right in the
>> end.
>>
>> Thanks again,
>> Marie
>>
>> On Sun, Oct 10, 2010 at 7:01 AM, Daniel Ezra Johnson <
>> danielezrajohnson@gmail.com> wrote:
>>
>> > I think sometimes we get carried away with our wonderful tools like
>> lmer().
>> >
>> > > , , Construction = Classifier
>> > >
>> > >           Agent
>> > > Handshape  agent no_agent
>> > >   handling    36        1 (added)
>> > >   object      27       65
>> > >
>> > > , , Construction = Lexical_item
>> > >
>> > >           Agent
>> > > Handshape  agent no_agent
>> > >   handling     1        1 (added)
>> > >   object      45       54
>> >
>> > For this data:
>> >
>> > a) in the Classifier construction, it is very clear that Agent favors
>> > Handling, compared to No Agent
>> >
>> > b) in the Lexical Item construction, there is no evidence of an effect
>> > of Agent/No Agent on Handshape (because Handshape is largely or
>> > totally invariant*)
>> >
>> > You can call this an interaction between Agent and Classifier, if you
>> > wanted...
>> >
>> > *Are all three 1's "added" or is one of them real?
>> >
>> > Either way, you're very close to a situation where Handling only
>> > occurs in one of the four cells, only in Classifier/Agent.
>> >
>> > I don't think an extreme distribution like that lends itself to (aka
>> > needs) ordinary quantitative regression analysis (and it's also not
>> > clear how there _could be_ a Subject effect).
>> >
>> > Just my two cents,
>> > Daniel
>> >
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Sun, 10 Oct 2010 14:32:37 -0700
>> From: "Levy, Roger" <rlevy@ucsd.edu>
>> Subject: [R-lang] Re: "Zero" problem with lmer
>> To: Marie Coppola <marie.coppola@gmail.com>
>> Cc: "r-lang@ling.ucsd.edu" <r-lang@ling.ucsd.edu>
>> Message-ID: <AB90EE86-CFDB-4FD3-8F03-8AAD8414147F@ucsd.edu>
>> Content-Type: text/plain; charset="us-ascii"
>>
>>
>> On Oct 9, 2010, at 11:04 PM, Marie Coppola wrote:
>>
>> > Dear R-users and -experts,
>> >
>> > I am performing a rather simple analysis on a small data set (pasted
>> below this email) and am having trouble detecting any effects because of invariant
>> responses in some conditions. I understand from a previous query from Laura
>> de Ruiter that this is because the model cannot estimate the standard error
>> when there are no responses for a given level of a factor. Roger suggested
>> adding an extra data point to make those cells non-zero. This fixed the
>> wildly high standard errors I was getting, and resulted in a significant
>> interaction.
>>
>> Let me just clarify an important point here.  In my response to Laura de
>> Ruiter's query (this was 22 December 2008), I was *not* advocating adding
>> extra data points to make cells non-zero in order to analyze the data!  I
>> was just illustrating the point that z statistic-based p-values for a
>> parameter of logit models are unreliable when the maximum-likelihood-fitted
>> value of the parameter is very large.  The purpose of adding an artificial
>> data point was simply to underscore the unreliability of the z statistic,
>> by showing that if one imagined a slightly different dataset where the trend
>> of interest was clearly weaker but as a result didn't have zero-count cells,
>> the significance of the effect actually increased according to the z
>> statistic, which is clearly counter-intuitive behavior.
>>
>> > But I would like to analyze the actual dataset ;) . I considered
>> following Roger's advice to Laura, in which he showed how to use a
>> likelihood-ratio test to compare the full model with a simpler one, in which
>> one collapses two levels of one of the factors. I do not see how I can do
>> that here because I only have two levels of the factor in question. I would
>> appreciate any suggestions. (I am facing this problem in several analyses.)
>> >
>> > I am testing whether the factor "Agent" (which has two levels: "agent"
>> and "no_agent") is a significant predictor for the binary variable
>> "Handshape". The random factor is "Subject". The distribution of the data is
>> as follows, and the real data are the same except for zeroes in two of the
>> cells that currently display ones (annotated):
>>
>> Regarding how to analyze the dataset (which I'm just calling "dat" below,
>> and from which I tossed the two artificially added observations), I agree
>> with Daniel and Florian in general, and it seems
>> pretty clear that you don't need to do detailed analysis of the dataset to
>> pull out the glaring observation that "handling" happens only in the
>> agent+classifier case.  As Daniel pointed out, you should first question
>> whether you need random effects; it's worth asking whether there's
>> sufficient inter-subject variation that the apparent effect might generalize
>> less reliably across individuals than one might think from the cell counts
>> in the 2x2 representation.  Since you only have one "handling" observation
>> outside of the agent+classifier case, you might as well just tabulate
>> subject by handshape:
>>
>> > with(dat, xtabs(~ Subject + Handshape))
>>       Handshape
>> Subject handling object
>>      1        5     30
>>      2        8     36
>>      3        3     13
>>      4        5     43
>>      5        5     21
>>      6       11     48
>>
>> No evidence of major cross-subject variation.  A simple lmer fit confirms
>> this:
>>
>> > lmer(Handshape ~ 1 + (1 | Subject),data=dat,family="binomial")
>> [...]
>>   Data: dat
>>   AIC   BIC logLik deviance
>>  206.2 213.1 -101.1    202.2
>> Random effects:
>>  Groups  Name        Variance Std.Dev.
>>  Subject (Intercept)  0        0
>> [...]
>>
>> No need for random effects.  Comparing model log-likelihoods indicates the
>> same thing:
>>
>> > m.null <- glm(Handshape ~ 1,data=dat,family="binomial")
>> > logLik(m.null)
>> 'log Lik.' -101.1026 (df=1)
>>
>> The mixed-effects analysis didn't do any better (and you can try this
>> comparison with other model specifications as well).
>>
>> So you might as well just ignore subject identity and analyze this as a
>> 2x2x2 data table.  Similar to what Florian suggested, you could simply lump
>> everything but the agent+classifier data together and then do Fisher's exact
>> test to show that agent+classifier behaves significantly differently in
>> terms of handshape than all the other conditions.  Which is pretty clear
>> from just looking at your data.  You don't really have enough additional
>> structure going on in your dataset to meaningfully ask whether there is an
>> overall main effect of Agent in your dataset.
>>
>> Hope this helps,
>>
>> Roger
>>
>>
>> --
>>
>> Roger Levy                      Email: rlevy@ling.ucsd.edu
>> Assistant Professor             Phone: 858-534-7219
>> Department of Linguistics       Fax:   858-534-4789
>> UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>> End of ling-r-lang-L Digest, Vol 1, Issue 54
>> ********************************************
>>
>
>

