[R-lang] Re: Concerning glm with contrasts

Wed Jul 21 09:58:45 PDT 2010

Dear Dr. Levy,

Thank you very much for your detailed response. Your information is very
helpful.

Here are my answers to your questions, and I have some further questions
below.
1) The totals are all 508 for all languages because I want to make sure that
I am comparing the frequencies of constructions that are conveying the same
meanings. The counts are based on a novel that was originally written in
Japanese and translated into Chinese and English. When I analyzed the
texts, I only included those that have equivalents in all three languages.
Therefore, they always have the same totals. I guess then the totals are not
something that I'm interested in. I'm interested in how the languages differ
in terms of the frequencies of different constructions.
2) I used Poisson because I thought it was appropriate for count data. Also,
a friend of mine who is a statistician suggested it.
3) Regarding Q3 in my original post, my hypothesis is that English prefers
passive more than Japanese does, and Japanese prefers Intransitive more than
English, but Chinese can be either in both cases.

Here are my further questions:
1) You mentioned "multinomial logistic regression". I do know a little
about logistic regression, but not at all about this specific kind of
logistic regression. Which command should I use? glm(), mlogit(), lmer(), or
something else? And what kind of distribution (e.g., binomial) should I use?
2) Regarding the significant difference shown in "language1" in the model,
you said that "the language1 variable more or less models the relative
frequency of English observations to Japanese observations". I am not sure I
am following. So what can I claim (or what do I know) based on this result?
3) As mentioned above, I do not have a specific hypothesis for Chinese. If I
want to include it in the analysis anyway (e.g., comparing passive
constructions in all three languages), what methods would you advise me to
use?

Thank you in advance, and I look forward to hearing from you.

Regards,
Zoe

On Tue, Jul 20, 2010 at 4:43 PM, Levy, Roger <rlevy@ucsd.edu> wrote:

> Dear Zoe,
>
> On Jul 19, 2010, at 10:28 AM, Zoe Luk wrote:
>
> > Dear R-lang users,
> >
> > I am new to the mailing list, and also rather new to R. I have a few
> questions concerning the results of glm().
> >
> > I am doing a study comparing the frequencies of different linguistic
> constructions used in a specific text that is in three languages (Japanese,
> Chinese, and English). The results I got are the following.
> >
> >
> >               Transitive  Passive  Intransitive  Adjectival  Others  Total
> > Japanese             164        9           291          36       8    508
> > Chinese              198        3           221          69      17    508
> > English              174       31           214          57      32    508
> > Total                536       43           726         162      57   1524
> >
> > A chi-square test gave a significant result. I intended to do further
> analysis to see if there is any difference among the languages, so I did the
> following:
> >
> > > glm.out4 <- glm(freq ~ language * constructions, data = comps2.data,
> >       family = poisson,
> >       contrasts = list(language = contrastml, constructions = contrastmc))
>
> The first question I'd like to ask is why you're using a Poisson model to
> analyze your data.  I see that the marginal totals for each language are the
> same at 508.  Were these marginal totals under your control (e.g., did you
> count in each text until you got 508), or are these totals something you
> want your model to account for?  A Poisson model devotes parameters to
> accounting for the marginal totals. If instead you're thinking that the
> language is an "independent variable" and the construction type is a
> "dependent variable", then analyzing the data with multinomial logistic
> regression might be more appropriate.  (Now, there *are* legitimate uses of
> Poisson models as surrogates for multinomial logistic regression, but using
> them as surrogates in this way affects how you interpret the model
> parameters -- see below.)
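One possible way to fit the multinomial logistic regression mentioned above is multinom() from the nnet package, which ships with standard R installations. This is only a sketch: the counts are retyped from the table in the original post, the choice of multinom() is an assumption rather than something recommended in this thread, and the object names d, fit, newd and probs are made up for illustration.

```r
library(nnet)  # provides multinom(); assumed to be installed

# Counts retyped from the table in the original post, one row per cell.
d <- data.frame(
  language      = rep(c("Japanese", "Chinese", "English"), each = 5),
  constructions = rep(c("Transitive", "Passive", "Intransitive",
                        "Adjectival", "Others"), times = 3),
  freq          = c(164,  9, 291, 36,  8,
                    198,  3, 221, 69, 17,
                    174, 31, 214, 57, 32),
  stringsAsFactors = TRUE)

# Construction type is the response and language the predictor;
# the cell counts enter as case weights.
fit <- multinom(constructions ~ language, data = d, weights = freq,
                trace = FALSE)

# Fitted construction probabilities for each language.
newd  <- data.frame(language = factor(levels(d$language),
                                      levels = levels(d$language)))
probs <- predict(fit, newdata = newd, type = "probs")
rownames(probs) <- levels(d$language)
print(round(probs, 3))
```

Because language is the only predictor, the fitted probabilities should simply reproduce the observed proportions within each language (for example, 9/508 for Japanese passives). The coefficients in summary(fit) are log-odds relative to the first level of constructions.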
>
> > > summary(glm.out4)
> >
> > Call:
> > glm(formula = freq ~ language * constructions, family = poisson,
> >     data = comps2.data, contrasts = list(language = contrastml,
> >         constructions = contrastmc))
> >
> > Deviance Residuals:
> >  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
> >
> > Coefficients:
> >                          Estimate Std. Error z value Pr(>|z|)
> > (Intercept)               3.93111    0.05890  66.746  < 2e-16 ***
> > language1                 0.20443    0.08435   2.424 0.015363 *
> > language2                 0.16064    0.09497   1.692 0.090740 .
> > constructions1           -1.25129    0.06779 -18.459  < 2e-16 ***
> > constructions2            1.68783    0.18775   8.990  < 2e-16 ***
> > constructions3           -0.01655    0.08647  -0.191 0.848205
> > constructions4            1.12805    0.13321   8.468  < 2e-16 ***
> > language1:constructions1  0.12190    0.09726   1.253 0.210090
> > language2:constructions1  0.26651    0.10562   2.523 0.011625 *
> > language1:constructions2  0.15838    0.24722   0.641 0.521744
> > language2:constructions2 -0.98403    0.32782  -3.002 0.002684 **
> > language1:constructions3 -0.15971    0.12915  -1.237 0.216218
> > language2:constructions3  0.44708    0.12620   3.543 0.000396 ***
> > language1:constructions4 -0.51918    0.21538  -2.411 0.015931 *
> > language2:constructions4  0.19079    0.18724   1.019 0.308207
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > (Dispersion parameter for poisson family taken to be 1)
> >
> >     Null deviance:  1.3744e+03  on 14  degrees of freedom
> > Residual deviance: -3.1086e-15  on  0  degrees of freedom
> > AIC: 116.66
> >
> > Number of Fisher Scoring iterations: 3
> >
> > > contrasts(language)
> >          [,1] [,2]
> > Chinese     0   -1
> > English     1    1
> > Japanese   -1    0
> > > contrasts(constructions)
> >              [,1] [,2] [,3] [,4]
> > Adjectival      0    0   -1    0
> > Intransitive    1    1    1    1
> > Others          0    0    0   -1
> > Passive         0   -1    0    0
> > Transitive     -1    0    0    0
> >
> > So my questions are:
> > (1) I am not sure how to interpret these results. Since language1 shows a
> significant difference, does it mean that "English and Japanese are
> significantly different in terms of the distribution of the different
> constructions used"?
>
> No -- since you're using Poisson regression, the language1 variable more or
> less models the relative frequency of English observations to Japanese
> observations.
>
> If you were to double the number of observations in each Japanese cell, the
> biggest change in your model would be that the language1 parameter would
> decrease.  (The intercept and the language2 parameter would also adjust by
> smaller amounts, in compensation.)  The constructions and
> language:constructions parameters would stay the same.
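The effect described above can be checked numerically. The following is a sketch only: the counts are retyped from the table in the original post (so the estimates need not match the posted summary to the last digit), the contrast matrices are rebuilt from the printed contrasts() output, and fit_poisson, fit1, fit2, doubled and delta are names invented here.

```r
# Counts retyped from the table in the original post, one row per cell.
comps2.data <- data.frame(
  language      = rep(c("Japanese", "Chinese", "English"), each = 5),
  constructions = rep(c("Transitive", "Passive", "Intransitive",
                        "Adjectival", "Others"), times = 3),
  freq          = c(164,  9, 291, 36,  8,
                    198,  3, 221, 69, 17,
                    174, 31, 214, 57, 32),
  stringsAsFactors = TRUE)  # factor levels are sorted alphabetically

# Contrast matrices rebuilt from the contrasts() output in the post
# (rows follow the alphabetical factor levels; each vector below is a column).
contrastml <- matrix(c( 0, 1, -1,
                       -1, 1,  0), nrow = 3)
contrastmc <- matrix(c( 0, 1,  0,  0, -1,
                        0, 1,  0, -1,  0,
                       -1, 1,  0,  0,  0,
                        0, 1, -1,  0,  0), nrow = 5)

# Same model call as in the post, wrapped so it can be refit on new data.
fit_poisson <- function(dat) {
  glm(freq ~ language * constructions, data = dat, family = poisson,
      contrasts = list(language = contrastml, constructions = contrastmc))
}

fit1 <- fit_poisson(comps2.data)

# Double every Japanese cell and refit.
doubled <- comps2.data
is_jp   <- doubled$language == "Japanese"
doubled$freq[is_jp] <- 2 * doubled$freq[is_jp]
fit2 <- fit_poisson(doubled)

# Change in each coefficient caused by the doubling.
delta <- coef(fit2) - coef(fit1)
print(round(delta, 4))
```

With these particular contrasts the algebra gives a shift of -2*log(2)/3 for language1 and +log(2)/3 for both the intercept and language2, while the constructions and language:constructions entries of delta should be zero up to numerical tolerance.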
>
> > (2) Does the "intercept" represent anything at all? If yes, what does it
> represent in this case?
>
> It's probably not anything you're interested in.  Because you're using true
> contrasts (i.e. each column in your contrast matrices sums to zero), the
> intercept is more or less modeling the total number of observations in your
> dataset (keep in mind that Poisson regression is trying to model cell
> counts, not proportions).
>
> If you were to double the counts of all cells in your dataset, the
> intercept would increase by an additive constant -- log(2) -- and the rest of
> the model would stay the same.
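A quick numerical check of the two statements above, again as a sketch: the counts are retyped from the table in the original post, the contrast matrices are rebuilt from the printed contrasts() output, and fit_poisson, b0 and b0_doubled are names invented here. For a saturated model with sum-to-zero contrasts and every cell observed, the intercept works out to the mean of the log cell counts, which is why it tracks the overall size of the dataset.

```r
# Counts retyped from the table in the original post, one row per cell.
comps2.data <- data.frame(
  language      = rep(c("Japanese", "Chinese", "English"), each = 5),
  constructions = rep(c("Transitive", "Passive", "Intransitive",
                        "Adjectival", "Others"), times = 3),
  freq          = c(164,  9, 291, 36,  8,
                    198,  3, 221, 69, 17,
                    174, 31, 214, 57, 32),
  stringsAsFactors = TRUE)  # factor levels are sorted alphabetically

# Contrast matrices rebuilt from the contrasts() output in the post.
contrastml <- matrix(c( 0, 1, -1,
                       -1, 1,  0), nrow = 3)
contrastmc <- matrix(c( 0, 1,  0,  0, -1,
                        0, 1,  0, -1,  0,
                       -1, 1,  0,  0,  0,
                        0, 1, -1,  0,  0), nrow = 5)

fit_poisson <- function(dat) {
  glm(freq ~ language * constructions, data = dat, family = poisson,
      contrasts = list(language = contrastml, constructions = contrastmc))
}

# Intercept of the original fit versus the mean log cell count.
fit <- fit_poisson(comps2.data)
b0  <- coef(fit)[["(Intercept)"]]
print(c(intercept = b0, mean_log_count = mean(log(comps2.data$freq))))

# Double every cell and refit: only the intercept should move.
doubled      <- comps2.data
doubled$freq <- 2 * doubled$freq
fit_doubled  <- fit_poisson(doubled)
b0_doubled   <- coef(fit_doubled)[["(Intercept)"]]
print(c(shift = b0_doubled - b0, log2 = log(2)))
```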
>
> > (3) If I want to test whether English uses passive significantly more
> than Japanese, and Japanese uses intransitive significantly more than
> English, how should I modify the contrasts/commands?
>
> Let's call the passive question 3a, and the intransitive question 3b.
>  Answering these questions depends on the answers to a couple of other
> questions:
>
> * How, if at all, are the Chinese data relevant to either 3a or 3b?
>
> * How, if at all, are the distinctions among adjectival, intransitive,
> transitive, and "other" relevant to 3a?
>
> * How, if at all, are the distinctions among adjectival, passive,
> transitive, and "other" relevant to 3b?
>
> If the answer to all three questions is "irrelevant", you might just
> consider doing very simple chi-squared or Fisher's exact tests on 2x2
> representations of the Japanese and English as (a) passive and non-passive
> counts, and (b) intransitive and non-intransitive counts.
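The simple 2x2 tests suggested above could look roughly like this. It is a sketch under the assumption that only the Japanese and English rows matter and that all other constructions are pooled; the counts are retyped from the table in the original post, and the object names are invented here.

```r
# 2x2 tables for Japanese vs. English; each language has 508 observations.
passive <- matrix(c( 9, 508 -  9,
                    31, 508 - 31),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(language     = c("Japanese", "English"),
                                  construction = c("passive", "non-passive")))

intransitive <- matrix(c(291, 508 - 291,
                         214, 508 - 214),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(language     = c("Japanese", "English"),
                                       construction = c("intransitive",
                                                        "non-intransitive")))

# (a) passive vs. non-passive
chisq_passive  <- chisq.test(passive)   # Yates continuity correction by default
fisher_passive <- fisher.test(passive)

# (b) intransitive vs. non-intransitive
chisq_intrans  <- chisq.test(intransitive)
fisher_intrans <- fisher.test(intransitive)

print(passive)
print(c(chisq = chisq_passive$p.value, fisher = fisher_passive$p.value))
print(intransitive)
print(c(chisq = chisq_intrans$p.value, fisher = fisher_intrans$p.value))
```

Both tests are two-sided by default. Since the hypotheses in question 3 are directional, fisher.test() also accepts an alternative argument ("less" or "greater") for a one-sided test on a 2x2 table.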
>
> Also, I'd recommend Maureen Gillespie's coding tutorial as background
> reading:
>
>
> http://go2.wordpress.com/?id=725X1342&site=hlplab.wordpress.com&url=http%3A%2F%2Fhlplab.files.wordpress.com%2F2010%2F05%2Fcodingtutorial.pdf&sref=http%3A%2F%2Fhlplab.wordpress.com%2F2010%2F05%2F10%2Fmini-womm-montreal-slides-now-available%2F
>
> Best
>
> Roger
>
> --
>
> Roger Levy                      Email: rlevy@ling.ucsd.edu
> Assistant Professor             Phone: 858-534-7219
> Department of Linguistics       Fax:   858-534-4789
> UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy

-- 
Zoe Luk
Department of Linguistics
University of Pittsburgh