[R-lang] Re: utterance length
Hugo Quené
H.Quene@uu.nl
Fri Sep 3 02:23:05 PDT 2010
Dear Joost, and others,
On 2010.09.03 09:29 , Joost van de Weijer wrote:
> I want to use a mixed model to compare length (in words) of child-directed
> utterances with length of adult-directed utterances. The transcribed
> material that I have comes from different speakers. I have two questions
> that I would appreciate if anyone could give me feedback upon:
>
> 1. Does it make sense to use the transcribed utterance as a random factor,
> such that the model looks something like the following:
>
> lmer(nrwords~adressee+(1|speaker)+(1|utterance),corpus)
Yes, this would make sense. Notice, however, that in this model you
have crossed (as opposed to nested) the two random effects of
speaker and utterance. That is only meaningful if at least some of
the utterances have been produced by multiple speakers. I can
imagine that that is indeed the case. If that is not the case, a
better model would be
R> lmer(nrwords~adressee+(1|speaker/utterance),corpus)
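Whether utterances are shared across speakers can be checked before choosing between the two models. The following is only a minimal sketch on made-up data: the toy `corpus` and the names `speakers_per_utterance` and `is_crossed` are assumptions for illustration, and only the column names follow the formula above.

```r
# Hypothetical toy data; only the column names follow the model formula.
corpus <- data.frame(
  speaker   = c("s1", "s1", "s2", "s2", "s3"),
  utterance = c("u1", "u2", "u1", "u3", "u4")
)

# Number of distinct speakers that produced each utterance.
speakers_per_utterance <- tapply(
  corpus$speaker, corpus$utterance,
  function(s) length(unique(s))
)

# TRUE if at least one utterance was produced by two or more speakers,
# i.e. the design is (partially) crossed rather than strictly nested.
is_crossed <- any(speakers_per_utterance > 1)

print(speakers_per_utterance)
print(is_crossed)
```

If `is_crossed` is FALSE for your real data, every utterance belongs to exactly one speaker, and the nested specification is the more natural one.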
Perhaps the following article is relevant (although modeling was
done in MLwiN, not in R). Similar to your approach, I modeled the
nrwords in the utterance, with speaker's gender as fixed effect.
H. Quené (2008). J.Acoust.Soc.Am. 123 (2), 1104-1113.
[doi:10.1121/1.2821762].
>
> And if I then fail to find a significant addressee effect, is then the
> following conclusion justified: "the difference in utterance length between
> child-directed utterances and adult-directed utterances is due to the fact
> that the content of the child-directed utterances differs from that of the
> adult-directed utterances".
No. If there is no significant addressee effect, then your data show
*no* reliable difference in utterance length between CDS and ADS, so
there is no difference left to attribute to content.
> 2. Is it advisable to use the square root (or the log) of the length
> variable rather than the actual number of words? How should I choose whether
> or not to transform?
R> require(MASS)
# functions from Venables & Ripley (1994). Modern Applied Statistics
# with S-Plus. Berlin: Springer. ISBN 0-387-94350-1.
R> boxcox( aux <- lm(nrwords~1,data=corpus) )
You can also extend the lm with additional predictors, but do not
include any predictors or fixed effects related to your hypotheses.
Have a look at the resulting plot, and notice the lambda value where
the curve is highest. Round that lambda to the nearest multiple of
1/3 or 1/2.
The best transformation, according to Venables & Ripley (1994,
p.170), is y = x^lambda, a power transformation.
So if the curve peaks at about 1/2, use y = x^(1/2) = sqrt(x).
If the curve peaks at 1, use y=x, untransformed.
If the curve peaks at 0, use y = log(x), a special case.
If the curve peaks at -1, use y=1/x.
Etc.
Also see the boxcox documentation (?boxcox).
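The lambda can also be read off numerically instead of from the plot. This is a rough sketch, not a definitive recipe: the simulated `corpus`, the helper `power_transform`, and the choice to round to the nearest 1/2 are assumptions for illustration. It relies on `boxcox()` returning the lambda grid (`x`) and the profile log-likelihood (`y`) when `plotit = FALSE`.

```r
library(MASS)  # provides boxcox()

# Simulated, strictly positive utterance lengths (illustration only).
set.seed(1)
corpus <- data.frame(nrwords = rpois(200, lambda = 4) + 1)

# Intercept-only model; y = TRUE stores the response in the fit.
aux <- lm(nrwords ~ 1, data = corpus, y = TRUE)

# Profile log-likelihood over a grid of lambda values, without plotting.
bc <- boxcox(aux, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Round to the nearest 1/2.
lambda_round <- round(lambda_hat * 2) / 2

# Power transformation, with log(x) as the special case at lambda = 0.
power_transform <- function(x, lambda) {
  if (lambda == 0) log(x) else x^lambda
}

corpus$nrwords_t <- power_transform(corpus$nrwords, lambda_round)
print(c(lambda_hat = lambda_hat, lambda_round = lambda_round))
```

The plot remains worth inspecting, because a flat curve around the peak means that several neighbouring lambda values are about equally well supported.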
Make QQ plots of various transformed DVs, and consider whether the
transformed DV is still of interest to you. In particular, think
about log and inverse transformations. Additive effects in
log-transformed vars mean multiplicative effects in untransformed
vars, for example.
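A small numeric illustration of that last point, with made-up numbers (the effect size `log_effect` and the baseline lengths are hypothetical, not estimates from any data): adding a constant on the log scale multiplies the untransformed value by the same factor, whatever the baseline.

```r
# Hypothetical additive effect on the log scale.
log_effect <- 0.2

# Two hypothetical baseline utterance lengths, in words.
base_lengths <- c(4, 8)

# Apply the additive effect on the log scale, then back-transform.
new_lengths <- exp(log(base_lengths) + log_effect)

# The ratio new/old is the same for both baselines: exp(log_effect).
ratios <- new_lengths / base_lengths
print(ratios)
print(exp(log_effect))
```

So a coefficient b in a model of log(nrwords) is read as "utterances are exp(b) times as long", not as "b words longer".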
Hope this helps! Best, Hugo Quené
--
Dr Hugo Quené | Assoc Prof in Phonetics | Dept Moderne Talen |
Utrecht inst of Linguistics OTS | Universiteit Utrecht | Trans 10 |
3512 JK Utrecht | The Netherlands | T +31 30 253 6070 |
H.Quene@uu.nl | www.hugoquene.nl | www.hum.uu.nl |
uu.academia.edu/HugoQuene