[R-lang] Re: utterance length

Fri Sep 3 02:23:05 PDT 2010

Dear Joost, and others,

On 2010.09.03 09:29 , Joost van de Weijer wrote:
> I want to use a mixed model to compare length (in words) of child-directed
> utterances with length of adult-directed utterances. The transcribed
> material that I have comes from different speakers. I have two questions
> that I would appreciate if anyone could give me feedback upon:
>
> 1. Does it make sense to use the transcribed utterance as a random factor,
> such that the model looks something like the following:
>
> lmer(nrwords~adressee+(1|speaker)+(1|utterance),corpus)

Yes, this would make sense. Notice, however, that in this model you 
have crossed (as opposed to nested) the two random effects of 
speaker and utterance. That is only meaningful if at least some of 
the utterances have been produced by multiple speakers. I can 
imagine that that is indeed the case. If that is not the case, a 
better model would be
R> lmer(nrwords~adressee+(1|speaker/utterance),corpus)
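
To find out which of the two situations applies, you could count how many distinct speakers produced each utterance. A minimal sketch in base R follows; the toy data frame and the names speakers_per_utt and any_shared are made up for illustration, and only the column names are taken from the model formula above.

```r
# Toy data standing in for the real corpus (hypothetical values).
corpus <- data.frame(
  speaker   = c("s1", "s1", "s2", "s2", "s3"),
  utterance = c("u1", "u2", "u1", "u3", "u4"),
  nrwords   = c(4, 7, 4, 5, 9)
)

# Number of distinct speakers per utterance.
speakers_per_utt <- tapply(corpus$speaker, corpus$utterance,
                           function(s) length(unique(s)))

# TRUE if at least one utterance occurs with more than one speaker,
# in which case the crossed specification is meaningful.
any_shared <- any(speakers_per_utt > 1)
print(speakers_per_utt)
print(any_shared)
```

If any_shared is FALSE, every utterance belongs to a single speaker and the nested form is the appropriate one. In lme4, (1|speaker/utterance) is shorthand for (1|speaker) + (1|speaker:utterance).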

Perhaps the following article is relevant (although modeling was 
done in MLwiN, not in R). Similar to your approach, I modeled the 
nrwords in the utterance, with speaker's gender as fixed effect.

H. Quené (2008). J.Acoust.Soc.Am. 123 (2), 1104-1113. 
[doi:10.1121/1.2821762].

>
> And if I then fail to find a significant addressee effect, is then the
> following conclusion justified: "the difference in utterance length between
> child-directed utterances and adult-directed utterances is due to the fact
> that the content of the child-directed utterances differs from that of the
> adult-directed utterances".

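No. If there is no significant addressee effect, then your data show 
*no* reliable difference in utterance length between child-directed 
speech (CDS) and adult-directed speech (ADS), so there is no 
difference left to attribute to content.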

> 2. Is it advisable to use the square root (or the log) of the length
> variable rather than the actual number of words? How should I choose whether
> or not to transform?

R> require(MASS)
# functions from Venables & Ripley (1994). Modern Applied Statistics
# with S-Plus. Berlin: Springer. ISBN 0-387-94350-1.

R> boxcox( aux <- lm(nrwords~1,data=corpus) )

You can also extend the lm with additional predictors, but do not 
include any predictors or fixed effects related to your hypotheses.

Have a look at the resulting plot, and notice the lambda value where 
the curve is highest. Round that lambda to the nearest 1/3 or 1/2.
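
If you prefer to read the peak off numerically instead of from the plot, something like the following sketch should work. It uses simulated counts as a stand-in for the real corpus (an assumption), and the names bc, lambda_hat and lambda_round are made up for illustration. boxcox() with plotit=FALSE returns the lambda grid in $x and the log-likelihood in $y.

```r
library(MASS)  # provides boxcox()

set.seed(1)
# Simulated, strictly positive "utterance lengths" (hypothetical data):
# 1 + Poisson counts, so every utterance has at least one word.
corpus <- data.frame(nrwords = 1 + rpois(500, lambda = 4))

# Intercept-only model; y = TRUE stores the response in the fit so that
# boxcox() does not need to refit the model internally.
aux <- lm(nrwords ~ 1, data = corpus, y = TRUE)

# Profile log-likelihood over a grid of lambda values, without plotting.
bc <- boxcox(aux, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda at which the log-likelihood curve is highest.
lambda_hat <- bc$x[which.max(bc$y)]

# Round to the nearest 1/2 for an interpretable transformation.
lambda_round <- round(lambda_hat * 2) / 2
print(c(lambda_hat = lambda_hat, lambda_round = lambda_round))
```

The response must be strictly positive for this to work, which holds for word counts of non-empty utterances.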

The best transformation, according to Venables & Ripley (1994, 
p.170), is y = x^lambda, a power transformation.
So if the curve peaks at about 1/2, use y = x^(1/2) = sqrt(x).
If the curve peaks at 1, use y=x, untransformed.
If the curve peaks at 0, use y = log(x), a special case.
If the curve peaks at -1, use y=1/x.
Etc.
Also see the boxcox documentation (?boxcox).
Make QQplots of various transformed DVs, and consider whether the 
transformed DV is still of interest to you. In particular, think 
about log and inverse transformations. Additive effects in 
log-transformed vars mean multiplicative effects in untransformed 
vars, for example.
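
To illustrate that last point with made-up numbers (the values 8 and 0.75 are purely hypothetical, not estimates from any data): an additive effect on the log scale turns into a multiplier once you transform back.

```r
# Hypothetical baseline on the log scale: adult-directed speech (ADS)
# with 8 words per utterance.
log_ads <- log(8)

# Hypothetical additive effect of child-directed speech (CDS)
# on the log scale.
effect <- log(0.75)

log_cds <- log_ads + effect

# Back on the original scale, the additive effect acts as a multiplier.
ads   <- exp(log_ads)
cds   <- exp(log_cds)
ratio <- cds / ads
print(c(ads = ads, cds = cds, ratio = ratio))
```

So a coefficient b in a model of log(nrwords) corresponds to multiplying the untransformed length by exp(b), not to adding a fixed number of words.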

Hope this helps! Best, Hugo Quené

-- 
Dr Hugo Quené | Assoc Prof in Phonetics | Dept Moderne Talen | 
Utrecht inst of Linguistics OTS | Universiteit Utrecht | Trans 10 | 
3512 JK Utrecht | The Netherlands | T +31 30 253 6070 | 
H.Quene@uu.nl | www.hugoquene.nl | www.hum.uu.nl | 
uu.academia.edu/HugoQuene