[R-lang] utterance length

Fri Sep 3 00:29:41 PDT 2010

Dear list members,
I want to use a mixed model to compare the length (in words) of child-directed
utterances with the length of adult-directed utterances. The transcribed
material comes from several different speakers. I have two questions on
which I would appreciate feedback:

1. Does it make sense to include the transcribed utterance as a random factor,
so that the model looks something like the following:

library(lme4)
lmer(nrwords ~ adressee + (1|speaker) + (1|utterance), data = corpus)

And if I then fail to find a significant addressee effect, is the following
conclusion justified: "the difference in utterance length between
child-directed and adult-directed utterances is due to the fact that the
content of the child-directed utterances differs from that of the
adult-directed utterances"?

2. Is it advisable to use the square root (or the log) of the length
variable rather than the actual number of words? How should I choose whether
or not to transform?
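As a rough illustration of what question 2 is about, here is a minimal sketch that compares how skewed a word-count variable is on the raw, square-root, and log scales. The counts are simulated (hypothetical, not the real corpus), and the `skewness` helper is a hypothetical function defined here. Skewness of the raw variable is only a heuristic; what matters for the model is the distribution of the residuals of the fitted mixed model.

```r
# Simulated word counts (hypothetical; NOT the real corpus data)
set.seed(1)
nrwords <- rpois(500, lambda = 5) + 1   # +1 so every utterance has at least one word

# Sample skewness: about 0 for a symmetric distribution, > 0 for a long right tail
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# Compare the raw, square-root, and log scales
scales <- list(raw = nrwords, sqrt = sqrt(nrwords), log = log(nrwords))
round(sapply(scales, skewness), 2)
```

The transformation whose scale gives the most symmetric, constant-variance residuals in the actual model would be the natural candidate.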

With many thanks in advance for any feedback that you might have!

Joost van de Weijer
Lund University