[R-lang] Re: coding in unbalanced data

Philip Hofmeister phofmeister@ucsd.edu
Fri Sep 3 14:38:23 PDT 2010


Hi Bruno,

First, unbalanced data in LME models will not, by itself, lead to
sizable fixed-effect correlations, certainly not with the number of
data points that reading time studies yield. The ability of LME models
to handle unbalanced data is one of the things that makes them so great.

>
>> lmer(logRTresidual~condition.coded * canimacy * corder_of_the_sentence
>> +cSpillOver1+cSpillOver2+cSpillOver3 + csex +chand+cage+(1|subj) + (1|word)
>> ,data=averb)

Second, until you understand your data, I wouldn't advise using residual
log reading times. They can be very useful, but the coefficients are hard
to interpret, and it's generally good practice to start with raw or
length-residualized reading times.
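
For what it's worth, here is a minimal sketch of what I mean by
length-residualized reading times. The data frame and the column names
(d, wordlen, rt, rt_resid) are made up for illustration and are not from
your data set; in practice the regression is often fitted separately for
each participant, which this toy version does not do.

```r
# Hypothetical toy data: one row per word, raw reading time in ms.
set.seed(1)
d <- data.frame(wordlen = sample(2:10, 100, replace = TRUE))
d$rt <- 250 + 20 * d$wordlen + rnorm(100, sd = 40)

# Regress raw RT on word length and keep the residuals:
# what is left is the part of RT not predicted by length.
fit <- lm(rt ~ wordlen, data = d)
d$rt_resid <- resid(fit)

# By construction the residuals average zero and are
# uncorrelated with word length.
print(summary(d$rt_resid))
```

The residuals are still on the millisecond scale, so the coefficients of a
model fitted to rt_resid stay fairly easy to read.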

> (c means centered, and I think the names are pretty straightforward). I have
> a fixed effect "condition" with 4 levels: B, D0, D1,D2. My data was balanced
> until I removed the RTs smaller than 200 and bigger than 1100. But, now I
> see it isn't balanced:
>
>> prop.table(table(CNPCC$condition))
>
>         B        D0        D1        D2
>
> 0.1496947 0.2280904 0.2889916 0.3332233
>
> I want to use contrast coding (or Helmert coding or orthogonal coding, is it
> the same?). I want to check B vs D0,D1 and D2; D0 vs D1 and D2; and D1 vs
> D2.
>

What you describe is Helmert coding. Did you actually change the
coding of your condition levels? Once factors have been Helmert coded,
correlations tend to disappear.
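
As a rough check that does not need your data, the sketch below builds a
factor with made-up cell counts that only approximate the proportions you
quoted, applies the contrasts you describe, and looks at how strongly the
resulting predictor columns correlate. The counts and the names n, cond
and H are assumptions for illustration.

```r
# Illustrative counts only, roughly matching the quoted proportions.
n <- c(B = 150, D0 = 228, D1 = 289, D2 = 333)
cond <- factor(rep(names(n), times = n), levels = names(n))

# Each level versus the mean of the levels that follow it.
H <- cbind("B-D0.D1.D2" = c(3, -1, -1, -1),
           "D0-D1.D2"   = c(0,  2, -1, -1),
           "D1-D2"      = c(0,  0,  1, -1))
contrasts(cond) <- H

# The contrast columns themselves are orthogonal:
# every off-diagonal cross-product is zero.
cp <- crossprod(H)
print(cp)

# With unequal cell sizes the predictor columns are no longer exactly
# uncorrelated, so inspect how large the correlations actually are.
X <- model.matrix(~ cond)[, -1]
r <- cor(X)
print(round(r, 3))
```

If the correlations in your model summary are much larger than what this
kind of check shows, the condition coding is probably not the source.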

> When I did the same kind of experiment with judgments (and balanced data) I
> used this:
>
>>CNPCC$cond.cod <- CNPCC$condition
>
>>contrasts(CNPCC$cond.cod) <-cbind("B-D0"= c(3,-(1),-(1),-(1)),"D0-D1.D2"=
>> c(0,(2),-(1),-(1)),"D1-D2"= c(0,0,(1),-(1)))
>
> But now I'm seeing larger correlations of fixed effects and I suspect that
> it may be because of the coding (everything else is centered). How do I do
> this kind of coding to unbalanced data?

You didn't specify which factors are correlated, so it's hard to say
exactly what's going on. You can look here for some general coding
tips in R (and also check R's own help pages, e.g. ?contrasts):

http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm

I vaguely recall some reference to the above UCLA page in a previous
posting to this list, so you might wanna search the archives of this
list, too.
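
In case it helps while reading that page, here is a small sketch of the
contrast functions that ship with base R. One thing to watch: R's
contr.helmert compares each level with the mean of the levels before it,
which is the reverse of the direction you describe, so a hand-built matrix
may still be what you want. The factor f below is a made-up example.

```r
# Built-in contrast matrices for a four-level factor.
print(contr.treatment(4))  # each level versus the first (baseline) level
print(contr.sum(4))        # deviation (sum-to-zero) coding

# R's Helmert coding: level k versus the mean of levels 1..(k-1).
h <- contr.helmert(4)
print(h)

# Inspect the coding currently attached to a factor.
f <- factor(c("B", "D0", "D1", "D2"))
print(contrasts(f))
```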

Good luck,

Philip

>
> Thanks!
>
> Bruno Nicenboim
>


