Hi Peter-ling,<br><br>can you say a bit more about your data? How many data points are there in each of the cells A-D, A-E, A-F, B-D, B-E, and B-F? If I had to guess, I would say you have either more or less data in D than in E or F (depending on how you contrast coded). Is that the way your data is unbalanced? A vs. B seems balanced, and I would even guess that the distribution of D vs. E vs. F is the same in A and B, but that the distribution of D vs. E vs. F overall is unbalanced?<br>
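A quick way to answer the cell-count question is a cross-tabulation. The following is only a sketch on a toy data frame: the column names AB and DEF and the counts are placeholders, not taken from your data.<br>

```r
# Toy data frame; the column names AB and DEF are placeholders,
# not names from the original data set.
dat <- data.frame(
  AB  = rep(c("a", "b"), each = 8),
  DEF = rep(c("d", "d", "d", "d", "e", "e", "f", "f"), times = 2)
)

# Observations per design cell (rows: A vs. B, columns: D vs. E vs. F)
cell_counts <- table(dat$AB, dat$DEF)
print(cell_counts)

# Marginal counts show where the imbalance sits
print(margin.table(cell_counts, 1))
print(margin.table(cell_counts, 2))
```

With your own data, replace dat with your data frame and the two columns with your factors.<br>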


<br>Anyway, for certain distributions of data, you will find that contrast coding alone won&#39;t solve the problem. If you play around with the following simulation, this becomes quite apparent:<br><br>====================================================<br>


# number of subjects<br>n_s = 20<br># number of items within subjects<br>n_per_s = 16<br><br># let&#39;s create an unbalanced data set of the type that I suspect you have<br># (see above)<br>AvB &lt;- as.factor(rep(c(rep(&quot;a&quot;, 8), rep(&quot;b&quot;, 8)), n_s))<br>


DvEvF &lt;- as.factor(rep(c(&quot;d&quot;,&quot;d&quot;,&quot;d&quot;, &quot;d&quot;, &quot;e&quot;, &quot;e&quot;, &quot;f&quot;, &quot;f&quot;), n_s * 2))<br># subjects (each subject contributes n_per_s consecutive rows)<br>s  &lt;- rep(seq_len(n_s), each = n_per_s)<br># subject effects<br>


rs &lt;- rnorm(n_s,0,0.5)<br><br># clumsy way to encode fixed effects<br>effect &lt;- function(x) {<br>    ifelse(x == &quot;a&quot;, 2,<br>     ifelse(x == &quot;b&quot;, 1.2,<br>      ifelse(x == &quot;d&quot;, 0.5,<br>


       ifelse(x == &quot;e&quot;, -1.5,<br>        ifelse(x == &quot;f&quot;, -0.3, NA)<br>        )<br>      )<br>     )<br>    )<br>}<br># outcome is specified by effects of AvB and DvEvF (no interaction, but<br># feel free to add one) plus noise plus subject effects (also noise).<br>


y &lt;- effect(AvB) + effect(DvEvF) + rnorm(n_s * n_per_s,0,0.2) + rs[s]<br><br>library(lme4)<br><br># treatment coding<br>contrasts(DvEvF) &lt;- contr.treatment(3)<br>lmer(y ~ AvB + DvEvF + (1 | s))<br><br># contrast (sum) coding<br>


contrasts(DvEvF) &lt;- contr.sum(3)<br>lmer(y ~ AvB + DvEvF + (1 | s))<br><br># or with interactions:<br>contrasts(AvB) &lt;- contr.sum(2)<br>contrasts(DvEvF) &lt;- contr.sum(3)<br>lmer(y ~ AvB * DvEvF + (1 | s))<br><br>

# contrast coding as used by you<br>

# to see that the above contrast coding does the same as<br># what you do<br>DvEvF &lt;- factor(DvEvF, levels=c(&quot;e&quot;, &quot;d&quot;, &quot;f&quot;))<br>contrasts(DvEvF) &lt;- contr.sum(3)<br>lmer(y ~ AvB + DvEvF + (1 | s))<br>


====================================================<br><br>This gives you something very similar to your pattern of fixed effect correlations. But play around with other parameters to get a feel for it.<br><br>What can you do now? First, many people would consider a fixed effect correlation of &lt; .5 to be reason for caution, but not reason to disregard the results. In the corpus work I do, I usually try to keep fixed effect correlations much smaller, but I think most folks find .5 acceptable. Maybe others can chime in and let you know whether I am off on that. <br>
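To see where such a correlation comes from without fitting a mixed model, one can look at the design matrix alone. This is a rough sketch under an ordinary least squares assumption (the subject random effects are ignored), so the numbers only approximate what lmer reports. It reuses the D/E/F imbalance from the simulation above.<br>

```r
# Same D/E/F imbalance as in the simulation (4 d, 2 e, 2 f per block of 8).
# The level order e, d, f mirrors the last model in the simulation.
DvEvF <- factor(rep(c("d", "d", "d", "d", "e", "e", "f", "f"), 40),
                levels = c("e", "d", "f"))
contrasts(DvEvF) <- contr.sum(3)

# Design matrix: intercept plus the two sum-coded contrast columns
X <- model.matrix(~ DvEvF)

# Under OLS, the covariance of the coefficient estimates is proportional
# to the inverse of X'X; cov2cor() rescales it to a correlation matrix.
fe_cor <- cov2cor(solve(crossprod(X)))
print(round(fe_cor, 3))
```

The off-diagonal entry for the two DvEvF columns is the OLS counterpart of the fixed effect correlation printed by lmer.<br>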


<br>Any method that further reduces collinearity will require some decisions (e.g. what to residualize against what; PCA; or some alternative coding). For example, Helmert coding works quite well for the pseudo data set mentioned above, if your hypothesis was that e &lt; f &lt; d (and you reorder the factor levels accordingly). Fixed effect correlations would all be &lt;0.26.<br>
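The Helmert suggestion can be sketched the same way. This again uses only the design matrix under an ordinary least squares assumption, so it checks how the two Helmert columns relate to each other rather than reproducing the lmer output; the correlations involving the intercept will differ once subject effects are in the model.<br>

```r
# Reorder the levels to match the hypothesized ordering e < f < d
DvEvF <- factor(rep(c("d", "d", "d", "d", "e", "e", "f", "f"), 40),
                levels = c("e", "f", "d"))
contrasts(DvEvF) <- contr.helmert(3)

# Design matrix: intercept plus the two Helmert contrast columns
X <- model.matrix(~ DvEvF)

# OLS correlation of the coefficient estimates (random effects ignored)
helmert_cor <- cov2cor(solve(crossprod(X)))
print(round(helmert_cor, 3))
```

In this toy layout e and f occur equally often, which is why the two Helmert columns end up uncorrelated; other cell counts would give different numbers.<br>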


<br>HTH,<br>Florian<br><br><div class="gmail_quote">On Thu, Oct 1, 2009 at 11:11 AM, Peter Graff <span dir="ltr">&lt;<a href="mailto:graff@mit.edu">graff@mit.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div bgcolor="#ffffff" text="#000000">

Dear R-Langs,<br>

<br>

I&#39;m currently trying to analyze some experimental data with a contrast

coded mixed model. I have a 2 (levels:A,B) x 3 (levels:D,E,F) design

with unbalanced cell sizes. I am coding the following variables:<br>

<br>

AvB, DvE, EvF<br>

<br>

where sum(AvB)=0, sum(DvE)=0, sum(EvF)=0. Next I fit this model:<br>

<br>

lmer(DV~AvB*(DvE+EvF)+(1|Subj)+(1|Item))<br>

<br>

And here is the correlation matrix it outputs:<br>

<br>

Correlation of Fixed Effects:<br>

                  (Intr)   AvB    DvE     EvF  AvB:DvE<br>

AvB         -0.029                             <br>

DvE          0.010 -0.006                      <br>

EvF         -0.002 -0.001  <b><font color="#cc0000">0.488</font>  </b>            

<br>

AvB:DvE -0.003  0.015 -0.039 -0.018        <br>

AvB:EvF -0.001 -0.002 -0.018 -0.044  <font color="#000099"><b>0.488 </b></font><br>

<br>

The question I have is, how to get rid of the collinearity in red and

blue and whether it&#39;s even possible. And if it&#39;s not possible, in what

way will this affect the reliability of my result?<br>

<br>

Thanks so much in advance,<br>

<br>

Peter<br>

</div>



<br></blockquote></div><br>