[R-lang] Re: Investigating random slope variance

Titus von der Malsburg malsburg@posteo.de
Mon Apr 7 08:09:09 PDT 2014


On 2014-04-04 Fri 21:48, Levy, Roger <rlevy@ucsd.edu> wrote:
> On Apr 4, 2014, at 8:20 AM, Titus von der Malsburg <malsburg@posteo.de> wrote:
>> Here is a plot showing the effect of
>> shrinkage in my data set:
>> 
>>    http://users.ox.ac.uk/~sjoh3968/R/effect_of_shrinkage.png
>
> [...] But this graph looks a bit off — I don’t like the number of
> lines that cross 0.0 on the y axis.  Are you sure you’re not doing
> something subtle that makes the comparison not apples-to-apples, like
> taking means before log-transforming in computing the empirical means?

Hi Roger,

I took a shortcut when calculating the intercepts and slopes for the
random effects model.  By doing that I was indeed missing a part of
the action.  Here's a new plot that I generated using lme4's predict
function (code at the bottom of this mail):

    http://users.ox.ac.uk/~sjoh3968/R/effect_of_shrinkage2.png

This looks much more consistent with the characterization of the BLUPs
as weighted averages.  (The data points are not vertically centered on
zero because the predict function also adds in the main effect of
condition, 0.06224.)
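That offset can also be checked directly from the conditional modes.
A minimal sketch, assuming the fitted model `tt` from the code at the
bottom of this mail (the column name `cond` depends on how the
condition factor is coded):

```r
# Conditional modes (BLUPs) for items; lme4's ranef() returns them
# centered on zero, i.e. without the fixed effects added in:
blups <- ranef(tt)$item          # columns: (Intercept), cond
# Adding the fixed effect of condition reproduces the vertical
# offset of the filled points in the plot:
blups$cond + fixef(tt)["cond"]
```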

To remind you of the original question: I wanted to know which items are
read significantly faster or slower in the manipulated condition.  Based
on the BLUPs, these are items 25 and 36.  (According to the model that
uses item as a fixed effect these are 11, 23, 35, 36.)  Florian
explained why I should trust the BLUP estimates.  While I understand the
logic, I hesitate to believe the BLUP results: the reason why 25
comes out as significant is that lmer believes in a very high
correlation of random intercepts and slopes (-0.62 according to the
summary, -0.86 in the BLUP estimates, but only -0.4 in the data).  For
this reason items 25 and 36 (which have rather extreme intercepts) are
shrunk much less than items 11 and 23 (which have average
intercepts).  So whether the conclusions from the BLUPs hold up depends
on whether the correlation between intercepts and slopes is really as
high as lmer thinks it is.  Without understanding how exactly lmer
derives these correlations I have some doubts, in part because from
experience I know that lmer tends to overestimate these correlations
particularly with sparse data.  Bottom line: It seems I can trust
neither the BLUP estimates nor the fixed-effect model.
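The three correlations I quoted can be compared side by side.  A
sketch, again assuming the model `tt` and the `items` data frame from
the code below:

```r
# Correlation as estimated by the model (the -0.62 from the summary):
VarCorr(tt)
# Correlation among the BLUPs themselves (the -0.86):
re <- ranef(tt)$item
cor(re[, 1], re[, 2])
# Correlation in the raw per-item means and differences (the -0.4):
with(items, cor(sample.mean, sample.difference, use="complete.obs"))
```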

  Titus

# Compiling item-level data:

# Descriptive stats:
x <- with(d, tapply(log(tt), list(item, cond), mean, na.rm=T))
items <- data.frame(item=1:40,
                    sample.mean=with(d, tapply(log(tt), item, mean, na.rm=T)),
                    sample.difference=x[,2] - x[,1])

# Results of the random-effects model:
library(lme4)
tt <- lmer(log(tt) ~ cond * Region.length + (1|subj) + (cond|item), d)
ptt <- predict(tt)
# predict() drops rows with missing values, so match the predictions
# back to the full data frame via the row names:
d$ptt[as.integer(names(ptt))] <- ptt
items$random.intercept <- with(d, tapply(ptt, item, mean, na.rm=T))
x                      <- with(d, tapply(ptt, list(item, cond), mean, na.rm=T))
items$random.slope    <- x[,2] - x[,1]

with(items,
     plot(sample.mean, sample.difference,
          main="Effect of shrinkage on random intercepts and slopes for items",
          xlab="mean reading time",
          ylab="difference between conditions"))

for (i in seq_len(nrow(items))) {
  with(items, arrows(sample.mean[i], sample.difference[i],
                     random.intercept[i], random.slope[i],
                     length=0.1, col="darkgrey"))
}
with(items, points(random.intercept, random.slope, pch=20))
with(items, text(sample.mean+0.05, sample.difference, labels=items$item))
grid()
# "zero"-line at the level of the fixed effect of condition:
abline(fixef(tt)[2], 0)


More information about the ling-r-lang-L mailing list