[R-lang] Re: Investigating random slope variance

Fri Apr 4 08:48:32 PDT 2014

One small point to add about the shrinkage: you should see more shrinkage
for items with less data.  This is also a desired result (usually), since
items/subjects with relatively little data tend to have more extreme
results than they should. For example, think of a case where an item has
only two observations and both are "correct" when measuring accuracy. That
item would end up with an infinite logit accuracy, so a LOT of shrinkage is
a good thing!  Not sure if different amounts of data are driving where
you're seeing the most shrinkage in your data, but it's something to look
for.
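
To make the "less data, more shrinkage" point concrete, here is a minimal sketch in Python (not lmer output). It assumes the simplest textbook case: a linear random-intercept model whose variances are treated as known, where the predicted item effect is the item's raw deviation multiplied by tau2 / (tau2 + sigma2 / n). The names shrinkage_weight and raw_logit and the variance values are made up for illustration. The accuracy example above is a logistic model, where no such closed form applies, so take this only as intuition.

```python
import math

def shrinkage_weight(n_obs, tau2, sigma2):
    """Weight given to an item's own mean in a random-intercept model.

    tau2   : between-item variance (assumed known here)
    sigma2 : residual variance (assumed known here)
    The remaining weight, 1 - w, pulls the item toward the grand mean.
    """
    return tau2 / (tau2 + sigma2 / n_obs)

def raw_logit(n_correct, n_total):
    """Unshrunken logit of an observed proportion; infinite at 0 or 1."""
    p = n_correct / n_total
    if p == 0.0:
        return -math.inf
    if p == 1.0:
        return math.inf
    return math.log(p / (1.0 - p))

# Hypothetical variance components, chosen only for illustration.
tau2, sigma2 = 1.0, 4.0

weights = {n: shrinkage_weight(n, tau2, sigma2) for n in (2, 8, 32)}
for n, w in weights.items():
    print(f"n = {n:2d}: weight on the item's own mean = {w:.2f}")

# Two observations, both correct: the raw logit is infinite.
print("raw logit for 2/2 correct:", raw_logit(2, 2))
```

With these made-up values, the item with 2 observations keeps only about a third of its raw deviation, while the item with 32 observations keeps most of it.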

The more general point is that lmer is estimating just the variance of the
random effect (i.e., how much item-level variance there seems to be), and
the BLUPs represent an attempt to characterize how the items fall within
this estimated variance -- they are not individually "estimated" from the
data in the same sense as the other parameters in the model.  If the "true"
effects of the individual items are distributed not at all normally, then
you could also see a kind of misfit between the empirical estimates/means
and the BLUPs (which are predicted under the assumption of a normal
distribution and so tend to be pulled toward one, AFAIK).
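
As a rough illustration of how BLUPs "fall within" the estimated variance when there is both a random intercept and a random slope, here is a small Python sketch (again, not lmer itself). It assumes the random-effects covariance G and the sampling covariance S of an item's raw (intercept, slope) deviations are known, in which case the conditional mean is G (G + S)^-1 b. All numbers and function names are hypothetical. The point it shows: with correlated random effects the two components are shrunk jointly, so a single component can end up further from zero than its raw value.

```python
def mat_add(a, b):
    """Element-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_inv(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]

def joint_shrink(G, S, b):
    """Conditional mean of (intercept, slope) effects: G (G + S)^-1 b."""
    return mat_vec(mat_mul(G, mat_inv(mat_add(G, S))), b)

# Hypothetical values: strongly negative intercept/slope correlation.
G = [[1.0, -0.8], [-0.8, 1.0]]   # random-effects covariance (assumed known)
S = [[0.5, 0.0], [0.0, 0.5]]     # sampling covariance of the raw deviations
raw = [2.0, 0.1]                 # one item's raw intercept and slope deviations

blup = joint_shrink(G, S, raw)
print("raw deviations :", raw)
print("shrunken values:", [round(x, 3) for x in blup])
```

Here the intercept deviation is pulled toward zero, but the slope deviation moves away from zero and changes sign, because a large positive intercept makes a negative slope more plausible under the assumed negative correlation. That is one way item orderings can change even with equal amounts of data per item.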

In any case, to the extent that you have a lot of data on each item,
similar amounts of data on each item, and to the extent that their effects
*are* distributed fairly normally, the BLUPs and empirical estimates will
be relatively close to each other.  If this is not the situation, things
can differ quite a lot between the empirical estimates and the BLUPs, but
this is typically a good thing, in terms of how we usually want to
interpret item effects.  If, when thinking about the design of your
items/regions, you have reason to believe that they should *not* show a
normal distribution of effects (maybe you believe there are two distinct
"types" of items/effect, or something), then you may need to resort to a
more flexible Bayesian estimation procedure, where you can specify the
expected (aka prior) distribution of the random effects to be something
other than normal.

I have nothing insightful to say about the intercept/slope correlations :-)

best,
-scott

On Fri, Apr 4, 2014 at 11:20 AM, Titus von der Malsburg
<malsburg@posteo.de> wrote:

>
> On 2014-04-04 Fri 05:10, T. Florian Jaeger <tiflo@csli.stanford.edu>
> wrote:
> > I would be careful making anything out of this. The BLUP estimates of the
> > random effects (and, I assume, their distribution) are affected by
> > shrinkage, which is often a desirable (conservative) feature, although it
> > will make differences appear smaller. So, it's not surprising that the
> > fixed effect model mirrors the empirical means more closely. That doesn't
> > mean though that it's the better model to draw conclusion from (about
> > those differences).
>
> Florian, your comment is spot on.  Here is a plot showing the effect of
> shrinkage in my data set:
>
>     http://users.ox.ac.uk/~sjoh3968/R/effect_of_shrinkage.png
>
> Unfilled circles show the empirical mean reading times and differences
> between conditions, one circle for each item.  The dots show the BLUP
> estimates for each item.
>
> The difference is fairly dramatic.  I assumed that shrinkage would pull
> all data points to the mean with the same force (I have the same amount
> of data for all items).  If that were the case, the ordering of items
> would be preserved.  However, shrinkage affects the individual items in
> quite different ways, and some items are even pushed away from the
> overall means (1, 5, 7, 8, 9, 10, 13, 14, 35) effectively expanding a
> subset of the estimates instead of shrinking them.
>
> I must say that I find it hard to swallow that two seemingly valid ways
> to analyze the data (item as random effect or fixed effect) yield
> results that are so different.
>
> Another observation: in the BLUP estimates, the correlation of
> intercepts and slopes seems to be much higher than in the raw data.  The
> correlation of the estimated random intercepts and slopes is -0.86. (The
> summary of the model reports -0.62.)  The correlation of the empirical
> item means and differences is only -0.4.  Why does lmer believe in such
> a high correlation?
>
>   Titus
>