[R-lang] Re: lmer multiple comparisons for interaction between continuous and categorical predictor

Scott Jackson scottuba@gmail.com
Wed Nov 21 13:21:02 PST 2012


Hi Dan,

I feel a little responsible for unleashing multcomp on the group, so
I'll try to field this one.

First, let me make sure I understand what comparisons you're
interested in.  So for Diet1, there's an effect of Time, and you want
to know whether the effect of Time is greater for Diet2, Diet3, or
Diet4 (compared to Diet1). And then furthermore, whether the effect of
Time for Diet2 is greater than for Diet3, and all the pairwise
comparisons between the effects of Time for different Diets.  So at
the end of the day, you can say something like "Diet 3 showed the
biggest effect of growth over time, significantly different from all
the other Diets.  Diet 2 and 4 were both better than Diet 1, but Diet
4 was not significantly more effective than Diet 2."  Right?

Second, you probably know this already, but just to be clear: if you
keep the default "treatment" contrast coding of Diet when you fit your
model m, you already get all the pairwise comparisons between Diet1
and the others.

Here's the fixed effect table I get when I run your code:

Fixed effects:
            Estimate Std. Error t value
(Intercept)  31.5081     5.9108   5.331
Time          6.7130     0.2573  26.089
Diet2        -2.8745    10.1910  -0.282
Diet3       -13.2577    10.1910  -1.301
Diet4        -0.3983    10.1998  -0.039
Time:Diet2    1.8961     0.4267   4.443
Time:Diet3    4.7099     0.4267  11.037
Time:Diet4    2.9495     0.4323   6.824

So this linear model basically says that for Diet1, every unit increase in
Time produces a change of 6.713 in weight.  The Time:Diet2 effect says that
this slope is 1.896 greater for Diet2, meaning that in Diet2, every unit of
Time results in a change of (6.713 + 1.896 =) 8.609 in weight.  The model
you've already fitted gives you a std error and a t-value for that
interaction, and a t of 4.44 is quite a bit larger than needed to reach
p < .05.
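
(If you'd rather have R do that arithmetic, here's a minimal sketch, assuming
lme4 is loaded and m is the model from your code:

fe <- fixef(m)                 ## named vector of fixed-effect estimates
fe["Time"] + fe["Time:Diet2"]  ## slope of Time for Diet2, about 8.61

)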

The point is that you can do the math for getting the comparison
estimates quite easily, without need for the multcomp package.  And if
you think about it a little more, it's just as easy to get the other
comparisons, like Diet2 vs Diet4.

Here's how, if it's not immediately obvious.  The effect of Time:Diet2
is essentially how much bigger the estimate of Time is for Diet2
compared to how big it is for Diet1.  In other words: (Time | Diet2) -
(Time | Diet1).  Similarly, the effect of Time:Diet4 in your initial
model is (Time | Diet4) - (Time | Diet1).  So if you subtract those
two estimates from each other, the (Time | Diet1)'s cancel out.  In
other words:

Time:Diet2 = (Time | Diet2) - (Time | Diet1)   ## just definitional
Time:Diet4 = (Time | Diet4) - (Time | Diet1)   ## definitional
(Time | Diet4) - (Time | Diet2)
    = [(Time | Diet4) - (Time | Diet1)] - [(Time | Diet2) - (Time | Diet1)]
   ## this is just pointing out that x - y = (x - z) - (y - z)
(Time | Diet4) - (Time | Diet2) = Time:Diet4 - Time:Diet2
   ## substituting the definitions above on the right-hand side

It's one of those things that seems so obvious once you get it, but
always takes people (myself included) a long time to get initially.
The upshot is that if you want to know how much bigger the effect of
Time is for Diet4 compared to the effect of Time for Diet2, you can
get it just by subtracting the Time:Diet2 interaction estimate in your
initial model from the Time:Diet4 estimate: 2.95 - 1.90 = 1.05.
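
In code, that's just (again a quick sketch using the same fitted model m):

fixef(m)["Time:Diet4"] - fixef(m)["Time:Diet2"]   ## about 1.05
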

In order to get the standard error (and t) for each of these
comparisons, you have a couple of options.  One is to do it "manually"
by re-leveling the factor, re-fitting the model, and looking at the
output.  So if you re-level Diet so that Diet2 is the "base" level,
then you get estimates for Time:Diet1, Time:Diet3, and Time:Diet4, and
the Time:Diet4 estimate will be the 1.05 result we got by just doing
subtraction above (but with std err and t value).
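
Here's a sketch of that re-leveling route, assuming lme4 is loaded and using
the same formula you fit (relevel() is base R; cw and m.relevel are just
names I'm making up for the copy and the re-fitted model):

library(lme4)                             ## if not already loaded
cw <- ChickWeight
cw$Diet <- relevel(cw$Diet, ref = "2")    ## make Diet2 the base level
m.relevel <- lmer(weight ~ Time * Diet + (1 | Chick), data = cw, REML = FALSE)
summary(m.relevel)   ## the Time:Diet4 row is now (Time | Diet4) - (Time | Diet2),
                     ## estimate about 1.05, with its std error and t value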

But this is kind of a pain in the rear, and since otherwise the model
is no different, it would be nicer to be able to extract these things
without this re-leveling hassle.  That's one thing the multcomp
package is good for, since you can get as many comparisons as you
want, all from your one model fit.  If you try both methods, you'll
notice that the std errors and t-values are identical, whether you're
using multcomp or doing the re-leveling trick.

The second thing multcomp does is "correct" p-values. I'll be honest
and say I haven't taken the time to understand exactly how this works,
but at first blush it seems like a reasonable sort of correction
(i.e., less conservative than a Bonferroni correction).  But in the
case of continuous effects using lmer, I *really* don't know where
it's getting its p-values, since most people these days are hesitant
to get p-values from Gaussian lmer, unless they can use Baayen's MCMC
method (the 'pvals.fnc' function).  So it's probably getting pretty
anti-conservative p-values to start, and then correcting those to some
degree.  As always, dig into the code itself (and/or read their paper
where they explain the method) if this is a feature you really want to
understand better -- I haven't yet myself.

But the point is that because their method corrects p-values, and
doesn't modify std errors, you can kind of pick and choose. So if you
just want a convenient way to get a bunch of comparison estimates and
std errors, and use some other method for establishing significance,
you can.

Sorry if all this is obvious or overly pedantic, I just felt compelled
to be as clear as possible.

When you go to actually run the code, the above discussion about
subtracting interaction estimates to get what you want is relevant.
Here's code you can use to specify the contrast matrix directly,
instead of messing around with the Tukey method stuff.

contrast.matrix <- rbind(
    "Time:Diet1 vs. Time:Diet2" = c(0, 0, 0, 0, 0,  1,  0,  0),
    "Time:Diet1 vs. Time:Diet3" = c(0, 0, 0, 0, 0,  0,  1,  0),
    "Time:Diet1 vs. Time:Diet4" = c(0, 0, 0, 0, 0,  0,  0,  1),
    "Time:Diet2 vs. Time:Diet3" = c(0, 0, 0, 0, 0, -1,  1,  0),
    "Time:Diet2 vs. Time:Diet4" = c(0, 0, 0, 0, 0, -1,  0,  1),
    "Time:Diet3 vs. Time:Diet4" = c(0, 0, 0, 0, 0,  0, -1,  1))

then:

summary(glht(m, contrast.matrix))

Gets you what I think you're looking for.  Here's how the contrast
matrix works:  each row vector in the matrix represents weights for
the effects you get in the default model output, starting with the
intercept.  So if you want an effect that's already included in the
basic print(m) output, you just put a "1" in the position
corresponding to that effect.  The trickier bit is getting more
complex things, but the logic from above makes this straightforward,
too.  If you want to compare the effect of Time for Diet2 vs. the
effect of Time for Diet4, that's equivalent to just subtracting the
two interactions, so you code one of those "1" and the other "-1".  If
this feels weird, you can always just play around with it and use the
re-leveling kludge to double-check if you're unsure.

In the end, I just want to emphasize that you can use the glht
function this way as a convenience function to get a bunch of
estimates (and std errors) you could otherwise get by changing which
level of the factor is the "base" level and re-fitting the model (or
by using subtraction if you just wanted the estimate). But glht ALSO
applies a new method for correcting p-values for multiple comparisons.
If you don't want that part, you can just ignore the p-values in the
output.
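
For instance, here's a hedged sketch of pulling just the estimates and std
errors out of the glht object, reusing the contrast.matrix from above
(coef(), vcov(), and adjusted() are all part of multcomp's glht interface):

g <- glht(m, linfct = contrast.matrix)
coef(g)                               ## the six comparison estimates
sqrt(diag(vcov(g)))                   ## their standard errors
summary(g, test = adjusted("none"))   ## per-comparison tests, no correction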

Hope this helps!

-scott

On Wed, Nov 21, 2012 at 9:20 AM, Dan Mirman <daniel.mirman@gmail.com> wrote:
> Dear list,
> A few weeks ago there was a discussion on this list about how to use
> multcomp to do comparisons between different levels for an interaction. In
> that case it was an interaction between two categorical factors, but it got
> me thinking about how to do this for an interaction between a continuous and
> a categorical predictor. I've been puzzling over it for a while and haven't
> been able to figure it out, so I'm coming back to the list in the hope that
> you can help me out.
>
> As an example, I've been using the ChickWeight data set in R. I fit the
> model like this:
>
> m <- lmer(weight ~ Time * Diet + (1 | Chick), data=ChickWeight, REML=F)
>
>
> Where Time is continuous and Diet is categorical (with 4 levels). The chicks
> started out at (approximately) the same weight, but (for this example) I
> want to compare individual diets to test whether they differentially affect
> growth rate. The following works for getting the pairwise comparisons for the
> intercept effect of diet:
>
> summary(glht(m, linfct=mcp(Diet = "Tukey")))
>
>
> but I can't figure out the analogous thing for the Time:Diet interaction. I
> tried putting the interaction into mcp, but that produced an error:
>
> summary(glht(m, linfct=mcp('Time:Diet' = "Tukey")))
> Error in summary(glht(m, linfct = mcp(`Time:Diet` = "Tukey"))) :
>   error in evaluating the argument 'object' in selecting a method for
> function
>  'summary': Error in mcp2matrix(model, linfct = linfct) :
> Variable(s) ‘Time:Diet’ have been specified in ‘linfct’ but cannot be found
> in ‘model’!
>
>
> Thanks in advance,
> Dan
>
> --
> -----------------------------------------------------
> Dan Mirman
> Institute Scientist
> Moss Rehabilitation Research Institute
> Elkins Park, PA 19027
> http://www.danmirman.org
> -----------------------------------------------------
