[R-lang] Plotting CI from Mixed Effects Model
Roger Levy
rlevy at ucsd.edu
Sat Sep 29 16:47:15 PDT 2007
Hi Corey, Austin,
Corey McMillan wrote:
> Hi-
>
> I am reporting results from a mixed effects model in which Place (-,+) and
> Voice (-,+) are fixed effect factors.
>
> [...]
>
> My understanding is that the value for each cell in the 2 X 2 design can be
> calculated in the following way:
>
> place-,voice- = 1.7516
> place-,voice+ = 1.7516 + 0.9149 = 2.67
> place+, voice- = 1.7516 + 1.4482 = 3.20
> place+,voice+ = 1.7516 + 0.9149 + 1.4482 - 1.0889 = 3.03
This part is right.
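(As a quick check, the cell-mean arithmetic can be verified in a couple of lines of R; the coefficient values are copied from the sums above, and the names b.voice etc. are just labels I've made up:

```r
## cell means as sums of fixed-effect coefficients (names invented)
b0 <- 1.7516; b.voice <- 0.9149; b.place <- 1.4482; b.int <- -1.0889
cells <- c("place-,voice-" = b0,
           "place-,voice+" = b0 + b.voice,
           "place+,voice-" = b0 + b.place,
           "place+,voice+" = b0 + b.voice + b.place + b.int)
round(cells, 2)
## place-,voice- place-,voice+ place+,voice- place+,voice+
##          1.75          2.67          3.20          3.03
```

which reproduces your four values.)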
> Now if I want to report the 95% CIs around those values can I use the same
> procedure? So the place-,voice+ lower CI would be equivalent to 0.9469 +
> 1.7516 and the higher CI equivalent to 1.943 + 1.7516, and so on...
This part isn't right. The confidence interval for place-,voice+ is a
confidence interval for the *difference* from the place-,voice- cell mean.
As a result your approach has two problems. First, it doesn't take
uncertainty about the intercept into account. Second, it doesn't
account for whatever correlation exists among your parameter estimates.
To show clearly why you can't just add the high and low bounds of the
confidence interval on your treatment parameter to the point estimate of
the intercept, I'll give a simple example involving a single two-level
factor. (As a side note, you were combining the point-estimate
intercept with Bayesian-derived confidence intervals; the two are
estimated in different ways and you shouldn't really mix the two. But
that's not crucial right now.)
set.seed(0)
b0 <- 10                 # true intercept (mean of level "a")
b1 <- 3                  # true treatment effect of level "b"
s <- 5                   # residual standard deviation
n.a <- 20                # unequal group sizes, on purpose
n.b <- 50
a.observations <- rnorm(n.a, b0, s)
b.observations <- rnorm(n.b, b0 + b1, s)
labs <- c(rep("a", n.a), rep("b", n.b))
x.a <- factor(labs)                          # reference level "a"
x.b <- factor(labs, levels = c("b", "a"))    # reference level "b"
y <- c(a.observations, b.observations)
lm.a <- lm(y ~ x.a)
lm.b <- lm(y ~ x.b)
Now, lm.a and lm.b are totally equivalent models -- the only difference
is that the reference level is "a" for lm.a and "b" for lm.b. As a
result, the intercept estimate in lm.b should equal the sum of the
intercept estimate (which I'll call b_0) and the treatment-effect
estimate (which I'll call b_1) in lm.a, and vice versa. Let's look at
lm.a [I truncated the output a bit]:
> summary(lm.a)

Call:
lm(formula = y ~ x.a)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    9.991      1.040   9.603 2.78e-14 ***
x.ab           2.798      1.231   2.273   0.0262 *
So the intercept in lm.b should be 9.991 + 2.798 = 12.789. By your
method, the standard error of b_0 in lm.b would be 1.231. Now let's
take a look:
> summary(lm.b)

Call:
lm(formula = y ~ x.b)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   12.789      0.658  19.436   <2e-16 ***
x.ba          -2.798      1.231  -2.273   0.0262 *
We were right about the intercept, but its standard error in lm.b is
actually *lower* than the standard error of either parameter estimate in
lm.a. Upon reflection, this makes sense: we have more data points for
level "b" than for level "a" and thus should be able to estimate the
effect more precisely for "b" than for "a". Furthermore, the treatment
effect being estimated is really the same (just a mirror image) in lm.a
and lm.b, so the standard error of this estimate should be the same in
the two models, and it is.
How can we relate the standard error of the intercepts in the two
models, and how does this relate to your original problem? When you
sample a population and do a linear regression on the sample, the
estimate of your parameter set -- call this b -- is (approximately)
normally distributed around the true parameters b*. This means that if
you wanted to construct a simultaneous confidence "interval" on multiple
parameters of the model, it would actually be a confidence *region* in
the form of an ellipsoid. The symmetric confidence intervals we look at
around the point estimate of a single parameter of the model have simply
marginalized out all the other model parameters.
Now from the perspective of lm.a, if you want to estimate a confidence
interval for the mean of the "b" treatment, then you are actually
interested in the quantity b_0 + b_1. You have constructed the point
estimate for this quantity correctly, but to construct a confidence
interval you would want to marginalize over the rest of the (b_0,b_1)
parameter space (this would be equivalent to marginalizing over values
of b_0 - b_1, which is perpendicular to b_0 + b_1). It's possible to
hack your way into lm.a and apply some math to do this, but the good
news is that the result would be precisely equal to the s.e. for b_0 in
lm.b!
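In fact, that bit of hacking is essentially a one-liner: since
Var(b_0 + b_1) = Var(b_0) + Var(b_1) + 2 Cov(b_0, b_1), the standard error
of the sum is just the square root of the sum of all the entries of lm.a's
estimated coefficient covariance matrix. A sketch, reusing lm.a and lm.b
from the example above:

```r
## s.e. of b_0 + b_1, read directly off lm.a's coefficient covariance
## matrix: Var(b_0 + b_1) = sum of all entries of vcov(lm.a)
se.sum <- sqrt(sum(vcov(lm.a)))
se.sum                                     # ~0.658, the intercept s.e. in lm.b
all.equal(se.sum, sqrt(vcov(lm.b)[1, 1]))  # TRUE: exactly equal
```

The equality is exact, not approximate, because the two models fit the same
residual variance to the same data.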
So with that explanation, the recommendation I'd give you for what you
want to do is: rerun lmer four different ways, each time resetting the
contrasts for place and voice such that you cover the four logical
possibilities, and report the CI for each cell on the basis of the CIs
(either classical or HPD) obtained when that cell is the default.
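For concreteness, here is what that releveling recipe looks like on some
invented 2x2 data, using plain lm() rather than lmer() (the relevel()
calls carry over unchanged to an lmer() formula; the data and effect
sizes below are made up purely for illustration):

```r
set.seed(1)
## invented 2x2 data, 30 observations per cell
d <- expand.grid(place = c("-", "+"), voice = c("-", "+"), rep = 1:30)
d$y <- rnorm(nrow(d), mean = 2 + (d$place == "+") + 0.9 * (d$voice == "+"))

for (p in c("-", "+")) for (v in c("-", "+")) {
  fit <- lm(y ~ relevel(place, ref = p) * relevel(voice, ref = v), data = d)
  ## with cell (place=p, voice=v) as the default, the intercept row *is*
  ## that cell's mean, and its CI is the one to report for that cell
  cat("place", p, "voice", v, ":",
      round(confint(fit)["(Intercept)", ], 2), "\n")
}
```

The intercept's confidence interval in each refit is the per-cell CI you
were after.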
Sorry this didn't come earlier, and I hope it is still useful for what
you're doing at the moment.
Best
Roger
--
Roger Levy Email: rlevy at ucsd.edu
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://ling.ucsd.edu/~rlevy