[R-lang] Re: confidence intervals for restricted-cubic spline

Mon Sep 20 00:48:07 PDT 2010

On Dec 25, 2009, at 6:42 PM, T. Florian Jaeger wrote:

> Hi,
> 
> does anybody know how to calculate the confidence intervals for restricted cubic splines, rcs(), based on the SEs of the coefficient estimates from an lmer (family="binomial") fit? We've been looking around to see whether our way of doing it makes any sense, but I couldn't find anything online.

Here is a much-belated response to this query -- I recently had cause to figure out how to do this, so I thought I might report it to the list.

The short story is that there is a direct analogy with the standard situation of computing a confidence interval for a single parameter -- call it b -- when a covariate x is assumed to have a linear effect on the linear predictor in a generalized linear model.  In this standard situation, the estimate of b is asymptotically normally distributed, and the standard deviation of this estimate is what we call the standard deviation of the predictor.  So we estimate the standard error, which we can call SE, and then use it to compute a confidence interval -- e.g., the width of a standard 95% confidence interval is

  ( qnorm(0.975) - qnorm(0.025) ) * SE

For a given value of x, say x=x_0, the size of the confidence interval representing x_0's contribution to the linear predictor would be 

  ( qnorm(0.975) - qnorm(0.025) ) * SE * x

In the case of splines, let's denote the spline-basis representation of x_0 as X, and the set of associated parameters as B.  The estimate of B is asymptotically multivariate normally distributed; let's call our estimate of the associated variance-covariance matrix Sigma.  The estimate of x_0's contribution is BX; because this is a linear combination of normally distributed random variables, its variance is

  t(X) %*% Sigma %*% X

We can take the square root of this quantity and treat it as the "standard error" of the estimated contribution of x at x_0.  (Note that in the unidimensional case, we could analogously write SE * x as sqrt(x * SE^2 * x).)  Applying this procedure for many different values of x allows us to construct a confidence interval for x's contribution to the linear predictor.

In practice, because of the way splines work, it's helpful to include the intercept in B, and to include an associated dummy-variable row of 1's in X.  Demonstration code appended.

Roger

--

Roger Levy                      Email: rlevy@ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy

*********

### preliminaries
library(lme4)
library(Design)
invlogit <- function(x) exp(x)/(1+exp(x))
set.seed(1)

### We'll use M groups of N observations each, x1 and x2 uniformly distributed in all cases
N <- 200
M <- 25

### let the underlying contribution x2 to the linear predictor be quadratic
f <- function(x) 0.25*(x-0.5)^2

### set up data
x1 <- rnorm(N*M,1,0.5)
x2 <- runif(N*M)
group.effects <- rnorm(M,0,sd=2)
group <- factor(paste("G",rep(1:M,1,each=N)))
intercept <- -0.5
y <- rbinom(N*M,1,invlogit(intercept + f(x1) + x2 + group.effects[group]))
df <- data.frame(x1=x1,x2=x2,y=y)

### fit the model
x1.rcs <- rcs(x1)
m <- lmer(y ~ x1.rcs + x2 + (1|group),df,family="binomial")

### transform the x1 axis into the spline basis representation
x1.axis <- seq(min(x1),max(x1),by=0.005)
tmp <- rcs(x1.axis,parms=attr(x1.rcs,"parms"))
x1.spline.basis <- matrix(tmp,dim(tmp)[1],dim(tmp)[2]) ## this is to make sure that everything is of the right class -- essential to get the matrix multiplication below to work

### extract the contribution of x1 from the model, and plot it
include.intercept <- TRUE ## included for pedagogical purposes; you can see how the graph turns out to be horribly non-intuitive if you set this to FALSE
k <- ifelse(include.intercept,1,2)
if(include.intercept)
  x1.spline.basis <- cbind(1,x1.spline.basis) ## add dummy for intercept
spline.params <- fixef(m)[k:5]
spline.Sigma <- vcov(m)[k:5,k:5]
mle.contribution <- x1.spline.basis %*% spline.params
ci.size <- qnorm(0.975)*sapply(1:length(mle.contribution),function(i) sqrt(t(x1.spline.basis[i,]) %*% as.matrix(spline.Sigma) %*% x1.spline.basis[i,]))
plot(mle.contribution ~ x1.axis,type="l",ylim=c(-2,2))
lines(mle.contribution + ci.size ~ x1.axis, lty=2) 
lines(mle.contribution - ci.size ~ x1.axis, lty=2) 

## now plot the true effect
if(include.intercept) {
  lines(f(x1.axis) + intercept ~ x1.axis, lty=3)
} else
  lines(f(x1.axis) ~ x1.axis, lty=3)
legend(0,2,c("MLE","95% CI","True effect"),lty=1:3)