[R-lang] MCMCglmm with relatively small multinomial data set

Donald Derrick dderrick at interchange.ubc.ca
Wed Sep 23 23:42:25 PDT 2009


Hello R-lang,

My name is Donald Derrick.

Along with my supervisor Bryan Gick, I am researching categorical  
kinematic variation in flaps and taps in English, and have identified  
four forms:

1) Alveolar taps - where the tongue comes from below the alveolar ridge, hits it and returns to a lower position
2) Down flaps - where the tongue comes from above the alveolar ridge,  
hits it and continues downward
3) Up flaps - where the tongue comes from below the alveolar ridge,  
hits it and continues upward
4) Post-alveolar taps - where the tongue comes from above the alveolar  
ridge, goes straight forward, hits a spot at or above the alveolar  
ridge and comes back.

In words/phrases with only 1 flap, the independent variables are:

1) Subject variability (of course)
2) Tongue speed (which I have measured)
3) Tongue tip position before and after the flap

The tongue tip is always low with regular vowels, but with rhotic vowels it can be either high (tongue-tip-up bunched 'r' or retroflex 'r') or low (tongue-tip-down bunched 'r').

This leaves the factors of

3a) tongue tip position immediately before the flap
3b) tongue tip position immediately after the flap


If I use more traditional balanced statistics (like a Friedman test) to compare rhotic and non-rhotic vowels, I cannot include the type of rhotic vowel (tongue tip up or down), because different subjects use different tongue positions and I would no longer have balanced data.

Also, the Friedman test is ordinal, so I have to argue for a ranking  
of flaps.
[An easy ranking is tongue height, favoring end position, where I get
1) (low-low) alveolar tap, 2) (high-low) down flap, 3) (low-high) up-flap,
and 4) (high-high) post-alveolar tap.  It's a bit bogus, but then so is
every other ordinal ranking scheme I've ever seen!]
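
For concreteness, the test I would otherwise run is roughly the sketch below, where flapRank is the 1-4 coding above, vowelType is the rhotic/non-rhotic factor, and flapAgg is a hypothetical data frame aggregated to one value per subject-by-vowel-type cell (friedman.test requires a complete, unreplicated block design, which is exactly where the balance problem bites):

# flapRank, vowelType, and flapAgg are made-up stand-in names
friedman.test(flapRank ~ vowelType | subject, data = flapAgg)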


For both of these reasons, I think I should be using a mixed-effects model instead.


Unfortunately, it seems I cannot use the glmer() command here, because my dependent variable is multinomial (a sketch of the limitation follows).
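
To illustrate, glmer() will happily fit a binary collapse of the response, but lme4 has no multinomial family, so the four-way distinction is lost.  (The column names below match my data frame flap from the code further down; the one-level-against-the-rest collapse is just for illustration.)

library(lme4)
# Collapse the four-level response to a single binary contrast:
flap$isH <- flap$firstflaptext == "H"
binary.fit <- glmer(isH ~ context + (1 | subject),
                    data = flap, family = binomial)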

So, I am trying to use MCMCglmm, but I don't know if I am using it right, and I also cannot get convergence.  I need help.

My first question is, should I even bother?

I have small datasets by the standards of many of you, and since they took an entire year to build and I need to graduate someday, I will never get bigger ones.  When I say small, I mean 18 subjects, and typically 12-24 phrases for any given category of analysis: as few as 216 phrases in each category, and as many as 432.

On the other hand, these are typical of my datasets in that the fixed  
predictors are hugely predictive.
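
In case anyone wants to run the code below without my data, here is a purely random stand-in with the same shape (18 subjects x 24 phrases; the column names match what the model code expects, but the values are meaningless):

set.seed(1)
flap <- data.frame(
    subject       = factor(rep(paste0("S", 1:18), each = 24)),
    context       = factor(sample(c("VV", "VR", "RV", "RR"), 432, replace = TRUE)),
    firstflaptext = factor(sample(c("U", "D", "H", "L"), 432, replace = TRUE)))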


So, my next question is: can I even use MCMCglmm here and hope to get convergence?

Now, if all is not hopeless, I will go on:

To make it easy to help, I've provided the code I am using (based on the HLP/Jaeger lab blog) and ask detailed questions throughout:

///////////

library(MCMCglmm)
library(coda)

prior <- NULL
flap.glmm <- vector("list", 3)
k <- length(levels(flap$firstflaptext))
I <- diag(k - 1)
J <- matrix(1, k - 1, k - 1)

# k = the number of levels for type of flap.  This is 4, as described
# above.  I coded them "U", "D", "H", and "L".

prior <- list(
    R = list(fix = 1, V = 0.5 * (I + J), n = 3),
    G = list(
        G1 = list(V = diag(3), n = 3),
        G2 = list(V = diag(4), n = 4)))

# R = the standard setup for a "categorical" family model, based on the
# HLP/Jaeger lab blog help

# fix=1 I believe is standard for a model of this kind

# n=3 I believe is the lowest we should use.  Higher numbers seem to
# work better, but still not well enough.
# I do not know why :(

# G1 is the by-subject effect for the flap types (k-1 = 3 latent traits)
# G2 is the by-subject effect for the vowel context (4 levels; more later)

# I run the same model 3 times so I can test convergence across chains -
# I may want to try different priors, but I barely understand how to
# build them and cannot even get convergence this way yet.

for (i in 1:3) {
    flap.glmm[[i]] <- MCMCglmm(firstflaptext ~ -1 + trait + context,
        random = ~ us(trait):subject + us(context):subject,
        rcov = ~ us(trait):units,
        prior = prior,
        family = "categorical",
        burnin = 5000,
        nitt = 50000,
        data = flap, verbose = TRUE)
}

# firstflaptext ~ -1 + trait + context
# I would love to be able to use something like
#     precedingContext * followingContext
# but every time I tried I got a "bad prior" error and absolutely no
# hint as to how to fix it.
# So I just combined the preceding and following context into a single
# variable - this gave me VV (for "autumn"), VR (for "otter"),
# RV (for "Berta"), and RR (for "murder").
# This solution does not seem ideal, so any help would be appreciated
# (a sketch of the combining step follows).
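
# For reference, the combining step itself - precedingContext and
# followingContext are hypothetical names for my two original factors:
# flap$context <- factor(paste0(flap$precedingContext,
#                               flap$followingContext))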

# random = ~ us(trait):subject + us(context):subject
# This seemed standard based on the HLP/Jaeger lab blog help

# nitt = the minimum suggested by the Raftery diagnostic below

flap.coda <- mcmc.list(
chain1=mcmc(flap.glmm[[1]]$Sol),
chain2=mcmc(flap.glmm[[2]]$Sol),
chain3=mcmc(flap.glmm[[3]]$Sol))

# I built an mcmc.list from the fixed-effect samples ($Sol)
# Q: How do I do this for the random effects, or does it matter?
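
# My guess (please correct me if this is wrong): the random-effect
# (co)variance samples live in $VCV, so the parallel list would be
flap.vcv.coda <- mcmc.list(
    chain1 = mcmc(flap.glmm[[1]]$VCV),
    chain2 = mcmc(flap.glmm[[2]]$VCV),
    chain3 = mcmc(flap.glmm[[3]]$VCV))
# and gelman.diag()/raftery.diag() accept it just like flap.coda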

flap.gelman <- gelman.diag(flap.coda, transform = TRUE)
flap.raftery <- raftery.diag(flap.coda, q = 0.025, r = 0.005, s = 0.95,
    converge.eps = 0.001)
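
# Visual checks to go with the numeric diagnostics (both from coda):
plot(flap.coda)           # trace and density plots per parameter
autocorr.diag(flap.coda)  # high autocorrelation means slow mixing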

# Two convergence diagnostics (Gelman-Rubin and Raftery-Lewis).
# Below are the results

 > flap.gelman

Potential scale reduction factors:

                      Point est. 97.5% quantile
traitfirstflaptext.H       1.05           1.13
traitfirstflaptext.L       1.03           1.09
traitfirstflaptext.U       1.03           1.07
contextRV                  1.27           1.79
contextVR                  2.41           4.45
contextVV                  2.29           4.14

Multivariate psrf

3.38

# Note, these numbers do not get better with higher burnin or nitt :(

 > flap.raftery

$chain1

Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95

                       Burn-in  Total Lower bound  Dependence
                       (M)      (N)   (Nmin)       factor (I)
  traitfirstflaptext.H 3        4212  3746         1.12
  traitfirstflaptext.L 3        4368  3746         1.17
  traitfirstflaptext.U 4        4615  3746         1.23
  contextRV            25       22940 3746         6.12
  contextVR            4        4968  3746         1.33
  contextVV            8        10200 3746         2.72


$chain2

Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95

                       Burn-in  Total Lower bound  Dependence
                       (M)      (N)   (Nmin)       factor (I)
  traitfirstflaptext.H 3        4449  3746         1.19
  traitfirstflaptext.L 2        3776  3746         1.01
  traitfirstflaptext.U 2        3987  3746         1.06
  contextRV            25       31880 3746         8.51
  contextVR            15       13680 3746         3.65
  contextVV            6        8844  3746         2.36


$chain3

Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95

                       Burn-in  Total Lower bound  Dependence
                       (M)      (N)   (Nmin)       factor (I)
  traitfirstflaptext.H 3        4061  3746         1.08
  traitfirstflaptext.L 3        4368  3746         1.17
  traitfirstflaptext.U 3        4289  3746         1.14
  contextRV            15       16749 3746         4.47
  contextVR            18       18096 3746         4.83
  contextVV            15       15918 3746         4.25


## In this run, the third chain suggested convergence.  I'm not sure
## it is real though :(
## Especially since this has never happened before, and higher burnin or
## nitt tend not to help.

//////

Any and all help is appreciated - and apologies for such a hideously  
long post!


