[R-lang] contrast coding and lrm

J. Vogels J.Vogels@uvt.nl
Tue Aug 30 05:57:21 PDT 2011


Dear R users,

We created a logistic regression model to investigate the influence of different factors on Dutch word order variation. To reduce collinearity and make the effects easier to interpret, we decided to center the predictors. For one two-level predictor and one ordinal predictor (5-point scale), we did this by subtracting the mean; for a third predictor with three levels, we used contrast coding. The levels of this last predictor (called PP_TYPE_3) were coded as follows:

     [,1] [,2]
abs     1    0
loc     0    1
temp   -1   -1
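
For reference, a minimal sketch of how such a coding can be set up and inspected in base R. The data frame `d` and its values are hypothetical; only the names PP_TYPE_3 and cDEF3S come from the model above, and DEF3S is assumed here to be the uncentered version of cDEF3S. The sketch uses `model.matrix()` rather than lrm, so it only shows which columns a base-R formula would build; whether a particular fitting function honors the contrasts attribute has to be checked separately.

```r
# Hypothetical toy data: values are made up, names follow the post.
d <- data.frame(
  PP_TYPE_3 = factor(c("abs", "loc", "temp", "loc", "abs", "temp")),
  DEF3S     = c(1, 3, 5, 2, 4, 3)
)

# Sum-to-zero contrasts; rows follow the factor levels abs, loc, temp.
contrasts(d$PP_TYPE_3) <- contr.sum(3)
contrasts(d$PP_TYPE_3)   # shows the 3 x 2 coding matrix attached to the factor

# Center a numeric predictor by subtracting its mean.
d$cDEF3S <- d$DEF3S - mean(d$DEF3S)

# Inspect the design matrix a base-R formula would build from these columns.
X <- model.matrix(~ cDEF3S + PP_TYPE_3, data = d)
colnames(X)
```

If the contrasts are picked up, the factor contributes two columns named after the contrast columns (PP_TYPE_31 and PP_TYPE_32 here) rather than after the levels.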

The model summary gives us the following:

                  Coef     S.E.    Wald Z P     
Intercept         -0.26760 0.10303 -2.60  0.0094
cDEF3S            -0.27503 0.06216 -4.42  0.0000
cANIM2_S          -0.17869 0.18324 -0.98  0.3295
PP_TYPE_3=loc      0.60699 0.13471  4.51  0.0000
PP_TYPE_3=temp     0.08269 0.12773  0.65  0.5174
cDEF3S * cANIM2_S -0.38080 0.12298 -3.10  0.0020

The names suggest that the model gives the estimates for the levels 'loc' and 'temp', taking 'abs' as reference level. But this is not what we want, is it? Shouldn't the model have taken the columns of the contrast matrix as the recoded factors?

We are also a bit confused about the correct interpretation of the estimates of the three-level predictor. Do these represent the difference between one level and the intercept (=the grand mean), or between one level and the mean of the other levels of that predictor?
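
To make the two candidate readings concrete, here is a small sketch under simplifying assumptions: a model containing only the three-level factor, with made-up log-odds for each level (the vector `m` is hypothetical, not taken from our data). It solves for the coefficients that each coding would assign to the same three values.

```r
# Hypothetical log-odds per level (made-up numbers, not from the model above).
m <- c(abs = -0.5, loc = 0.4, temp = -0.2)

# Sum-to-zero coding: intercept column plus the coding matrix shown earlier.
X_sum <- cbind(1, contr.sum(3))
b_sum <- unname(solve(X_sum, m))
# b_sum[1] is the unweighted mean of the three level values;
# b_sum[2] and b_sum[3] are the deviations of abs and loc from that mean;
# the deviation for temp is implied as -(b_sum[2] + b_sum[3]).

# Treatment (dummy) coding with abs as the reference level.
X_trt <- cbind(1, contr.treatment(3))
b_trt <- unname(solve(X_trt, m))
# b_trt[1] is the value for abs;
# b_trt[2] and b_trt[3] are the differences loc - abs and temp - abs.

b_sum
b_trt
```

In this simplified setting, the sum-coded intercept is the unweighted mean of the level values, which differs from the observation-weighted grand mean when the levels are unbalanced. With further predictors and interactions in the model, the same quantities hold only conditionally, with the other predictors at zero.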

As a side question: is it correct to center the ordinal predictor the way we did?


Thanks for your help!

Best,
Jorrig Vogels
Geertje van Bergen


