[R-lang] Re: new user's question about miltinom

Levy, Roger rlevy@ucsd.edu
Wed Feb 2 17:41:16 PST 2011


Hi Donald,

On Feb 1, 2011, at 5:49 PM, Donald Derrick wrote:

> Hello all.  I have a truly embarrassing question to ask:
> 
> I want to find a test in R that lets me look at the effect of an independent variable on a dependent result that can be either a 1, 2, or 3.  But alas I'm amazed how difficult it is to look at such multinomial data.
> 
> I have been trying to use the R function multinom() for some time, and I realize that I simply don't understand how it works.  
> 
> I know this because with a simple test table like this:
> 
> dependent	independent
> 1	2
> 1	2
> 1	2
> 1	2
> 1	2
> 1	2
> 2	2
> 2	2
> 2	2
> 3	2
> 
> I would expect this formula:
> 
> test.multinom = multinom(test$dependent ~ test$independent, data=test)
> 
> to utterly fail because all the independent data is the same.
> 
> But instead, I get this result:
> 
> /////
> 
> # weights:  9 (4 variable)
> initial  value 10.986123 
> final  value 8.979457 
> converged
> Call:
> multinom(formula = test$dependent ~ test$independent, data = test)
> 
> Coefficients:
>   (Intercept) test$independent
> 2  -0.1386294       -0.2772589
> 3  -0.3583519       -0.7167038
> 
> Residual Deviance: 17.95891 
> AIC: 21.95891 
> 
> ****
> 
> I'm clearly doing something ridiculously wrong, but I don't know what.

Technically speaking, multinom() is doing exactly what you are asking it to.  Your model is that 

  logit(p(outcome_class=i)) ~ a_i + beta_i * independent

where a_i is the intercept and beta_i the coefficient for the independent variable for the i-th outcome class, and a_1=b_1=0 (class 1 is the "baseline" class in the sense of Agresti, 2002, chapter 7).  You can check that the fitted coefficients predict exactly the proportions observed in your dataset:

***

> library(nnet)
> dat <- data.frame(dependent=c(rep(1,6),rep(2,3),3),independent=rep(2,10))
> m <- multinom(dependent ~ independent, dat)
# weights:  9 (4 variable)
initial  value 10.986123 
final  value 8.979457 
converged
> coef(m)
  (Intercept) independent
2  -0.1386294  -0.2772589
3  -0.3583519  -0.7167038
> p1 <- 1 / (1 + exp(sum(c(1,2) * coef(m)[1,])) + exp(sum(c(1,2) * coef(m)[2,])))
> p2 <- exp(sum(c(1,2) * coef(m)[1,])) / (1 + exp(sum(c(1,2) * coef(m)[1,])) + exp(sum(c(1,2) * coef(m)[2,])))
> p3 <- exp(sum(c(1,2) * coef(m)[2,])) / (1 + exp(sum(c(1,2) * coef(m)[1,])) + exp(sum(c(1,2) * coef(m)[2,])))
> c(p1,p2,p3)  ## probabilities of outcome classes 1, 2, 3 respectively
[1] 0.6 0.3 0.1

***

Thus the model's predictions are the relative frequency estimates of the three outcome classes; it is impossible for any model to do better than this given the data available!

What you're doing wrong, however, is that you are using more parameters (four) than needed to capture the complete distribution of your data, hence there are infinitely many combinations of parameter values that will give the relative frequency estimates.  Your model is "singular".  But multinom() has not been written with safeguards to catch this kind of situation and warn the user.  I give an example below of how the situation is different with lm() for linear regression:

***

> ## lm() has more safeguards than multinom()
> x <- rep(1,10)
> y <- x + rnorm(10)
> lm(y~x) ## x's coefficient is NA

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
     0.4753           NA  

> lm(y~x,singular.ok=FALSE) ## lm.fit() actually notices that the model is singular, and will quit if you ask it not to fit singular models
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  singular fit encountered

***

By default, lm() will not complain about a singular model, but at least the fitted model gives clear indication that there was not enough information to pin down the parameter estimates.  If you set the singular.ok flag to FALSE, then lm() actually will refuse to fit.

Hope that helps!

Best

Roger

--

Roger Levy                      Email: rlevy@ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://idiom.ucsd.edu/~rlevy












More information about the ling-r-lang-L mailing list