On the blog for the Human Language Processing lab at the University of Rochester you will find a detailed example of using the MCMCglmm package in R to fit a random-effects model to unordered multinomial data. Each line of code gets a detailed explanation, so this should be a useful tutorial for anyone looking to fit similar models to their own data sets. (Don't forget to check out the MCMCglmm tutorial, too.)

HLP/Jaeger lab blog: multinomial random effects models in R

Dear David,

I'm new to predictive modelling with multinomial models. I would like your advice on the following (big) problem:

I want to predict the value of a variable in nature; call it "ecological success" (ES).

To do that, I ran some virtual simulations, from which I obtained a "best model" explaining ES. In the simulations, ES needs to be a categorical variable, so I use a multinomial model with the glmulti function to iteratively find the combination of factors that best explains that variable.

The best model comes out in R as ES ~ var1 + var2:var3, so I have both main effects on their own and interactions.

I can now fit this best model using the multinom function from the nnet package and get the coefficients for each term in the model, something like:

M = multinom(ES ~ var1 + var2:var3, data)

Now, to predict the values in nature, I would naturally use the predict function from the same package and feed the model real data, like:

predict(M, realdata)  # plain predict() dispatches to the method for multinom objects

However, this gives me only categorical values (as expected), with low discriminatory power. Is there a statistically valid way to obtain a continuous output? This is important because it gives me more power to discriminate differences in ES among species. However, simulating a continuous ES is nearly impossible in my system.
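For what it's worth, nnet's predict method for multinom fits can already return a continuous output: type = "probs" gives the fitted class probabilities instead of the single most likely category. A minimal sketch with made-up toy data (the variable names here are illustrative, not from the example above):

```r
library(nnet)

# toy data: a 3-level outcome and one numeric predictor
set.seed(1)
toy <- data.frame(
  ES   = factor(sample(c("0", "1", "2"), 100, replace = TRUE)),
  var1 = rnorm(100)
)

M <- multinom(ES ~ var1, data = toy, trace = FALSE)

# type = "class" (the default) returns only the most likely category
predict(M, toy[1:3, ])

# type = "probs" returns a continuous probability for each category,
# which can be used to rank or discriminate observations more finely
p <- predict(M, toy[1:3, ], type = "probs")
p           # one row per observation, one column per ES level
rowSums(p)  # each row sums to 1
```

These probabilities are a statistically valid continuous summary of the fitted model, since they are exactly what the multinomial logit estimates.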

For example, would it be valid to use a fitting function that assumes ES is continuous at some point in the process (i.e., when selecting the best model, or when estimating its coefficients)?

Here is a reproducible example:

ES = as.factor(sample(c("0", "1", "2"), 100, replace = TRUE, prob = c(0.1, 0.2, 0.65)))

var1= dnorm(1:100, mean = 30, sd = 20, log = FALSE)

var2= as.numeric(ES)-var1

var3= (as.numeric(ES)-var1)/var2

simulation = data.frame(ES, var1, var2, var3)  # data.frame() directly; wrapping in cbind() would coerce the factor ES to numeric

require(glmulti)

require(nnet)

multi.multi = function(formula, data) {
  # glmulti passes each candidate formula here; multinom fits by true ML,
  # so AIC comparisons across models with different factors are valid
  multinom(as.formula(paste(deparse(formula))), data = data)
}

# find the best model for ES in the simulation (may take days, or may not converge)

M = glmulti(
  ES ~ var1*var2*var3,
  data = simulation, name = "glmulti.analysis",
  intercept = TRUE, marginality = FALSE,
  level = 2, minsize = 0, maxsize = -1, minK = -1, maxK = -1,
  fitfunction = multi.multi,
  method = "g", crit = "aic", confsetsize = 100, includeobjects = TRUE
)

# determine the coefficients for the best model

M=multinom(ES~var1*var2*var3, data=simulation)

summary(M)

#generate "real data"

var1= dnorm(1:3, mean = 30, sd = 20, log = FALSE)

var2= dnorm(1:3, mean = 10, sd = 20, log = FALSE)

var3= dnorm(1:3, mean = 250, sd = 20, log = FALSE)

realdata = data.frame(var1, var2, var3)

d = predict(M, realdata)  # gives mostly "1"s, but I want to discriminate ES more finely

# Would it be correct to estimate the best combination of factors using permutation or mixed-model fitting wrapped within glmulti::glmulti, like:

require(lmPerm)

multi.multi = function(formula, data) {
  # note: lmPerm's lmp() treats the response as continuous,
  # so ES would have to be converted with as.numeric() first
  lmp(as.formula(paste(deparse(formula))), data = data)
}

# I am aware of mixed models in the MCMCglmm package, but I don't see how to apply it to an ordinal categorical variable.
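For the record, MCMCglmm does accept ordinal responses directly through family = "ordinal", fitting a cut-point (probit-style) model. A heavily hedged sketch on made-up toy data; the prior, iteration counts, and variable names are illustrative assumptions, not recommendations:

```r
library(MCMCglmm)

# toy data: an ordered 3-level outcome and one numeric predictor
set.seed(2)
d <- data.frame(
  ES   = factor(sample(c("0", "1", "2"), 200, replace = TRUE), ordered = TRUE),
  var1 = rnorm(200)
)

# for ordinal models the residual variance is not identified,
# so it is conventionally fixed at 1 in the prior
prior <- list(R = list(V = 1, fix = 1))

# fixed-effects-only ordinal model; random terms could be added
# with the 'random' argument; short chain purely for illustration
m <- MCMCglmm(ES ~ var1, family = "ordinal", data = d, prior = prior,
              nitt = 3000, burnin = 1000, thin = 10, verbose = FALSE)
summary(m)
```

This does not solve the continuous-output question by itself, but it shows one way the ordinal structure of ES can be modelled rather than treated as unordered.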

# Would it be correct to use a permutation procedure to estimate the coefficients of the best model obtained via nnet::multinom?

M = lmp(as.numeric(ES) ~ var1*var2*var3, data = simulation)  # lmp() needs a numeric response

d = predict(M, realdata)  # this gives a continuous ES output, as desired, but with a warning

Many thanks in advance!!

Posted by: Agus | October 02, 2013 at 06:36