
May 13, 2009



Dear David,

I'm new to predictive modelling with multinomial models. I would like your advice on the following (big) problem:

I want to predict the value of a variable in nature; call it "ecological success" (ES).

To do that, I ran some virtual simulations from which I obtained a "best model"
explaining ES. In the simulations, ES needs to be a categorical variable, so I use a multinomial model with the glmulti function to iteratively find the best combination of factors that explains that variable.

The model appears in R as ES ~ var1 + var2:var3, so I have main terms alone and also interactions.

I can now fit this best model using the function "multinom" from the nnet package and get the coefficients for each term in the model, something like:
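(Just a sketch of what I mean; `simulation` stands for my simulated data frame:)

```r
# illustrative only: fit the selected model and look at its coefficients
library(nnet)
M <- multinom(ES ~ var1 + var2:var3, data = simulation)
coef(M)  # one row of coefficients per non-reference level of ES
```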


Now, in order to predict the values in nature, I would naturally use the predict function from the same package, feeding the model real data, like:
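(Again a sketch only, with made-up field measurements; `M` is the multinom fit from above:)

```r
# illustrative only: hypothetical "real" measurements for the three predictors
realdata <- data.frame(var1 = runif(3), var2 = runif(3), var3 = runif(3))
d <- predict(M, newdata = realdata)  # one predicted ES category per row
```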


However, this gives me only categorical values (as expected), with low discriminatory power. Is there a statistically valid way to obtain a continuous output? This is important because it would give me more power to discriminate differences in ES among species; however, simulating a continuous ES is nearly impossible in my system.
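I did notice that predict.multinom can also return per-class probabilities rather than labels, which are continuous, e.g.:

```r
# per-class probabilities: one column per ES level, rows sum to 1
p <- predict(M, newdata = realdata, type = "probs")
```

but I am not sure whether summarising these probabilities (say, as an expected ES level) would be statistically valid.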

For example, would it be valid to use a fitting function that assumes ES is continuous at some point in the process (i.e. during the selection of the best model, or during the estimation of the coefficients)?

Here is a reproducible example:

ES   = as.factor(sample(c("0", "1", "2"), 100, replace = TRUE, prob = c(0.1, 0.2, 0.65)))
var1 = dnorm(1:100, mean = 30, sd = 20, log = FALSE)
var2 = as.numeric(ES) - var1
var3 = (as.numeric(ES) - var1) / var2

multi.multi = function(formula, data) {
  # to compare models with different factors use true ML, not REML
  multinom(paste(deparse(formula)), data = data)
}

# find best model for ES in the simulation (may take days or not converge)
glmulti(ES ~ var1 * var2 * var3, data = simulation, fitfunction = multi.multi,
        name = "glmulti.analysis",
        intercept = TRUE, marginality = FALSE,
        level = 2, minsize = 0, maxsize = -1, minK = -1, maxK = -1,
        method = "g", crit = "aic", confsetsize = 100, includeobjects = TRUE)

# determine the coefficients for the best model
M = multinom(ES ~ var1 * var2 * var3, data = simulation)

# generate "real data"
var1 = dnorm(1:3, mean = 30, sd = 20, log = FALSE)
var2 = dnorm(1:3, mean = 10, sd = 20, log = FALSE)
var3 = dnorm(1:3, mean = 250, sd = 20, log = FALSE)
realdata = data.frame(var1, var2, var3)

d = predict(M, realdata)  # gives a lot of 1s, but I want to discriminate ES more finely

# Would it be correct to estimate the best combination of factors using a permutation or mixed-model fit wrapped within the glmulti::glmulti function, like:


multi.multi = function(formula, data) {
  # to compare models with different factors use true ML, not REML
  lmp(paste(deparse(formula)), data = data)
}

# I am aware of using mixed models with the MCMCglmm package, but I don't see how to apply it in the case of an ordinal categorical variable.

# Would it be correct to use a permutation procedure for estimating the coefficients of the best model obtained via nnet::multinom?

M = lmp(ES ~ var1 * var2 * var3, data = simulation)
d = predict(M, realdata)  # this gives a continuous ES output as desired, but with a warning

Many thanks in advance!!
