<p>Agus commented on "Fitting mixed models to categorical data in R" (Revolution Analytics blog, 2013-10-02):</p>
<p>Dear David,</p>
<p>I'm new to predictive modelling with multinomial models, and I would like your advice on the following (big) problem:</p>
<p>I want to predict the value of a variable in nature, which I call "ecological success" (ES).</p>
<p>To do that, I ran virtual simulations from which I obtained a "best model" explaining ES. In the simulation, ES has to be a categorical variable, so I use a multinomial model together with the glmulti function (glmulti package) to search iteratively for the combination of predictors that best explains ES in the simulations.</p>
<p>In R, the best model has the form ES ~ var1 + var2:var3, so it contains main effects on their own as well as interactions.</p>
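<p>In R's formula syntax, var2:var3 adds only the interaction (product) term, while var1*var2*var3 expands to all main effects plus all their interactions. The short check below uses base R's terms() to list what the best model and the full model (the one handed to glmulti later) expand to; the names f_best and f_full are just illustrative.</p>

```r
# How R expands the two formulas used in this question into individual terms
f_best = ES ~ var1 + var2:var3
f_full = ES ~ var1 * var2 * var3

best_terms = attr(terms(f_best), "term.labels")
full_terms = attr(terms(f_full), "term.labels")

print(best_terms)   # only var1 and the var2:var3 product
print(full_terms)   # three main effects, three pairwise and one three-way interaction
```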
<p>I can now fit this best model with the multinom function from the nnet package and get the coefficients for each term in the model, something like:</p>
<p>M=multinom(ES~var1+var2:var3,data)</p>
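<p>For a response with K levels, multinom estimates one row of coefficients for every level except the first, which acts as the reference category. The sketch below shows that shape for the model form above; the data are hypothetical and simulated (the names dat and latent are made up), not the simulation from this question.</p>

```r
library(nnet)

set.seed(1)
n = 300
dat = data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))
# hypothetical 3-level outcome driven by var1 and the var2:var3 product
latent = dat$var1 + dat$var2 * dat$var3 + rlogis(n)
dat$ES = cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = c("0", "1", "2"))

M = multinom(ES ~ var1 + var2:var3, data = dat, trace = FALSE)

# rows: levels "1" and "2", each compared with the reference level "0"
# columns: intercept, var1, var2:var3
coef(M)
```

<p>Each coefficient is a change in the log-odds of that row's level versus level "0", so no single coefficient describes ES as a whole.</p>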
<p>Now, to predict ES in nature, I would naturally call predict() on the fitted model and feed it the real data, like:</p>
<p>predict(M, newdata = realdata)  # dispatches to predict.multinom from nnet</p>
<p>However, as expected, this gives me only class labels, which have low discriminatory power. Is there a statistically valid way to obtain a continuous output? This matters because it would give me more power to discriminate differences in ES among species. However, simulating a continuous ES is nearly impossible in my system.</p>
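<p>One common option, sketched here under assumptions: ask predict() for the class probabilities (type = "probs") rather than the most likely class, and summarise them as an expected score. Treating the labels 0, 1 and 2 as equally spaced numbers is an assumption about ES, not something the multinomial model implies. The data, newdat, scores and es_expected are hypothetical.</p>

```r
library(nnet)

set.seed(1)
n = 300
dat = data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))
# hypothetical 3-level outcome, standing in for the simulated ES
latent = dat$var1 + dat$var2 * dat$var3 + rlogis(n)
dat$ES = cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = c("0", "1", "2"))

M = multinom(ES ~ var1 + var2:var3, data = dat, trace = FALSE)

newdat = data.frame(var1 = c(-1, 0, 1), var2 = c(0.5, 0, -0.5), var3 = c(1, 0, 1))
# one row per new observation, one column per ES level; each row sums to 1
p = predict(M, newdata = newdat, type = "probs")

# assumption: the labels 0, 1, 2 are used as equally spaced scores
scores = as.numeric(levels(dat$ES))
es_expected = drop(p %*% scores)   # continuous value between 0 and 2

round(p, 3)
es_expected
```

<p>The expected score is only one possible summary; the probability matrix itself keeps all the information the model provides and may be more useful when comparing species.</p>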
<p>For example, would it be valid to use a fitting function that treats ES as continuous at some step of the process, i.e. when selecting the best model or when estimating its coefficients?</p>
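<p>Because ES is ordinal (its levels 0, 1, 2 are ordered), a possible middle ground between a multinomial model and a model for a continuous ES is a proportional-odds (ordinal logistic) model such as MASS::polr. This is a different model from the multinom fit in the question, named here as an alternative. It assumes a single continuous latent variable behind the ordered levels, and its linear predictor gives a continuous index. The sketch uses hypothetical simulated data; dat, newdat, X and eta are made-up names.</p>

```r
library(MASS)

set.seed(1)
n = 300
dat = data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))
# hypothetical ordinal outcome: a latent score cut at two thresholds
latent = dat$var1 + dat$var2 * dat$var3 + rlogis(n)
dat$ES = cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = c("0", "1", "2"),
             ordered_result = TRUE)

# polr models logit P(ES at or below level k) = zeta_k - eta
fit = polr(ES ~ var1 + var2:var3, data = dat, Hess = TRUE)

newdat = data.frame(var1 = c(-1, 0, 1), var2 = c(0.5, 0, -0.5), var3 = c(1, 0, 1))
# continuous index for new rows: eta = X %*% beta (polr has no intercept in coef())
X = model.matrix(~ var1 + var2:var3, data = newdat)[, -1, drop = FALSE]
eta = drop(X %*% coef(fit))   # larger eta shifts probability toward higher ES levels

# class probabilities from the same model
p = predict(fit, newdata = newdat, type = "probs")

eta
round(p, 3)
```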
<p>Here is a reproducible example:</p>
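<p>set.seed(1)  # makes the random draws repeatable<br />
ES = as.factor(sample(c("0","1","2"), 100, replace=TRUE, prob=c(0.1, 0.2, 0.65)))<br />
var1 = dnorm(1:100, mean = 30, sd = 20, log = FALSE)<br />
var2 = as.numeric(ES) - var1  # as.numeric(ES) gives the level codes 1, 2, 3<br />
var3 = (as.numeric(ES) - var1)/var2  # note: the numerator equals var2, so var3 is 1 in every row<br />
simulation = data.frame(cbind(ES, var1, var2, var3))  # note: cbind() stores ES as integer codes, not as a factor</p>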
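<p>Two details of the simulated data can be checked directly: cbind() builds a numeric matrix, so data.frame(cbind(...)) stores ES as its integer codes rather than as a factor, and var3 reduces to var2/var2. The sketch reruns the data-generating lines (with a seed added) and compares cbind() with passing the vectors to data.frame() directly; via_cbind and direct are illustrative names.</p>

```r
set.seed(1)
ES = as.factor(sample(c("0", "1", "2"), 100, replace = TRUE, prob = c(0.1, 0.2, 0.65)))
var1 = dnorm(1:100, mean = 30, sd = 20)
var2 = as.numeric(ES) - var1
var3 = (as.numeric(ES) - var1) / var2

via_cbind = data.frame(cbind(ES, var1, var2, var3))
direct = data.frame(ES, var1, var2, var3)

class(via_cbind$ES)   # cbind() kept only the integer codes of ES
class(direct$ES)      # data.frame() keeps ES as a factor with labels 0, 1, 2
unique(var3)          # var3 is var2 / var2, i.e. constant
```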
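<p>require(glmulti)<br />
require(nnet)<br />
multi.multi = function(formula, data){<br />
# as.formula() copes with long formulas, which paste(deparse(formula)) can split into several strings;<br />
# multinom fits by maximum likelihood (no REML involved), so AIC values of candidate models are comparable<br />
multinom(as.formula(formula), data = data, trace = FALSE)<br />
}<br />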
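# find the best model for ES in the simulation (the genetic search may take days or not converge; with only three predictors an exhaustive search, method = "h", is also an option)<br />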
M=glmulti(<br />
ES~var1*var2*var3,<br />
data=simulation, name = "glmulti.analysis",<br />
intercept = TRUE, marginality = FALSE,<br />
level = 2, minsize = 0, maxsize = -1, minK = -1, maxK = -1,<br />
fitfunction=multi.multi,<br />
method = "g", crit = "aic", confsetsize = 100,includeobjects=TRUE<br />
)</p>
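<p>With three predictors and level = 2 there are six candidate terms (three main effects, three pairwise interactions), so if every subset of them is allowed, which is my reading of marginality = FALSE, there are 2^6 = 64 candidate models: few enough to fit them all. As a cross-check on the genetic search, the sketch below runs that exhaustive AIC search with a plain loop instead of glmulti. It uses hypothetical simulated data, and cand, subsets and aic are made-up names; glmulti's own method = "h" is the packaged way to do this.</p>

```r
library(nnet)

set.seed(1)
n = 300
dat = data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))
latent = dat$var1 + dat$var2 * dat$var3 + rlogis(n)
dat$ES = cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = c("0", "1", "2"))

main = c("var1", "var2", "var3")
# the six candidate terms: three main effects and their three pairwise interactions
cand = c(main, combn(main, 2, paste, collapse = ":"))
# each row switches every candidate term on (TRUE) or off (FALSE): 64 rows
subsets = as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(cand))))

aic = apply(subsets, 1, function(keep) {
  rhs = if (any(keep)) cand[keep] else "1"   # empty subset = intercept-only model
  fit = multinom(reformulate(rhs, response = "ES"), data = dat, trace = FALSE)
  AIC(fit)
})

best = which.min(aic)
cand[subsets[best, ]]   # terms of the lowest-AIC model
```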
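<p># fit the best model selected by glmulti and inspect its coefficients<br />
# summary(M)$bestmodel holds the best formula as a character string<br />
M = multinom(as.formula(summary(M)$bestmodel), data = simulation)<br />
summary(M)</p>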
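<p>summary() on a multinom fit reports coefficients and standard errors but no p-values. A common follow-up, sketched here on hypothetical simulated data, is to form Wald z statistics and two-sided p-values from the normal distribution; these are large-sample approximations, and s, z and p are illustrative names.</p>

```r
library(nnet)

set.seed(1)
n = 300
dat = data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))
latent = dat$var1 + dat$var2 * dat$var3 + rlogis(n)
dat$ES = cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = c("0", "1", "2"))

# Hess = TRUE stores the Hessian needed for the standard errors
M = multinom(ES ~ var1 + var2:var3, data = dat, Hess = TRUE, trace = FALSE)
s = summary(M)

# Wald statistics: each coefficient divided by its standard error
z = s$coefficients / s$standard.errors
# two-sided p-values from the standard normal approximation
p = 2 * pnorm(-abs(z))
round(p, 4)
```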
<p># generate stand-in "real data" (note: the var3 densities here are essentially 0)<br />
var1 = dnorm(1:3, mean = 30, sd = 20, log = FALSE)<br />
var2 = dnorm(1:3, mean = 10, sd = 20, log = FALSE)<br />
var3 = dnorm(1:3, mean = 250, sd = 20, log = FALSE)<br />
realdata = data.frame(var1, var2, var3)</p>
<p>d = predict(M, realdata)  # returns class labels, mostly 1s, but I want to discriminate ES more finely</p>
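<p>The coarse output is expected: type = "class" (the default) returns only the most probable level, so a row with probabilities (0.34, 0.33, 0.33) gets the same label as one with (0.9, 0.05, 0.05). The sketch below, on hypothetical simulated data, checks that the class prediction is just the level with the largest probability in each row of the type = "probs" output; cls, p and argmax are illustrative names.</p>

```r
library(nnet)

set.seed(1)
n = 300
dat = data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))
latent = dat$var1 + dat$var2 * dat$var3 + rlogis(n)
dat$ES = cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = c("0", "1", "2"))

M = multinom(ES ~ var1 + var2:var3, data = dat, trace = FALSE)

newdat = data.frame(var1 = rnorm(20), var2 = rnorm(20), var3 = rnorm(20))
cls = predict(M, newdata = newdat)                 # default type = "class"
p = predict(M, newdata = newdat, type = "probs")   # full class probabilities

# the class label is the level with the largest probability in each row
argmax = levels(dat$ES)[max.col(p, ties.method = "first")]

table(cls)
round(head(p), 3)
```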
<p># Would it be correct to select the best combination of predictors with a permutation-based or mixed-model fit wrapped inside glmulti::glmulti, like this:</p>
<p>require(lmPerm)</p>
<p>multi.multi = function(formula, data){<br />
lmp(as.formula(formula), data = data)  # lmp (lmPerm) fits a linear model, so ES is treated as numeric here<br />
}</p>
<p># I am aware that mixed models can be fitted with the MCMCglmm package, but I don't see how to apply it to an ordinal categorical variable.</p>
<p># Would it be correct to use a permutation procedure to estimate the coefficients of the best model, after selecting it via nnet::multinom?</p>
<p>M = lmp(ES ~ var1*var2*var3, data = simulation)<br />
d = predict(M, realdata)  # this gives a continuous ES output as desired, but with a warning</p>
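<p>When ES is modelled as a number, as in the lmp() fit above, the numbers used are whatever the factor was converted to. as.numeric() on a factor returns the level codes, not the labels; in the question's example cbind() has already done that conversion, so lmp() sees 1, 2, 3. For labels 0, 1, 2 this only shifts the response by 1, but for labels that are not consecutive integers the spacing would change as well. The small factor below is hypothetical.</p>

```r
# hypothetical ES values stored as a factor with labels 0, 1, 2
ES = factor(c("0", "2", "1", "2"))

codes = as.numeric(ES)                  # level codes: 1 3 2 3
values = as.numeric(as.character(ES))   # the labels themselves: 0 2 1 2

codes
values
```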
<p>Many thanks in advance!</p>