by Mike Bowles
Mike Bowles is a machine learning expert and serial entrepreneur. This is the second post in what is envisioned as a four-part series that began with Mike's Thumbnail History of Ensemble Models.
One of the main reasons for using R is the vast array of high-quality statistical algorithms available in it. Ensemble methods provide a prime example. As statistics researchers have advanced the forefront of statistical learning, they have produced R packages that incorporate their latest techniques. The table below illustrates this by comparing several of the ensemble packages available in R.
| Name | Author | Algorithms | 1st Version Date | Last Update |
|---|---|---|---|---|
| ipred | Peters et al | Bagging | 3/29/2002 | 9/3/2013 |
| adabag | Alfaro et al | AdaBoost and Bagging | 6/6/2006 | 7/5/2012 |
| ada | Culp et al | AdaBoost + Friedman’s mods | 9/29/2006 | 7/30/2010 |
| randomForest | Breiman et al | Random Forest | 4/1/2002 | 1/6/2012 |
| gbm | Ridgeway et al | Stochastic Gradient Boosting | 2/21/2003 | 1/18/2013 |
| party | Hothorn | RF with faster tree growing | 6/24/2005 | 1/17/2014 |
| mboost | Hothorn | Boosting applied to glm, gam | 6/16/2006 | 2/8/2013 |
Table 1. Ensemble packages available in R
The table gives the package name, the lead author, and the basic contents of each package. The two rightmost columns give the dates of the first and the most recent version of each package. These dates more or less track the development of the methods and the publication of the corresponding papers. The date of the last update is provided to indicate how actively these packages are maintained and how active the field remains.
A number of these packages are worth a look even though the methods they implement have been subsumed by newer methods. For example, ipred does bagging, which has been incorporated into both Random Forest and Gradient Boosting. But the ipred package can also incorporate more than one type of base learner: one of the examples in the package documentation uses Linear Discriminant Analysis in addition to binary decision trees. It is hard to find ensemble methods that use base learners other than binary decision trees, and simultaneously using two (or more) different base learners is unique to this package.
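A minimal sketch of both uses of ipred follows, assuming the ipred and MASS packages are installed. The two-class subset of the built-in iris data, the object names, and nbagg = 25 are illustrative choices that do not come from the post. The comb list follows the model/predict pattern of the LDA example in the package documentation.

```r
library(ipred)  # bagging()
library(MASS)   # lda()

# Illustrative two-class problem: two of the three iris species.
two_class <- droplevels(subset(iris, Species != "setosa"))

set.seed(1)

# Plain bagging of classification trees, with an out-of-bag error estimate.
bag_tree <- bagging(Species ~ ., data = two_class, nbagg = 25, coob = TRUE)
print(bag_tree$err)

# Second base learner: LDA is fit alongside each tree, and its discriminant
# score is handed to the tree as an additional predictor.
comb_lda <- list(list(
  model   = lda,
  predict = function(obj, newdata) predict(obj, newdata)$x
))
bag_lda <- bagging(Species ~ ., data = two_class, comb = comb_lda, nbagg = 25)

# Predicted classes from the combined ensemble.
pred <- predict(bag_lda, newdata = two_class)
print(table(predicted = pred, actual = two_class$Species))
```

The confusion table here is computed on the training rows, so it is only a smoke test and not an honest error estimate.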
Random Forest is a frequent winner of machine learning competitions, and the randomForest package is based on the original code of the late Professor Leo Breiman of Berkeley. It contains the functionality that Prof. Breiman describes in his papers. It solves regression and classification problems, has an unsupervised mode, produces marginal plots of prediction versus individual attributes, and ranks attributes by importance. It also produces a similarity matrix (called proximity in the package) measuring how frequently two rows from the input wind up in the same leaf node together. That gives a measure of how close the two rows are in their effect on the trained model.
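The features listed above can be sketched as follows, assuming the randomForest package is installed. The built-in iris data, ntree = 500, and the choice of attribute and class for the marginal plot are illustrative assumptions.

```r
library(randomForest)

set.seed(1)

# Supervised classification; keep importance and proximity information.
rf <- randomForest(Species ~ ., data = iris, ntree = 500,
                   importance = TRUE, proximity = TRUE)

# Attribute ranking: one row per attribute.
print(importance(rf))
varImpPlot(rf)

# Marginal plot of the prediction versus a single attribute.
partialPlot(rf, pred.data = iris, x.var = "Petal.Width",
            which.class = "versicolor")

# Proximity matrix: how often two rows end up in the same leaf node.
prox <- rf$proximity
print(dim(prox))

# Unsupervised mode: supply the attributes only, with no response.
rf_unsup <- randomForest(x = iris[, -5], ntree = 500, proximity = TRUE)
print(rf_unsup)
```

Passing a numeric response instead of a factor switches the same call to regression.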
The gbm package is heavily used and commercially important. It’s written by Greg Ridgeway and contributors. The package incorporates the methods outlined in Professor Jerome Friedman’s papers. Those include regression under mean squared and mean absolute loss, binary classification under AdaBoost (exponential) and Bernoulli loss, and multiclass classification. The package also includes a number of extensions (Cox proportional hazards and pairwise ranking, for example). The gbm package includes visualization tools similar to those in randomForest. It will draw 2D or 3D plots showing marginal predicted values versus one or two of the attributes, and it gives a table ranking attributes by importance as a guide for feature engineering. (After loading the package, type example(gbm) at the console.)
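In gbm the loss function is selected with the distribution argument. The sketch below, which assumes the gbm package is installed, fits the four losses named above to simulated data. The data, the variable names, and the tuning values (300 trees, shrinkage 0.05, depth 2) are all made up for illustration.

```r
library(gbm)

set.seed(42)
n <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
sim$y_num <- 3 * sim$x1 - 2 * sim$x2^2 + rnorm(n, sd = 0.3)  # numeric target
sim$y_bin <- rbinom(n, 1, plogis(4 * (sim$x1 - sim$x2)))     # 0/1 target

# Regression under squared-error and absolute-error loss.
fit_sq  <- gbm(y_num ~ x1 + x2 + x3, data = sim, distribution = "gaussian",
               n.trees = 300, shrinkage = 0.05, interaction.depth = 2)
fit_abs <- gbm(y_num ~ x1 + x2 + x3, data = sim, distribution = "laplace",
               n.trees = 300, shrinkage = 0.05, interaction.depth = 2)

# Binary classification under Bernoulli and AdaBoost loss.
fit_bern <- gbm(y_bin ~ x1 + x2 + x3, data = sim, distribution = "bernoulli",
                n.trees = 300, shrinkage = 0.05, interaction.depth = 2)
fit_ada  <- gbm(y_bin ~ x1 + x2 + x3, data = sim, distribution = "adaboost",
                n.trees = 300, shrinkage = 0.05, interaction.depth = 2)

# Predicted probabilities from the two classifiers, on the training rows.
p_bern <- predict(fit_bern, newdata = sim, n.trees = 300, type = "response")
p_ada  <- predict(fit_ada,  newdata = sim, n.trees = 300, type = "response")
print(cor(p_bern, p_ada))
```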
The R packages party and mboost reflect the continued development of ensemble methods. The party package uses an alternative method for training binary decision trees, called conditional inference trees. The package authors describe in their associated paper^{1} how conditional inference trees reduce bias and reduce training time. In the party package, the authors use Breiman’s Random Forest procedure with conditional inference trees as base learners.
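A minimal sketch of a single conditional inference tree and a forest of them, assuming the party package is installed. The iris data, ntree = 100, and mtry = 2 are illustrative assumptions, and cforest_unbiased() is one of the control presets the package provides.

```r
library(party)

set.seed(1)

# A single conditional inference tree.
ct <- ctree(Species ~ ., data = iris)
print(ct)

# A forest that uses conditional inference trees as base learners.
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 100, mtry = 2))

# Out-of-bag predictions and permutation-based attribute importance.
oob_pred <- predict(cf, OOB = TRUE)
print(table(predicted = oob_pred, actual = iris$Species))
print(varimp(cf))
```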
The mboost package approaches generalized linear models and generalized additive models as boosting problems. The connection is described in Elements of Statistical Learning^{2}, Algorithm 16.1. When used for least squares regression, taking single attributes as the base learners corresponds to Efron’s Least Angle Regression^{3} or Tibshirani’s Lasso regression^{4}. The package authors extend the method to generalized linear models and generalized additive models.
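The idea of boosting with single attributes as base learners can be sketched in base R without calling mboost at all. The function name componentwise_l2_boost, the simulated data, and the values of mstop and nu below are hypothetical. The update moves by a fraction nu of the least squares slope, which differs in detail from the fixed-size sign step of Algorithm 16.1 but follows the same stagewise logic.

```r
set.seed(1)
n <- 200
p <- 6
X <- scale(matrix(rnorm(n * p), n, p))       # standardized attributes
colnames(X) <- paste0("x", seq_len(p))
beta_true <- c(3, -2, 0, 0, 0, 0)            # only x1 and x2 matter
y <- drop(X %*% beta_true + rnorm(n))
y <- y - mean(y)                             # center the response

componentwise_l2_boost <- function(X, y, mstop = 50, nu = 0.1) {
  beta <- setNames(numeric(ncol(X)), colnames(X))
  resid <- y
  ss <- colSums(X^2)
  for (m in seq_len(mstop)) {
    # Least squares slope of the current residual on each single attribute.
    slopes <- drop(crossprod(X, resid)) / ss
    # Pick the attribute whose one-variable fit reduces the RSS the most.
    j <- which.max(slopes^2 * ss)
    # Take a small step on that coefficient only and update the residual.
    beta[j] <- beta[j] + nu * slopes[j]
    resid <- resid - nu * slopes[j] * X[, j]
  }
  beta
}

beta_hat <- componentwise_l2_boost(X, y, mstop = 50, nu = 0.1)
print(round(beta_hat, 3))
```

Stopping early (small mstop) leaves the coefficients of unhelpful attributes at or near zero, which is what gives the procedure its lasso-like behavior.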
Here’s an example of the sort of results these methods produce. These results are for predicting the compressive strength of concrete based on the ingredients in the concrete (water, cement, coarse aggregate, fine aggregate, etc.). The data set comes from the UC Irvine Data Repository. The results come from the gbm package (3000 trees, 10-fold cross-validation, shrinkage = 0.003). In Figure 1, going clockwise from the upper left, are plots of the progress of training (the green line is out-of-sample performance and the black line is in-sample performance), the relative importance of the various ingredients in predicting compressive strength, and the marginal changes in predicted strength as functions of fine aggregate and water. As the figures show, modern ensemble methods are far from being black boxes. Besides delivering predictions, they deliver a significant amount of information about the character of their predictions.
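A sketch of how such outputs might be produced with gbm follows, using the settings quoted above (3000 trees, 10 folds, shrinkage 0.003). The concrete data is not bundled with the package, so the data frame below is a synthetic stand-in with made-up values and hypothetical column names; it will not reproduce the figure. The interaction.depth and n.cores values are assumptions as well.

```r
library(gbm)

set.seed(7)
n <- 400
# Synthetic stand-in for the concrete data (NOT the UCI values).
mix <- data.frame(
  cement           = runif(n, 100, 500),
  water            = runif(n, 120, 250),
  fine_aggregate   = runif(n, 600, 1000),
  coarse_aggregate = runif(n, 800, 1150)
)
mix$strength <- 0.1 * mix$cement - 0.2 * mix$water +
  0.01 * mix$fine_aggregate + rnorm(n, sd = 5)

fit <- gbm(strength ~ ., data = mix, distribution = "gaussian",
           n.trees = 3000, shrinkage = 0.003, cv.folds = 10,
           interaction.depth = 3, n.cores = 1)

# Training progress: in-sample error (black) and cross-validated error (green).
best <- gbm.perf(fit, method = "cv")
print(best)

# Relative importance of each attribute (also drawn as a bar chart).
imp <- summary(fit, n.trees = best)
print(imp)

# Marginal effect of single attributes on the predicted strength.
plot(fit, i.var = "fine_aggregate", n.trees = best)
plot(fit, i.var = "water", n.trees = best)
```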
Figure 1 – Outputs from gbm Model for UCI Compressive Strength of Concrete
References
1. http://statmath.wuwien.ac.at/~zeileis/papers/Hothorn+Hornik+Zeileis2006.pdf
2. Hastie, Tibshirani and Friedman, Elements of Statistical Learning, 2nd edition, Springer, 2009
3. http://www.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf
4. http://statweb.stanford.edu/~tibs/lasso.html