*by Derek McCrae Norton, Senior Sales Engineer*

I often get questions in the course of a day about various algorithms. Some of those are already in Revolution R Enterprise, and some just haven't been written yet. One such algorithm is Ridge Regression. For those who are wondering "What is that?": put simply, it is a type of regression that can help deal with multicollinearity. It is part of a broader class of models called Penalized Regression, which also includes LASSO. For those who want more than that, I recommend your friendly neighborhood search engine. Here is one wikipedia link to get you started.

Since this algorithm is not yet implemented in RevoScaleR, I set out to determine how hard such a task might be... A nice surprise was that it is not difficult at all. After browsing one of my old textbooks, I found that you can calculate a ridge regression by using correlation matricies. Jackpot! RevoScaleR has that implemented so then it was just a matter of working through a bit of code.

Initially I was stumped when I looked at lm.ridge since the coefficients don't match, but then after delving into that code I was able to see that it is simply a different method of scaling. So, here is some code to do ridge regression in RevoScaleR. The two models match the example in *Kutner, Nachtsheim and Neter.*

rxRidgeReg <- function(formula, data, lambda, ...) { myTerms <- all.vars(formula) newForm <- as.formula(paste("~", paste(myTerms, collapse = "+"))) myCor <- rxCovCor(newForm, data = data, type = "Cor", ...) n <- myCor$valid.obs k <- nrow(myCor$CovCor) - 1 bridgeprime <- do.call(rbind, lapply(lambda,

function(l) qr.solve(myCor$CovCor[-1,-1] + l*diag(k),

myCor$CovCor[-1,1]))) bridge <- myCor$StdDevs[1] * sweep(bridgeprime, 2,

myCor$StdDevs[-1], "/") bridge <- cbind(t(myCor$Means[1] -

tcrossprod(myCor$Means[-1], bridge)), bridge) rownames(bridge) <- format(lambda) return(bridge) }

bodyfat <- read.table("http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/BodyFat-TxtFormat.txt") names(bodyfat) <- c("X1", "X2", "X3", "Y") library(ridge) rr1 <- rxRidgeReg(Y ~ X1 + X2 + X3, data = bodyfat, lambda = 0.02, reportProgress = 0) rr2 <- linearRidge(Y ~ X1 + X2 + X3, data = bodyfat, lambda = 0.02) rr1 ## X1 X2 X3 ## 0.02 -7.403 0.5554 0.3681 -0.1916 coef(rr2) ## (Intercept) X1 X2 X3 ## -7.4034 0.5554 0.3681 -0.1916 # it also works with multiple lambdas myLambda <- c(0, 0.02, 1, 5, 10) rxRidgeReg(Y ~ X1 + X2 + X3, data = bodyfat, lambda = myLambda, reportProgress = 0) ## X1 X2 X3 ## 0.00 117.085 4.33409 -2.85685 -2.186060 ## 0.02 -7.403 0.55535 0.36814 -0.191627 ## 1.00 -2.249 0.28438 0.30246 -0.008316 ## 5.00 10.242 0.12188 0.12457 0.017907 ## 10.00 14.340 0.07122 0.07206 0.013252

Created by Pretty R at inside-R.org

## Comments

You can follow this conversation by subscribing to the comment feed for this post.