Numerical optimization is an important tool in the data scientist's toolbox. Many classical statistical problems boil down to finding the highest (or lowest) point on a multi-dimensional surface: the base R function optim provides many techniques for solving such maximum likelihood problems.

Counterintuitively, numerical optimizations are easiest (though rarely actually *easy*) when all of the variables are continuous and can take any value. When integer variables enter the mix, optimization becomes much, *much* harder. This typically happens when the optimization is constrained by a limited selection of objects, for example packages in a weight-limited cargo shipment, or stocks in a portfolio constrained by sector weightings and transaction costs. For tasks like these, you often need an algorithm for a specialized type of optimization: Mixed Integer Programming.

For problems like these, Dirk Schumacher has created the ompr package for R. This package provides a convenient syntax for describing the variables and contraints in an optimization problem. For example, take the classic "knapsack" problem of maximizing the total value of objects in a container subject to its maximum weight limit. In mathematical form, the problem can be described as an *objective* (quantity to be maximized) and a *constraint*:

$$

\begin{equation*}

\begin{array}{ll@{}ll}

\text{max} & \displaystyle\sum\limits_{i=1}^{n} v_{i}x_{i} & &\\

\text{subject to}& \displaystyle\sum\limits_{i=1}^{n} w_{i}x_{i} \leq W, & &\\

& x_{i} \in \{0,1\}, &i=1 ,\ldots, n&

\end{array}

\end{equation*}

$$

In ompr, the same problem is naturally expressed in R as follows:

model <- MIPModel() %>% add_variable(x[i], i = 1:n, type = "binary") %>% set_objective(sum_expr(v[i] * x[i], i = 1:n)) %>% add_constraint(sum_expr(w[i] * x[i], i = 1:n) <= W)

To actually solve the problem, you need to provide a "backend" solver algorithm to ompr. It's designed to integrate with any solver, and currently works with the ROI (R Optimization Infrastructure) package. ROI in turn provides a number of solver algorithms including GLPK, the GNU Linear Programming Kit, which you can use to solve problems like this.

Dirk provides a number of worked examples of the ompr package in use. Examples include the Traveling Salesman problem, solving Soduku puzzles, and selecting optimal warehouse locations to minimize delivery costs to customers.

The ompr package is available on Github. For more information, check out the ompr home page linked below.

Dirk Schumacher: ompr: Mathematical Programming in R

Soduku link should be https://dirkschumacher.github.io/ompr/articles/problem-sudoku.html

Posted by: Chris | December 19, 2016 at 16:29

Thanks Chris. I have fixed the Soduku link in the post.

Posted by: David Smith | December 20, 2016 at 09:32