Yet Another Math Programming Consultant: NEOS and ROI: R Optimization Infrastructure Package

ROI [1] is an R package to express optimization problems (in a rather low-level format). Its main advantage is that you can use the same model against a number of solvers. Linear models are expressed in a matrix notation. I am not a big fan of this: it restricts the applicability to small, simple models only. For larger, more complex models the matrix approach leads quickly to unreadable and unmaintainable models.

Solver interfaces are implemented as "plug-ins". A new plugin [2] allows to run solvers on NEOS [3]. This gives us access to a number of powerful solvers, without the need to install things locally.

Linear models

To illustrate how we can express a simple transportation model in ROI, let's try a very simple transportation model [4]:

$\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min\>&\sum_{i,j} c_{i,j} x_{i,j}\\&\sum_j x_{i,j} \le a_i\>\>\forall i\\&\sum_i x_{i,j} \ge b_j\>\>\forall j\\&x_{i,j}\ge 0\end{align}}$
The GAMS representation is:

Set
   i 'canning plants' / seattle,  san-diego /
   j 'markets'        / new-york, chicago, topeka /;

Parameter
   a(i) 'capacity of plant i in cases'
        / seattle     350
          san-diego   600  /

   b(j) 'demand at market j in cases'
        / new-york    325
          chicago     300
          topeka      275  /;

Table d(i,j) 'distance in thousands of miles'
               new-york       chicago        topeka
   seattle          2.5           1.7           1.8
   san-diego        2.5           1.8           1.4;

Scalar f 'freight in dollars per case per thousand miles' / 90 /;

Parameter c(i,j) 'transport cost in thousands of dollars per case';
c(i,j) = f * d(i,j) / 1000;

Variable
   x(i,j) 'shipment quantities in cases'
   z      'total transportation costs in thousands of dollars';

Positive Variable x;

Equation
   cost      'define objective function'
   supply(i) 'observe supply limit at plant i'
   demand(j) 'satisfy demand at market j';

cost..        z  =e=  sum((i,j), c(i,j)*x(i,j));

supply(i)..   sum(j, x(i,j)) =l= a(i);

demand(j)..   sum(i, x(i,j)) =g= b(j);

Model transport / all /;

solve transport using lp minimizing z;

display x.l, x.m;

Note that all vectors and matrices have strings as indices. The ordering does not matter (like in a relational database table). Now let's do the same for ROI.

#
# data
#
n <- 2  # number of plants (sources)
m <- 3  # number of markets (distinations)
si <- c('seattle', 'san-diego')
sj <- c('new-york', 'chicago', 'topeka')
a <- c(350, 600)      # capacity
b <- c(325, 300, 275) # demand
# distances
d <- matrix(c(2.5, 1.7, 1.8, 2.5, 1.8, 1.4), nrow=n, ncol=m, byrow=TRUE)
f <- 90   # unit freight cost
c <- d*f/1000  # unit transportation cost

# flatten c row wise
cflat <- as.vector(t(c))
# create constraint matrix .... somewhat torturous code
nvar <- n * m;
A <- numeric(0)
varnames <- character(0);
for (i in 1:n) {
   row <- rep(0,nvar)
   for (j in 1:m)  {
     row[(i-1)*m+j] <- 1
     varnames <- c(varnames,paste(si[i],sj[j],sep=":"))
   }
   A<-rbind(A,row)
}
for (j in 1:m) {
  row <- rep(0,nvar)
  for (i in 1:n) 
    row[(i-1)*m+j] <- 1
  A<-rbind(A,row)
}
rhs <- c(a,b)
dir <- c(rep('<=',n),rep('>=',m))

LP <- OP(L_objective(cflat,names=varnames),
         L_constraint(A,dir,rhs),
         max=FALSE)

(sol <- ROI_solve(LP, solver = "glpk"))

I just used vectors with integer indices: i.e. the position determines the meaning of a number (in other words: the ordering is important). We could have used vectors/matrices with row and column names, although I am not sure if this would make things much more readable.

The results look like:

> (sol <- ROI_solve(LP, solver = "glpk"))
Optimal solution found.
The objective value is: 1.536750e+02
> solution(sol)
  seattle:new-york    seattle:chicago     seattle:topeka san-diego:new-york  san-diego:chicago   san-diego:topeka 
                50                300                  0                275                  0                275

When reading the GAMS code, you can follow this is about a transportation model. The R code is much more obscure, it is close to write-only code. I don't understand how you can implement large and/or complex models in this way.

The R code builds up a "large" matrix

$A$ . For this tiny example it is small, but in general it is very large: in practice even way too large to handle even medium sized models. A model with 10,000 equations and 10,000 variables (not very large by today's standards) leads to a 10,000 x 10,000 matrix with 100 million entries! This type of matrix based interface essentially brings us back to the days of matrix generators [7] (although they paid attention to sparsity). This technology is largely obsolete.

NEOS

To use a NEOS solver we just need to change the last line:

> (sol <- ROI_solve(LP, solver = "neos", method = "mosek"))
Optimal solution found.
The objective value is: 1.536750e+02

For some solvers you need to provide an e-mail address:

In this case, add email="...".

Behind the scenes

Interestingly, under the hood, a GAMS model is generated and sent to NEOS. It looks like:

> model_call <- ROI_solve(LP, solver = "neos", method = "mosek",
+                         dry_run = TRUE)
> cat(as.list(model_call)$xmlstring)
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <category>milp</category>
  <solver>MOSEK</solver>
  <inputMethod>GAMS</inputMethod>
  <model><![CDATA[Option IntVarUp = 0;

Set i / R1*R5 / ;
Set ileq(i) / R1, R2 / ;
Set igeq(i) / R3, R4, R5 / ;
Set j / C1*C6 / ;

Parameter objL(j)
/C1 0.225
C2 0.153
C3 0.162
C4 0.225
C5 0.162
C6 0.126/ ;

Parameter rhs(i)
/R1 350
R2 600
R3 325
R4 300
R5 275/ ;

Parameter A
/R1.C1 1
R3.C1 1
R1.C2 1
R4.C2 1
R1.C3 1
R5.C3 1
R2.C4 1
R3.C4 1
R2.C5 1
R4.C5 1
R2.C6 1
R5.C6 1/;

Variables obj;
Positive Variables x(j);


Equations
    ObjSum
    LinLeq(ileq)
    LinGeq(igeq);

ObjSum .. obj =e= sum(j, x(j) * objL(j)) ;
LinLeq(ileq) .. sum(j, A(ileq, j) * x(j)) =l= rhs(ileq) ;
LinGeq(igeq) .. sum(j, A(igeq, j) * x(j)) =g= rhs(igeq) ;

Model LinearProblem /all/ ;

Solve LinearProblem using LP minimizing obj ;

option decimals = 8;

display '---BEGIN.SOLUTION---', x.l, '---END.SOLUTION---';


file results /results.txt/;
results.nw = 0;
results.nd = 15;
results.nr = 2;
results.nz = 0;
put results;
put 'solution:'/;
loop(j, put, x.l(j)/);
put 'objval:'/;
put LinearProblem.objval/;
put 'solver_status:'/;
put LinearProblem.solvestat/;
put 'model_status:'/;
put LinearProblem.modelstat/;]]></model>
  <options><![CDATA[]]></options>
  <gdx><![CDATA[]]></gdx>
  <wantgdx><![CDATA[]]></wantgdx>
  <wantlog><![CDATA[]]></wantlog>
  <comments><![CDATA[]]></comments>
</document>
>

All structure is lost: we are just modeling:

$\begin{align} \min \>& c^Tx \\ & A_1 x \le b_1 \\& A_2 x \ge b_2 \end{align}$

Quadratic models

This plugin also allows quadratic models (but not general nonlinear models). In [2] a trivial non-convex QP model is presented:

$\begin{align}\max\>&F(P_F - 150) + C(P_C - 100) \\ & 2 F + C \le 500\\ & 2F + 3C \le 800 \\ & F \le 490 - P_F \\ & C \le 640 - 2P_C\\ & F, C, P_F, P_C\ge 0\end{align}$ ROI does not understand mathematics, so we have to pass this on in terms of matrices:

Q <- simple_triplet_matrix(i = 1:4, j = c(2, 1, 4, 3), rep(1, 4))
as.matrix(Q)

var_names <- c("full", "price_full", "compact", "price_compact")
o <- OP(
  Q_objective(Q = Q, L = c(-150, 0, -100, 0), names = var_names),
  L_constraint(rbind(c(2, 0, 1, 0), c(2, 0, 3, 0),
                     c(1, 1, 0, 0), c(0, 0, 1, 2)),
               dir = leq(4), rhs = c(500, 800, 490, 640)),
  maximum = TRUE)

(sol <- ROI_solve(o, solver = "neos", method = "BARON"))

Unfortunately, this is not really close to the mathematical model and makes the problem setup difficult to read and understand.

In [2] a table is presented with some of the supported solvers:

Linear models: OMPR

For linear models, we can use the OMPR package [5,6] on top of ROI. This will give us more readable models. Here we implement the above transportation model. The model can look like:

library(ompr)
library(ompr.roi)
library(dplyr)

#
# data
#
n <- 2  # number of plants (sources)
m <- 3  # number of markets (distinations)
si <- c('seattle', 'san-diego')
sj <- c('new-york', 'chicago', 'topeka')
a <- c(350, 600)      # capacity
b <- c(325, 300, 275) # demand
# distances
d <- matrix(c(2.5, 1.7, 1.8, 2.5, 1.8, 1.4), nrow=n, ncol=m, byrow=TRUE)
f <- 90   # unit freight cost
c <- d*f/1000  # unit transportation cost

#
# model + solve
#
(sol <- MIPModel() %>%
  add_variable(x[i,j],i=1:n,j=1:m,type = "continuous", lb = 0) %>%
  set_objective(sum_expr(c[i,j]*x[i,j],i=1:n,j=1:m), "min")  %>%
  add_constraint(sum_expr(x[i,j],j=1:m)<=a[i], i=1:n)  %>%
  add_constraint(sum_expr(x[i,j],i=1:n)>=b[j], j=1:m) %>%
  solve_model(with_ROI(solver = "neos", method="mosek")))  

#  
# solution
#
sol %>% 
  get_solution(x[i, j]) %>%
  filter(value > 0)

This code at least has some resemblance to the mathematical model. The solution looks like:

Status: optimal
Objective value: 153.675

  variable i j value
1        x 1 1    50
2        x 2 1   275
3        x 1 2   300
4        x 2 3   275

We can clean up the solution a bit, so we get a more meaningful solution report:

> sol %>% 
+   get_solution(x[i, j]) %>%
+   filter(value > 0) %>%
+   mutate(Plant = si[i],Market = sj[j]) %>%
+   select(Plant,Market,Shipment=value)
      Plant   Market Shipment
1   seattle new-york       50
2 san-diego new-york      275
3   seattle  chicago      300
4 san-diego   topeka      275

But..

OMPR is much better than matrix oriented tools to express models. That does not mean OMPR is suited for all LP/MIP models. For one thing it can be slow, there is no good support for large,sparse structures and there are some bugs. Here is a recent example of a bug I encountered.

In many cases we may need to restrict the domain of variables. E.g. instead of all x[i,j], only use a subset. An example is a network of nodes and arcs. The arcs can be represented by x[i,j] indicating the flow from node i to node j. Of course, if we have a sparse network only a few of all (i,j) combinations are allowed.

OMPR allows conditions on the domain of variables. E.g. for a one-dimensional variable x[i], we can do:

> n <- 3

> ok <- c(1,0,1)

> ok

[1] 1 0 1

> (sol <- MIPModel() %>%

+ add_variable(x[i], i=1:n, ok[i]==1)

+ )

Mixed integer linear optimization problem

Variables:

Continuous: 2

Integer: 0

Binary: 0

No objective function.

Constraints: 0

This looks great. We see x[2] is not included in the variables. Now let's do the same thing for a two dimensional variable x[i,j]:

> n <- 3

> ok <- matrix(c(0,1,1, 0,0,1, 0,0,0),nrow=3,ncol=3,byrow=T)

> ok

     [,1] [,2] [,3]

[1,]    0    1    1

[2,]    0    0    1

[3,]    0    0    0

> (sol <- MIPModel() %>%

+     add_variable(x[i,j], i=1:n, j=1:n, ok[i,j]==1)

+ )

Error in validate_quantifier_candidates(candidates, zero_vars_msg) :

  only_integer_candidates are not all TRUE

Something goes wrong here. The error message is not very informative: I have no clue what it means. I also have no idea how to fix this.

References

Many Solvers, One Interface, ROI, the R Optimization Infrastructure Package, https://www.r-project.org/conferences/useR-2010/slides/Theussl+Hornik+Meyer.pdf
Ronald Hochreiter and Florian Schwendinger, ROI Plug-in NEOS, https://cran.r-project.org/web/packages/ROI.plugin.neos/vignettes/ROI.plugin.neos_Introduction.pdf
NEOS Server: State-of-the-Art Solvers for Numerical Optimization, https://neos-server.org/neos/
Matlab vs GAMS: linear programming, http://yetanothermathprogrammingconsultant.blogspot.com/2016/09/matlab-vs-gams-linear-programming.html
Mixed Integer Linear Programming in R, https://dirkschumacher.github.io/ompr/
MIP model in R using the OMPR package, http://yetanothermathprogrammingconsultant.blogspot.com/2017/04/mip-model-in-r-using-ompr-package.html
Robert Fourer, Modeling Languages versus Matrix Generators for Linear Programming, ACM Transactions on Mathematical Software, vol. 9, pp. 143–183, 1983.

Yet Another Math Programming Consultant

Friday, June 22, 2018

NEOS and ROI: R Optimization Infrastructure Package