Friday, June 22, 2018

NEOS and ROI: R Optimization Infrastructure Package

ROI [1] is an R package for expressing optimization problems (in a rather low-level format). Its main advantage is that you can use the same model with a number of solvers. Linear models are expressed in matrix notation. I am not a big fan of this: it restricts the applicability to small, simple models. For larger, more complex models the matrix approach quickly leads to unreadable and unmaintainable models.

Solver interfaces are implemented as "plug-ins". A new plug-in [2] allows us to run solvers on NEOS [3]. This gives us access to a number of powerful solvers without the need to install anything locally.
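Getting started is straightforward (a sketch; the packages are on CRAN):

# one-time installation of ROI plus a local plug-in and the NEOS plug-in
install.packages(c("ROI", "ROI.plugin.glpk", "ROI.plugin.neos"))

library(ROI)
ROI_registered_solvers()   # lists the solvers ROI can currently dispatch to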

Linear models


To illustrate how models are expressed in ROI, let's try a very simple transportation model [4]: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min\>&\sum_{i,j} c_{i,j} x_{i,j}\\&\sum_j x_{i,j} \le a_i\>\>\forall i\\&\sum_i x_{i,j} \ge b_j\>\>\forall j\\&x_{i,j}\ge 0\end{align}}\]
The GAMS representation is:


Set
   i 'canning plants' / seattle,  san-diego /
   j 'markets'        / new-york, chicago, topeka /;

Parameter
   a(i) 'capacity of plant i in cases'
        / seattle     350
          san-diego   600  /

   b(j) 'demand at market j in cases'
        / new-york    325
          chicago     300
          topeka      275  /;

Table d(i,j) 'distance in thousands of miles'
               new-york       chicago        topeka
   seattle          2.5           1.7           1.8
   san-diego        2.5           1.8           1.4;

Scalar f 'freight in dollars per case per thousand miles' / 90 /;

Parameter c(i,j) 'transport cost in thousands of dollars per case';
c(i,j) = f * d(i,j) / 1000;

Variable
   x(i,j) 'shipment quantities in cases'
   z      'total transportation costs in thousands of dollars';

Positive Variable x;

Equation
   cost      'define objective function'
   supply(i) 'observe supply limit at plant i'
   demand(j) 'satisfy demand at market j';

cost..        z  =e=  sum((i,j), c(i,j)*x(i,j));

supply(i)..   sum(j, x(i,j)) =l= a(i);

demand(j)..   sum(i, x(i,j)) =g= b(j);

Model transport / all /;

solve transport using lp minimizing z;

display x.l, x.m;

Note that all vectors and matrices have strings as indices. The ordering does not matter (as in a relational database table). Now let's do the same in ROI.


library(ROI)
library(ROI.plugin.glpk)

#
# data
#
n <- 2  # number of plants (sources)
m <- 3  # number of markets (destinations)
si <- c('seattle', 'san-diego')
sj <- c('new-york', 'chicago', 'topeka')
a <- c(350, 600)      # capacity
b <- c(325, 300, 275) # demand
# distances
d <- matrix(c(2.5, 1.7, 1.8, 2.5, 1.8, 1.4), nrow=n, ncol=m, byrow=TRUE)
f <- 90   # unit freight cost
c <- d*f/1000  # unit transportation cost (shadows base::c; calls c(...) still work)

# flatten c row wise
cflat <- as.vector(t(c))
# create constraint matrix .... somewhat tortuous code
nvar <- n * m
A <- numeric(0)
varnames <- character(0)
# supply rows: sum(j, x[i,j]) <= a[i]
for (i in 1:n) {
   row <- rep(0,nvar)
   for (j in 1:m)  {
     row[(i-1)*m+j] <- 1
     varnames <- c(varnames,paste(si[i],sj[j],sep=":"))
   }
   A <- rbind(A,row)
}
# demand rows: sum(i, x[i,j]) >= b[j]
for (j in 1:m) {
  row <- rep(0,nvar)
  for (i in 1:n)
    row[(i-1)*m+j] <- 1
  A <- rbind(A,row)
}
rhs <- c(a,b)
dir <- c(rep('<=',n),rep('>=',m))

LP <- OP(L_objective(cflat,names=varnames),
         L_constraint(A,dir,rhs),
         max=FALSE)

(sol <- ROI_solve(LP, solver = "glpk"))

I just used vectors with integer indices: i.e., the position determines the meaning of a number (in other words, the ordering is important). We could have used vectors and matrices with row and column names, although I am not sure this would make things much more readable.
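For what it is worth, attaching names is easy:

# attach the string labels to the unit cost matrix
rownames(c) <- si
colnames(c) <- sj
c["seattle", "chicago"]   # 0.153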

The results of the GLPK run look like:


> (sol <- ROI_solve(LP, solver = "glpk"))
Optimal solution found.
The objective value is: 1.536750e+02
> solution(sol)
  seattle:new-york    seattle:chicago     seattle:topeka san-diego:new-york  san-diego:chicago   san-diego:topeka 
                50                300                  0                275                  0                275 


When reading the GAMS code, you can tell this is about a transportation model. The R code is much more obscure; it is close to write-only code. I don't see how you could implement large and/or complex models in this way.

The R code builds up a "large" matrix \(A\). For this tiny example it is small, but in general it is very large: in practice way too large to handle even medium-sized models. A model with 10,000 equations and 10,000 variables (not very large by today's standards) leads to a 10,000 x 10,000 matrix with 100 million entries! This type of matrix-based interface essentially brings us back to the days of matrix generators [7] (although those at least paid attention to sparsity). This technology is largely obsolete.
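To be fair, ROI accepts sparse matrices from the slam package (simple_triplet_matrix, which we will encounter again below), so the storage problem can be mitigated. A sketch of building the transportation constraint matrix in sparse form (same row and column layout as the dense code above):

library(slam)
# store the nonzeros only: supply rows 1..n, demand rows n+1..n+m;
# variable x[i,j] lives in column (i-1)*m + j
ii <- c(rep(1:n, each = m),         # supply row for each x[i,j]
        rep(n + (1:m), times = n))  # demand row for each x[i,j]
jj <- rep(1:(n*m), 2)
Asparse <- simple_triplet_matrix(i = ii, j = jj, v = rep(1, 2*n*m),
                                 nrow = n + m, ncol = n*m)

L_constraint accepts such a sparse matrix in place of the dense A.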

NEOS


To use a NEOS solver we just need to change the last line:


> (sol <- ROI_solve(LP, solver = "neos", method = "mosek"))
Optimal solution found.
The objective value is: 1.536750e+02

For some solvers you need to provide an e-mail address. In that case, add an email="..." argument to the ROI_solve call.
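For example (with a hypothetical e-mail address; CPLEX is one of the NEOS solvers that requires one):

(sol <- ROI_solve(LP, solver = "neos", method = "cplex",
                  email = "me@example.com"))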

Behind the scenes


Interestingly, under the hood, a GAMS model is generated and sent to NEOS. It looks like:


> model_call <- ROI_solve(LP, solver = "neos", method = "mosek",
+                         dry_run = TRUE)
> cat(as.list(model_call)$xmlstring)
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <category>milp</category>
  <solver>MOSEK</solver>
  <inputMethod>GAMS</inputMethod>
  <model><![CDATA[Option IntVarUp = 0;

Set i / R1*R5 / ;
Set ileq(i) / R1, R2 / ;
Set igeq(i) / R3, R4, R5 / ;
Set j / C1*C6 / ;

Parameter objL(j)
/C1 0.225
C2 0.153
C3 0.162
C4 0.225
C5 0.162
C6 0.126/ ;

Parameter rhs(i)
/R1 350
R2 600
R3 325
R4 300
R5 275/ ;

Parameter A
/R1.C1 1
R3.C1 1
R1.C2 1
R4.C2 1
R1.C3 1
R5.C3 1
R2.C4 1
R3.C4 1
R2.C5 1
R4.C5 1
R2.C6 1
R5.C6 1/;

Variables obj;
Positive Variables x(j);


Equations
    ObjSum
    LinLeq(ileq)
    LinGeq(igeq);

ObjSum .. obj =e= sum(j, x(j) * objL(j)) ;
LinLeq(ileq) .. sum(j, A(ileq, j) * x(j)) =l= rhs(ileq) ;
LinGeq(igeq) .. sum(j, A(igeq, j) * x(j)) =g= rhs(igeq) ;

Model LinearProblem /all/ ;

Solve LinearProblem using LP minimizing obj ;

option decimals = 8;

display '---BEGIN.SOLUTION---', x.l, '---END.SOLUTION---';


file results /results.txt/;
results.nw = 0;
results.nd = 15;
results.nr = 2;
results.nz = 0;
put results;
put 'solution:'/;
loop(j, put, x.l(j)/);
put 'objval:'/;
put LinearProblem.objval/;
put 'solver_status:'/;
put LinearProblem.solvestat/;
put 'model_status:'/;
put LinearProblem.modelstat/;]]></model>
  <options><![CDATA[]]></options>
  <gdx><![CDATA[]]></gdx>
  <wantgdx><![CDATA[]]></wantgdx>
  <wantlog><![CDATA[]]></wantlog>
  <comments><![CDATA[]]></comments>
</document>
> 

All structure is lost: we are just modeling:\[\begin{align} \min \>& c^Tx \\ & A_1 x  \le b_1 \\&  A_2 x \ge b_2 \end{align}\]

Quadratic models


This plug-in also supports quadratic models (but not general nonlinear models). In [2] a trivial non-convex QP model is presented: \[\begin{align}\max\>&F(P_F - 150) + C(P_C - 100) \\ & 2 F + C \le 500\\ & 2F + 3C \le 800 \\ & F \le 490 - P_F \\ & C \le 640 - 2P_C\\ & F, C, P_F, P_C\ge 0\end{align}\] ROI does not understand this mathematical notation, so we have to pass the model in as matrices:


library(ROI)
library(ROI.plugin.neos)
library(slam)   # provides simple_triplet_matrix

# Q encodes the bilinear objective terms F*P_F and C*P_C
Q <- simple_triplet_matrix(i = 1:4, j = c(2, 1, 4, 3), rep(1, 4))
as.matrix(Q)

var_names <- c("full", "price_full", "compact", "price_compact")
o <- OP(
  Q_objective(Q = Q, L = c(-150, 0, -100, 0), names = var_names),
  L_constraint(rbind(c(2, 0, 1, 0), c(2, 0, 3, 0),
                     c(1, 1, 0, 0), c(0, 0, 1, 2)),
               dir = leq(4), rhs = c(500, 800, 490, 640)),
  maximum = TRUE)

(sol <- ROI_solve(o, solver = "neos", method = "BARON"))


Unfortunately, this is not really close to the mathematical model and makes the problem setup difficult to read and understand.

In [2] a table is presented with some of the supported solvers.


Linear models: OMPR


For linear models we can use the OMPR package [5,6] on top of ROI. This gives us more readable models. Here we implement the transportation model from above. The model can look like:


library(ompr)
library(ompr.roi)
library(dplyr)

#
# data
#
n <- 2  # number of plants (sources)
m <- 3  # number of markets (destinations)
si <- c('seattle', 'san-diego')
sj <- c('new-york', 'chicago', 'topeka')
a <- c(350, 600)      # capacity
b <- c(325, 300, 275) # demand
# distances
d <- matrix(c(2.5, 1.7, 1.8, 2.5, 1.8, 1.4), nrow=n, ncol=m, byrow=TRUE)
f <- 90   # unit freight cost
c <- d*f/1000  # unit transportation cost

#
# model + solve
#
(sol <- MIPModel() %>%
  add_variable(x[i,j],i=1:n,j=1:m,type = "continuous", lb = 0) %>%
  set_objective(sum_expr(c[i,j]*x[i,j],i=1:n,j=1:m), "min")  %>%
  add_constraint(sum_expr(x[i,j],j=1:m)<=a[i], i=1:n)  %>%
  add_constraint(sum_expr(x[i,j],i=1:n)>=b[j], j=1:m) %>%
  solve_model(with_ROI(solver = "neos", method="mosek")))  

#  
# solution
#
sol %>% 
  get_solution(x[i, j]) %>%
  filter(value > 0) 

This code at least bears some resemblance to the mathematical model. The solution looks like:


Status: optimal
Objective value: 153.675

  variable i j value
1        x 1 1    50
2        x 2 1   275
3        x 1 2   300
4        x 2 3   275

We can clean up the solution a bit, so we get a more meaningful solution report:

> sol %>% 
+   get_solution(x[i, j]) %>%
+   filter(value > 0) %>%
+   mutate(Plant = si[i],Market = sj[j]) %>%
+   select(Plant,Market,Shipment=value)
      Plant   Market Shipment
1   seattle new-york       50
2 san-diego new-york      275
3   seattle  chicago      300
4 san-diego   topeka      275

But..


OMPR is much better than matrix-oriented tools for expressing models. That does not mean OMPR is suited for all LP/MIP models: it can be slow, there is no good support for large, sparse structures, and there are some bugs. Here is a recent example of a bug I encountered.

In many cases we may need to restrict the domain of variables: e.g., instead of all x[i,j], only use a subset. An example is a network of nodes and arcs. The arcs can be represented by x[i,j], indicating the flow from node i to node j. Of course, if we have a sparse network, only a few of all (i,j) combinations are allowed.

OMPR allows conditions on the domain of variables. E.g. for a one-dimensional variable x[i], we can do:

> n <- 3
> ok <- c(1,0,1)
> ok
[1] 1 0 1
> (sol <- MIPModel() %>%
+     add_variable(x[i], i=1:n, ok[i]==1)
+ )
Mixed integer linear optimization problem
Variables:
  Continuous: 2
  Integer: 0
  Binary: 0
No objective function.
Constraints: 0


This looks great: we see x[2] is not included in the variables. Now let's do the same thing for a two-dimensional variable x[i,j]:

> n <- 3
> ok <- matrix(c(0,1,1, 0,0,1, 0,0,0),nrow=3,ncol=3,byrow=T)
> ok
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    0    0    1
[3,]    0    0    0
> (sol <- MIPModel() %>%
+     add_variable(x[i,j], i=1:n, j=1:n, ok[i,j]==1) 
+ )
Error in validate_quantifier_candidates(candidates, zero_vars_msg) : 
  only_integer_candidates are not all TRUE

Something goes wrong here. The error message is not very informative: I have no clue what it means, and I have no idea how to fix it. A possible workaround is sketched below.
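The idea of the workaround: enumerate the allowed (i,j) pairs up front and model over a single index k, so that no condition is needed in the add_variable call (a sketch; x[k] then stands for x[ii[k], jj[k]]):

# enumerate the allowed (i,j) pairs first
allowed <- which(ok == 1, arr.ind = TRUE)
K  <- nrow(allowed)
ii <- allowed[, "row"]
jj <- allowed[, "col"]

(sol <- MIPModel() %>%
    add_variable(x[k], k = 1:K)
)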

References


  1. Many Solvers, One Interface, ROI, the R Optimization Infrastructure Package, https://www.r-project.org/conferences/useR-2010/slides/Theussl+Hornik+Meyer.pdf
  2. Ronald Hochreiter and Florian Schwendinger, ROI Plug-in NEOS, https://cran.r-project.org/web/packages/ROI.plugin.neos/vignettes/ROI.plugin.neos_Introduction.pdf
  3. NEOS Server: State-of-the-Art Solvers for Numerical Optimization, https://neos-server.org/neos/
  4. Matlab vs GAMS: linear programming, http://yetanothermathprogrammingconsultant.blogspot.com/2016/09/matlab-vs-gams-linear-programming.html
  5. Mixed Integer Linear Programming in R, https://dirkschumacher.github.io/ompr/
  6. MIP model in R using the OMPR package, http://yetanothermathprogrammingconsultant.blogspot.com/2017/04/mip-model-in-r-using-ompr-package.html
  7. Robert Fourer, Modeling Languages versus Matrix Generators for Linear Programming, ACM Transactions on Mathematical Software, vol. 9, pp. 143–183, 1983.

Sunday, June 17, 2018

select weights to maximize count

In [1] a problem is presented that looks simple at first sight. Looking a bit further, there are some interesting angles.

The example data set is:


----     27 PARAMETER a  

            j1          j2          j3

i1       0.870       0.730       0.410
i2       0.820       0.730       0.850
i3       0.820       0.370       0.850
i4       0.580       0.950       0.420
i5       1.000       1.000       0.900

The idea is that we can apply weights \(w_j\) to calculate a final score for each row:\[F_i = \sum_j w_j a_{i,j}\] Weights obey the usual constraints: \(w_j \in [0,1]\) and \(\sum_j w_j=1\). The goal is to find optimal weights such that the number of records with final score in the bucket \([0.9, 1]\) is maximized.

Looking at the data, \(w=(0,1,0)\) is a good choice. This gives us two \(F_i\)'s in our bucket. Let's see if we can formalize this with a model.

MIP Model


Counting is done with binary variables. So let's define\[\delta_i = \begin{cases} 1 & \text{if $L\le F_i\le U$}\\ 0 & \text{otherwise}\end{cases}\] I used \(L\) and \(U\) to indicate the bucket.

A first model can look like:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_i \delta_i \\  & F_i = \sum_j w_j a_{i,j}\\& \sum_i w_i = 1\\ & L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\\ & \delta_i \in \{0,1\} \\ & w_i \in [0,1]\end{align}}\] Here \(M\) is a large enough constant.

The sandwich equation models the implication \[\delta_i=1 \Rightarrow L\le F_i \le U\] The objective will make sure that \(L\le F_i \le U \Rightarrow \delta_i=1 \) holds for the optimal solution.

This is the big picture. This model works: it finds an optimal set of weights such that as many \(F_i\) as possible are in \([L,U]\). In the rest of this post I dive into some nitty-gritty details. No elegant math, just largely boring stuff. However, this is typically what we need to do when working on non-trivial MIP models.
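To make this concrete, here is a minimal R/OMPR sketch of this first model. This is just a sketch: it assumes the small data set above, uses \(M=1\) (justified in the next section), and solves with GLPK through ROI.

library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
library(dplyr)

a <- matrix(c(0.87, 0.73, 0.41,
              0.82, 0.73, 0.85,
              0.82, 0.37, 0.85,
              0.58, 0.95, 0.42,
              1.00, 1.00, 0.90), nrow = 5, byrow = TRUE)
ni <- nrow(a); nj <- ncol(a)
L <- 0.9; U <- 1; M <- 1   # all a[i,j] are in [0,1], so M = 1 is large enough

res <- MIPModel() %>%
  add_variable(w[j], j = 1:nj, lb = 0, ub = 1) %>%
  add_variable(delta[i], i = 1:ni, type = "binary") %>%
  set_objective(sum_expr(delta[i], i = 1:ni), "max") %>%
  add_constraint(sum_expr(w[j], j = 1:nj) == 1) %>%
  # sandwich constraints: delta[i] = 1 forces F[i] into [L, U]
  add_constraint(sum_expr(a[i,j]*w[j], j = 1:nj) >= L - M + M*delta[i], i = 1:ni) %>%
  add_constraint(sum_expr(a[i,j]*w[j], j = 1:nj) <= U + M - M*delta[i], i = 1:ni) %>%
  solve_model(with_ROI(solver = "glpk"))

get_solution(res, delta[i])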

Big-M


The data seems to suggest \(0 \le a_{i,j} \le 1\), which means \(0 \le F_i \le 1\). This follows from \[\min_j a_{i,j} \le F_i \le \max_j a_{i,j}\] We can also assume \(0 \le L \le U \le 1\). We can conclude that the largest possible difference between \(F_i\) and \(L\) (and between \(F_i\) and \(U\)) is one. So, in this case an obvious value is \(M=1\). More generally, if we first make sure \(U \le \max\{a_{i,j}\}\) and \(L \ge \min\{a_{i,j}\}\) by the preprocessing step \[\begin{align} & U := \min\{U, \max\{a_{i,j}\}\}\\ & L := \max\{L, \min\{a_{i,j}\}\}\end{align}\] we have \[M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\] We can do even better by using an \(M_i\) for each record instead of a single, global \(M\). We will get back to this later.

More preprocessing


We can optimize things further by observing that both "branches" of \[L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\] are not always needed. In our small example we have \(U=1\), but we already know that \(F_i \le 1\). So in this case we only need to worry about \(L - M(1-\delta_i) \le F_i\).

We can generalize this as follows. First calculate bounds \(\ell_i \le F_i\le u_i\): \[\begin{align} & \ell_i = \min_j a_{i,j}\\ & u_i = \max_j a_{i,j}\end{align}\] Then generate constraints: \[\begin{align} & L - M(1-\delta_i) \le F_i && \forall i | L > \ell_i\\ & F_i \le U+M(1-\delta_i) && \forall i | U < u_i\end{align}\]

Even more preprocessing


The first three records in the example data set are not candidates for having \(F_i \in [0.9, 1]\): for those records we have \(u_i \lt L\). In general we can skip all records with \(u_i \lt L\) or \(\ell_i \gt U\); these records can never have \(\delta_i=1\).

Combining things


I extended the \(a\) matrix with following columns:

  • lo: \(\ell_i=\min_j a_{i,j}\).
  • up: \(u_i = \max_j a_{i,j}\).
  • cand: boolean, indicates whether this row is a candidate. We check: \(u_i \ge L\) and \(\ell_i \le U\).
  • chkL: boolean, indicates whether we need to check the left "branch". This is equal to one if \(\ell_i\lt L\).
  • chkU: boolean, indicates whether we need to check the right "branch". This is equal to one if \(u_i\gt U\). No records in this data set have this equal to 1. 
The matrix now looks like:


----     41 PARAMETER a  

            j1          j2          j3          lo          up        cand        chkL

i1       0.870       0.730       0.410       0.410       0.870                   1.000
i2       0.820       0.730       0.850       0.730       0.850                   1.000
i3       0.820       0.370       0.850       0.370       0.850                   1.000
i4       0.580       0.950       0.420       0.420       0.950       1.000       1.000
i5       1.000       1.000       0.900       0.900       1.000       1.000

The last record is interesting. It has \(\mathit{chkL}=\mathit{chkU}=0\). This is correct: it will always allow \(\delta_{i5}=1\), no matter what weights we choose. Note that the constraints \(L - M(1-\delta_i) \le F_i\) are only generated when both \(\mathit{cand}=1\) and \(\mathit{chkL}=1\).

For a small data set this all does not make much difference, but for large ones, we make the model much smaller.
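These flags are easy to compute up front. A minimal R sketch (assuming the data matrix a and the bucket bounds L and U are available as R objects):

lo   <- apply(a, 1, min)        # lower bound on F[i]
up   <- apply(a, 1, max)        # upper bound on F[i]
cand <- (up >= L) & (lo <= U)   # record can possibly reach the bucket
chkL <- lo < L                  # left branch needed
chkU <- up > U                  # right branch needed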

Results


The optimal weights for this small data set are not unique. Obviously I get the same optimal number of selected records:


----     71 VARIABLE w.L  weights

j1 0.135,    j2 0.865


----     71 VARIABLE f.L  final scores

i1 0.749,    i2 0.742,    i3 0.431,    i4 0.900,    i5 1.000


----     71 VARIABLE delta.L  selected

i4 1.000,    i5 1.000


----     71 VARIABLE z.L                   =        2.000  objective

Big-M revisited


I did not pay too much attention to my big-M's. I just used the calculation \(M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\), which yielded:


----     46 PARAMETER M                    =        0.630  big-M

We can use a tailored \(M^L_i, M^U_i\) for each inequality. For the lower bounds our inequality looks like \[L - M^L_i(1-\delta_i) \le F_i\] This means we need \(M^L_i \ge L - F_i\). We have bounds on \(F_i\) (we stored these in \(a_{i,lo}\) and \(a_{i,up}\)), so we can set \(M^L_i = L - a_{i,lo}\). This gives:


----     78 PARAMETER ML  

i4 0.480

Similarly for the \(U\) inequality. For our small example this is not so important, but for larger instances with a wider range of data this may be essential.
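In the same R sketch notation as before (using the lo, up, cand, chkL and chkU vectors from above), the tailored big-M values would be:

ML <- ifelse(cand & chkL, L - lo, 0)   # for L - ML[i]*(1-delta[i]) <= F[i]
MU <- ifelse(cand & chkU, up - U, 0)   # for F[i] <= U + MU[i]*(1-delta[i])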

Larger problem


For a larger random problem:


----     43 PARAMETER a  data

             j1          j2          j3          lo          up        cand        chkL        chkU

i1        0.737       0.866       0.893       0.737       0.893                   1.000
i2        0.434       0.654       0.606       0.434       0.654                   1.000
i3        0.721       0.987       0.648       0.648       0.987       1.000       1.000
i4        0.800       0.618       0.869       0.618       0.869                   1.000
i5        0.668       0.703       1.177       0.668       1.177       1.000       1.000       1.000
i6        0.656       0.540       0.525       0.525       0.656                   1.000
i7        0.864       1.037       0.569       0.569       1.037       1.000       1.000       1.000
i8        0.942       1.003       0.655       0.655       1.003       1.000       1.000       1.000
i9        0.600       0.801       0.601       0.600       0.801                   1.000
i10       0.619       0.933       1.114       0.619       1.114       1.000       1.000       1.000
i11       1.243       0.675       0.756       0.675       1.243       1.000       1.000       1.000
i12       0.608       0.778       0.747       0.608       0.778                   1.000
i13       0.695       0.593       1.196       0.593       1.196       1.000       1.000       1.000
i14       0.965       1.001       0.650       0.650       1.001       1.000       1.000       1.000
i15       0.951       1.040       0.969       0.951       1.040       1.000                   1.000
i16       0.514       1.220       0.935       0.514       1.220       1.000       1.000       1.000
i17       0.834       1.130       1.054       0.834       1.130       1.000       1.000       1.000
i18       1.161       0.550       0.481       0.481       1.161       1.000       1.000       1.000
i19       0.638       0.640       0.691       0.638       0.691                   1.000
i20       1.057       0.724       0.657       0.657       1.057       1.000       1.000       1.000
i21       0.814       0.775       0.448       0.448       0.814                   1.000
i22       0.800       0.737       0.741       0.737       0.800                   1.000
i23       0.573       0.555       0.789       0.555       0.789                   1.000
i24       0.829       0.679       1.059       0.679       1.059       1.000       1.000       1.000
i25       0.761       0.956       0.687       0.687       0.956       1.000       1.000
i26       0.870       0.673       0.822       0.673       0.870                   1.000
i27       0.304       0.900       0.920       0.304       0.920       1.000       1.000
i28       0.544       0.659       0.495       0.495       0.659                   1.000
i29       0.755       0.589       0.520       0.520       0.755                   1.000
i30       0.492       0.617       0.681       0.492       0.681                   1.000
i31       0.657       0.767       0.634       0.634       0.767                   1.000
i32       0.752       0.684       0.596       0.596       0.752                   1.000
i33       0.616       0.846       0.803       0.616       0.846                   1.000
i34       0.677       1.098       1.132       0.677       1.132       1.000       1.000       1.000
i35       0.454       1.037       1.204       0.454       1.204       1.000       1.000       1.000
i36       0.815       0.605       0.890       0.605       0.890                   1.000
i37       0.814       0.587       0.939       0.587       0.939       1.000       1.000
i38       1.019       0.675       0.613       0.613       1.019       1.000       1.000       1.000
i39       0.547       0.946       0.843       0.547       0.946       1.000       1.000
i40       0.724       0.571       0.757       0.571       0.757                   1.000
i41       0.611       0.916       0.891       0.611       0.916       1.000       1.000
i42       0.680       0.624       1.111       0.624       1.111       1.000       1.000       1.000
i43       1.015       0.870       0.823       0.823       1.015       1.000       1.000       1.000
i44       0.587       0.866       0.691       0.587       0.866                   1.000
i45       0.789       1.090       0.649       0.649       1.090       1.000       1.000       1.000
i46       1.436       0.747       0.805       0.747       1.436       1.000       1.000       1.000
i47       0.791       0.885       0.723       0.723       0.885                   1.000
i48       0.718       1.028       0.869       0.718       1.028       1.000       1.000       1.000
i49       0.876       0.772       0.918       0.772       0.918       1.000       1.000
i50       0.498       0.882       0.599       0.498       0.882                   1.000

we see that we can remove a significant number of records and big-M constraints. Again we want to maximize the number of final scores \(F_i \in [0.9, 1]\). Note that we now have some records with \(a_{i,j}\gt 1\), so the column \(\mathit{chkU}\) gets properly populated.

The model solves instantaneously and shows:


----     74 VARIABLE w.L  weights

j1 0.007,    j2 0.792,    j3 0.202


----     74 VARIABLE f.L  final scores

i1  0.870,    i2  0.643,    i3  0.917,    i4  0.670,    i5  0.798,    i6  0.538,    i7  0.942,    i8  0.933
i9  0.760,    i10 0.967,    i11 0.695,    i12 0.771,    i13 0.715,    i14 0.930,    i15 1.025,    i16 1.158
i17 1.112,    i18 0.541,    i19 0.650,    i20 0.713,    i21 0.710,    i22 0.738,    i23 0.602,    i24 0.757
i25 0.900,    i26 0.705,    i27 0.900,    i28 0.625,    i29 0.576,    i30 0.629,    i31 0.739,    i32 0.667
i33 0.836,    i34 1.102,    i35 1.067,    i36 0.664,    i37 0.660,    i38 0.665,    i39 0.923,    i40 0.609
i41 0.909,    i42 0.722,    i43 0.861,    i44 0.829,    i45 0.999,    i46 0.764,    i47 0.851,    i48 0.994
i49 0.802,    i50 0.822


----     74 VARIABLE delta.L  selected

i3  1.000,    i7  1.000,    i8  1.000,    i10 1.000,    i14 1.000,    i15 1.000,    i16 1.000,    i17 1.000
i25 1.000,    i27 1.000,    i34 1.000,    i35 1.000,    i39 1.000,    i41 1.000,    i45 1.000,    i48 1.000


----     74 VARIABLE z.L                   =       16.000  objective

Conclusion


A simple model becomes not that simple once we start "optimizing" it. Unfortunately this is typical for large MIP models.

References


  1. Simulation/Optimization Package in R for tuning weights to achieve maximum allocation for groups, https://stackoverflow.com/questions/50843023/simulation-optimization-package-in-r-for-tuning-weights-to-achieve-maximum-alloc

Saturday, June 9, 2018

Sparsest solution: a difficult MIP

The problem of finding a solution vector that is as sparse as possible can be formulated as a MIP.

I was looking at solving an underdetermined problem \[Ax=b\] i.e. the number of rows \(m\) is less than the number of columns \(n\). A solution \(x\) has in general \(m\) nonzero elements.

The actual system was: \[Ax\approx b\] or to be more precise \[-\epsilon \le Ax-b\le \epsilon \] By adding slack variables \(s_i\) we can write this as: \[\begin{align}&Ax=b+s\\&-\epsilon\le s \le \epsilon\end{align} \] Now we have more wiggle room, and can try to find the vector \(x\) that has as many zero elements as possible.

Big-M formulation


The MIP model seems obvious: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min & \sum_j \delta_j \\ & Ax=b+s \\ & -M\delta_j \le x_j \le M\delta_j \\ & s_i \in [-\epsilon,\epsilon]\\ & \delta_j \in \{0,1\}\end{align}}\] This turns out to be a very problematic formulation. First, we have no good a priori value for \(M\); I used \(M=10,000\), but that gives some real issues. Furthermore, the performance is not good: for a very small problem with \(m=20, n=40\) we already see a solution time of about 20 minutes. With just 40 binary variables I expected something like a minute or two. But that is only a minor part of the problem. The results with Cplex look like:


----     82 VARIABLE x.L  

j3  -0.026,    j4   0.576,    j7   0.638,    j11  0.040,    j12 -0.747,    j14  0.039,    j15 -0.169,    j19 -0.088
j23  0.509,    j31 -0.475,    j34 -0.750,    j35 -0.509


----     82 VARIABLE delta.L  

j3  2.602265E-6,    j4        1.000,    j7        1.000,    j11 4.012925E-6,    j12       1.000,    j14 3.921840E-6
j15       1.000,    j19 8.834778E-6,    j23       1.000,    j31       1.000,    j34       1.000,    j35       1.000


----     82 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    big M        10000.000
iterations 5.443661E+7,    nodes      5990913.000,    seconds       1004.422,    nz(x)           12.000
obj              8.000,    gap                EPS

Well, we have some problems here. The optimal objective is reported as 8, so we have 8 \(\delta_j\)'s equal to one. But at the same time the number of nonzero values in \(x\) is 12. This does not match. The reason: a few \(\delta_j\)'s are very small (approx. \(10^{-6}\)). Cplex considers these small values to be zero, while the model sees opportunities to exploit them for what is sometimes called "trickle flow". These results are just not very reliable. We cannot simply ignore the small \(\delta_j\)'s and conclude that the optimal number of nonzero elements in \(x\) is 8 (we will see later it is 11). The underlying reason is a relatively large value for \(M\) combined with a Cplex integer feasibility tolerance that is large enough to create leaks.

There are a few things we can do to repair this:

  • Reduce the value of \(M\), e.g. reduce it from \(M=10,000\) to \(M=1,000\) (see the sketch after this list). 
  • Tighten the integer feasibility tolerance. Cplex has a default tolerance epint=1.0e-5 (it can be set to zero).
  • Use SOS1 sets instead of big-M's. Special Ordered Sets of Type 1 allow us to model the problem in a slightly different way than with binary variables. 
  • Use indicator constraints instead of big-M's. Indicator constraints can implement the implication \(\delta_j=0 \Rightarrow x_j=0\) directly. 
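For reference, here is a minimal sketch of the big-M formulation in R with OMPR (my experiments used GAMS with Cplex; this sketch just assumes a matrix A, a vector b and a tolerance eps are given as R objects, and keeps M as an explicit knob so it can easily be reduced):

library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
library(dplyr)

sparsest_solution <- function(A, b, eps, M) {
  m <- nrow(A); n <- ncol(A)
  MIPModel() %>%
    add_variable(x[j], j = 1:n, lb = -M, ub = M) %>%
    add_variable(s[i], i = 1:m, lb = -eps, ub = eps) %>%
    add_variable(delta[j], j = 1:n, type = "binary") %>%
    set_objective(sum_expr(delta[j], j = 1:n), "min") %>%
    add_constraint(sum_expr(A[i,j]*x[j], j = 1:n) == b[i] + s[i], i = 1:m) %>%
    # big-M link: delta[j] = 0 forces x[j] = 0
    add_constraint(x[j] <= M*delta[j], j = 1:n) %>%
    add_constraint(x[j] >= -M*delta[j], j = 1:n) %>%
    solve_model(with_ROI(solver = "glpk"))
}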

SOS1 formulation


We can get rid of the big-M problem by using SOS1 variables: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_j y_j \\ & Ax=b+s \\ & (x_j,y_j) \in \text{SOS1} \\ & s_i \in [-\epsilon,\epsilon]\\ & y_j \in [0,1]\end{align}}\] The SOS1 sets say: \(x_j = 0\) or \(y_j=0\) (or both); in other words, at most one of \(x_j,y_j\) can be nonzero. The objective tries to make as many elements of \(y\) as possible equal to one. The corresponding elements of \(x\) will then be zero, making the \(x\) vector sparser. This SOS1 approach is more reliable but can be slower. When we try it on our small problem, we see:


----    119 VARIABLE x.L  

j4   0.601,    j7   0.650,    j11  0.072,    j12 -0.780,    j15 -0.170,    j19 -0.117,    j23  0.515,    j31 -0.483
j34 -0.750,    j35 -0.528,    j36 -0.031


----    119 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    iterations 4.787142E+7
nodes      1.229474E+7,    seconds       3600.156,    nz(x)           11.000,    obj             29.000
gap              0.138

This model could not be solved to optimality within one hour! We stopped on a time limit of 3600 seconds. Remember, we have only 40 discrete structures in this model. The gap is still 13.8% after running for an hour. Otherwise the results make sense: the objective of 29 corresponds to 11 nonzero values in \(x\).

Indicator constraint formulation


Finally, a formulation using indicator constraints can look like: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min & \sum_j \delta_j \\ & Ax=b+s \\ & \delta_j = 0 \Rightarrow x_j=0 \\ & s_i \in [-\epsilon,\epsilon]\\ & \delta_j \in \{0,1\}\end{align}}\] This formulation does not use a big-M coefficient either, and the results are consistent.

Performance-wise, this does not help much:


----    178 VARIABLE x.L  

j1   0.158,    j2  -0.459,    j13 -0.155,    j14  0.470,    j17 -0.490,    j22  0.269,    j23  0.211,    j32  1.164
j33  0.147,    j38 -0.604,    j39 -0.563


----    178 VARIABLE delta.L  

j1  1.000,    j2  1.000,    j13 1.000,    j14 1.000,    j17 1.000,    j22 1.000,    j23 1.000,    j32 1.000
j33 1.000,    j38 1.000,    j39 1.000


----    178 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    iterations 7.127289E+7
nodes      1.523042E+7,    seconds       3600.203,    nz(x)           11.000,    obj             11.000
gap              0.273

We are hitting our time limit. The gap is 27.3%.

Optimal solution


The optimal solution is indeed 11 nonzero elements in \(x\). It took me 4 hours to prove this. Here are the MIP bounds:

[Figure: MIP bounds over time: objective (blue) and best bound (red)]


The optimal solution was found quite early, but proving optimality took a long time. The final jump in the best bound (red line: bound on the best possible integer solution) can be explained as follows. The objective (blue line) can only assume integer values, so it jumps by one. You can see this happening early on, when it jumps from 12 to 11. This also means we are optimal as soon as the best bound (red line) reaches a value larger than 10: at that point we know 11 is the optimal objective value.

NB. To be complete: this run used a big-M formulation with a reduced value of \(M=1,000\) and an integer feasibility tolerance (option epint) of zero. Note that such a value really puts us in "paranoia mode" and may result in somewhat longer solution times. Not all solvers allow an integer feasibility tolerance of zero. In addition, I used the option mipemphasis=2 to tell Cplex we want optimality.

Conclusion


Here is a very small MIP model, with only 40 binary variables, that is just very difficult to solve to optimality. Although we can find the optimal solution relatively quickly, proving optimality proves to be very challenging. This is not the usual situation: for most models with 40 binary variables we can expect very quick turnaround. This model is unusually difficult.

In addition, this model nicely illustrates the dangers of big-M formulations. Even for modest values, like \(M=10,000\), we can be in real trouble.

This is a good example that should give us some pause. 


Friday, June 1, 2018

The real optimization problem

Dear Erwin,

Good Morning!!.

I am doing my Master's Degree in Computer Science.

Currently, I am doing some research on Facility Location Problem Optimization. I came across your answers on stackoverflow and your blog as well. Your findings are really helpful, I appreciate you for this. Really well structured and clearly understandable. Mathematical models are well explained.

I kindly request you, If you provide small implementation code in JAVA/Python it will be very helpful. So that I can understand the concept deeply.

Thanks for your time and consideration.

Kindest Regards,

This question is about [1]. I did not use Java or Python to solve the models discussed there, so my answer was easy: sorry.

However, I am a bit uncomfortable with questions like this. Is "research" really nothing more than e-mailing and copy/pasting from replies? Should a CS student working on a Master's thesis not be able to translate a detailed (but simple) mathematical model into working code? Would you not learn much more by actually implementing the model yourself?

Often I find that reproducing the results in a paper is a very good way to get an understanding of the issues. I prefer to work from a mathematical model, or pseudo-code if an algorithm needs to be implemented. Just copy/paste is hardly ever a good idea.

I have the feeling the real underlying optimization problem is: minimize effort. Or to make it sound better: maximize efficiency.

I am probably a bit too harsh.

References


  1. Solving a facility location problem as an MIQCP, http://yetanothermathprogrammingconsultant.blogspot.com/2018/01/solving-facility-location-problem-as.html