Wednesday, August 21, 2019

R/Python + C++

In some recent projects, I was calling algorithms implemented in C++ from R and Python. The basic idea: Python and R are great languages for scripting, but they are slow as molasses. So it may make sense to develop the time-consuming algorithms in C++, while driving them from R or Python.

R and C++: Rcpp


The standard way to build interfaces between R and C++ code is to use Rcpp [1,2].




It is possible to interface R directly with low-level C code, but this requires a lot of code and knowledge of R internals. Rcpp automates much of this; e.g., Rcpp takes care of translating an R vector into a C++ vector.

Rcpp supports everything from small fragments of C++ code passed in as an R string, to a more coarse-grained, file-based approach [3]. On Windows, you need to download the GNU compilers via Rtools [5].

If you are new to both Rcpp and building your own R packages [4], things may be a bit overwhelming.

RStudio can help a lot. It supports many very useful tasks:

  • Syntax coloring for C++ code. 
  • Building projects.
  • Git version control.
  • Documentation tools (rmarkdown and bookdown). My documentation is also a good test: it executes almost all of the code when building the document.

Editing C++ code in RStudio


Basically I never have to leave RStudio.

I have added an alternative driver file for my C++ code so I can debug it in Visual Studio. I used it only a few times: most of the time I just used RStudio.


Python and C++: pybind11


pybind11 [6] is in many respects similar to Rcpp, although it requires a little bit more programming to bridge the gap between Python and C++.



At the beginning of the YouTube video in [7], the presenter compares pybind11 with some of the alternatives:

  • SWIG: even the author of SWIG says: don't use it
  • ctypes: calls C functions, but not C++
  • CFFI: calls C functions
  • Boost.Python: supports older C++ standards, but not much maintained
  • pybind11: modern 

As with Rcpp, the compiler is invoked by running a build or setup script. For Rcpp I used the GNU compilers, while pybind11/pip install supports the Visual Studio C++ compiler. Again, if you have little experience with both pybind11 and creating Python packages, the learning curve may be steep.
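For illustration, a minimal setup.py could look like the sketch below. The module name "example" and the file example.cpp are placeholders, and this assumes a recent pybind11 that ships the setup_helpers module:

from setuptools import setup
from pybind11.setup_helpers import Pybind11Extension, build_ext

# "example" and example.cpp are placeholders for your own module and source file
ext_modules = [Pybind11Extension("example", ["example.cpp"])]

setup(
    name="example",
    version="0.1",
    ext_modules=ext_modules,
    cmdclass={"build_ext": build_ext},  # picks suitable compiler flags per platform
)

Running pip install . in the package directory then invokes the platform's C++ compiler (MSVC on Windows).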


References


  1. http://www.rcpp.org
  2. Dirk Eddelbuettel, Seamless R and C++ Integration with Rcpp, Springer, 2013
  3. Hadley Wickham, Advanced R, Chapter: Rewriting R code in C++, https://adv-r.hadley.nz/rcpp.html
  4. Hadley Wickham, R packages, O'Reilly, 2015 
  5. https://cran.r-project.org/bin/windows/Rtools/
  6. https://pybind11.readthedocs.io/en/master/
  7. Robert Smallshire, Integrate Python and C++ with pybind11, https://www.youtube.com/watch?v=YReJ3pSnNDo

Thursday, August 15, 2019

Scheduling teachers

From [1]:

I am trying to find a solution to the following problem: 
  • There are 6 teachers
  • Each teacher can only work for 8 hours a day
  • Each teacher must have a 30 minute break ideally near the middle of his shift
  • The following table shows the number of teachers needed in the room at certain times:

        7am - 8am    1 teacher
        8am - 9am    2 teachers
        10am - 11am  5 teachers
        11am - 5pm   6 teachers
        5pm - 6pm    2 teachers
What is a good way to solve this (ideally in Python and Google OR-Tools)? 
Thank you

Initial analysis

From the demand data we see that we need all 6 teachers working from 11am to 5pm without a lunch break. This is 6 hours, so there is no way to allow a lunch break near the middle of the shift.


We have more problems. The demand data indicates we have no demand between 9am and 10am. That does not look right. Bad data is a fact of life, and optimization models are especially sensitive to bad data.

I believe that looking critically at your data is essential for successful optimization applications. You can learn a lot from just a bit of staring.

Alternative problem 1

We can ask a few different questions. If we only allow shifts of the form 4 hours work, 0.5 hour lunch, 4 hours work (i.e. lunch perfectly in the middle of the shift), how many teachers do we need to meet the demand? To model this, we can assume time periods of half an hour. The number of possible shifts is small.


With an enumeration of the shifts, we can model this as a covering problem:

Covering Model
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_{s|\color{darkblue}{\mathit cover}(s,t)} \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_s \in \{0,1,2,\dots\}\end{align} \]

Here \(\mathit{cover}(s,t)=\text{True}\) if shift \(s\) covers time period \(t\). If we interpret \(\mathit cover(s,t)\) as a binary (data) matrix \[\mathit cover_{s,t} = \begin{cases} 1 & \text{if shift $s$ covers time period $t$}\\ 0 & \text{otherwise}\end{cases}\] we can also write:

Covering Model (alternative interpretation)
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_s \color{darkblue}{\mathit cover}_{s,t} \cdot \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_s \in \{0,1,2,\dots\}\end{align} \]
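Since the question asks for Python and Google OR-Tools, a minimal CP-SAT sketch of this covering model could look as follows. The cover matrix and demand below are made-up toy data, not the actual shift enumeration:

from ortools.sat.python import cp_model

# Toy data: 3 candidate shifts over 4 half-hour periods (not the real instance).
cover = [
    [1, 1, 0, 0],   # shift 0 covers periods 0 and 1
    [0, 1, 1, 0],   # shift 1 covers periods 1 and 2
    [0, 0, 1, 1],   # shift 2 covers periods 2 and 3
]
demand = [1, 2, 2, 1]

model = cp_model.CpModel()
# x[s] = number of teachers assigned to shift s (integer, not just binary)
x = [model.NewIntVar(0, 10, f"x{s}") for s in range(len(cover))]
for t in range(len(demand)):
    model.Add(sum(cover[s][t] * x[s] for s in range(len(cover))) >= demand[t])
model.Minimize(sum(x))

solver = cp_model.CpSolver()
if solver.Solve(model) == cp_model.OPTIMAL:
    print("total shifts:", solver.ObjectiveValue())
    print([solver.Value(v) for v in x])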

Our new assumptions are:

  • we only allow shifts with a lunch break in the middle
  • we add to our demand data: 4 teachers needed between 9am and 10am
After solving this easy MIP model we see:

----     55 VARIABLE x.L  number of shifts needed

shift1 2.000,    shift4 2.000,    shift5 2.000,    shift6 2.000


----     55 VARIABLE z.L                   =        8.000  total number of shifts

I.e. we need 8 teachers to handle this workload.


The picture shows we are overshooting demand in quite a few periods. Note: this picture illustrates the left-hand side (orange bars) and right-hand side (blue line) of the demand equation in the Covering Model.

Alternative problem 2

Let's make things a bit more complicated. We now allow the following rules for a single shift:


  • The work period before the lunch break is between 3 and 5 hours (i.e. between 6 and 10 time periods)
  • There is a lunch break of 0.5 or 1 hour.
  • After lunch there is another work period of between 3 and 5 hours. The total number of working hours is 8.
When we enumerate the shifts according to these rules, we find 55 different shifts to consider.
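A small Python sketch of this enumeration (assuming a 7am-6pm horizon, i.e. 22 half-hour periods) reproduces this count:

# Enumerate shifts: 6-10 work periods, 1-2 break periods, 16 work periods total.
T = 22          # number of half-hour periods in the day (7am-6pm)
WORK = 16       # total working periods (8 hours)

shifts = []
for start in range(T):
    for work1 in range(6, 11):          # 3 to 5 hours before lunch
        for brk in (1, 2):              # lunch of 0.5 or 1 hour
            work2 = WORK - work1        # remaining work after lunch
            if 6 <= work2 <= 10 and start + work1 + brk + work2 <= T:
                shifts.append((start, work1, brk, work2))

print(len(shifts))  # 55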

The results look like:


----     72 VARIABLE x.L  number of shifts needed

shift1  1.000,    shift22 1.000,    shift26 1.000,    shift39 1.000,    shift49 1.000,    shift52 1.000
shift55 1.000


----     72 VARIABLE z.L                   =        7.000  total number of shifts


We see the number of teachers needed is 7. We are closer to the demand curve:



We see that if we add more flexibility, we can do a bit better. Achieving 6 teachers is almost impossible: we would need to introduce shifts like work 2 hours, lunch, work 6 hours. The teachers' union would object.

Note that there are quite a few other methods to solve models like this that do not require enumerating all possible shifts [2,3]. For larger problems, the shift enumeration scheme we used here may not be feasible.

References


Wednesday, August 7, 2019

Finding the central point in a point cloud


Problem

The problem is easy to describe: given the coordinates of points \(p_i\), find the central point \(x\) that minimizes the sum of the distances from \(x\) to \(p_i\) [1]. This point is sometimes called the geometric median [2].

Data

We just use some random data on the unit square \([0,1]\times[0,1]\):


----     13 PARAMETER p  points (randomly generated)

                  x           y

point1   0.17174713  0.84326671
point2   0.55037536  0.30113790
point3   0.29221212  0.22405287
point4   0.34983050  0.85627035
point5   0.06711372  0.50021067
point6   0.99811763  0.57873338
point7   0.99113304  0.76225047
point8   0.13069248  0.63971876
point9   0.15951786  0.25008053
point10  0.66892861  0.43535638
point11  0.35970027  0.35144137
point12  0.13149159  0.15010179
point13  0.58911365  0.83089281
point14  0.23081574  0.66573446
point15  0.77585761  0.30365848
point16  0.11049229  0.50238487
point17  0.16017276  0.87246231
point18  0.26511455  0.28581432
point19  0.59395592  0.72271907
point20  0.62824868  0.46379787
point21  0.41330699  0.11769536
point22  0.31421227  0.04655151
point23  0.33855027  0.18209959
point24  0.64572713  0.56074555
point25  0.76996172  0.29780586
point26  0.66110626  0.75582167
point27  0.62744750  0.28386420
point28  0.08642462  0.10251467
point29  0.64125115  0.54530950
point30  0.03152485  0.79236064
point31  0.07276700  0.17566105
point32  0.52563261  0.75020767
point33  0.17812371  0.03414099
point34  0.58513117  0.62122998
point35  0.38936190  0.35871415
point36  0.24303462  0.24642154
point37  0.13050280  0.93344972
point38  0.37993791  0.78340046
point39  0.30003426  0.12548322
point40  0.74887411  0.06923246
point41  0.20201556  0.00506586
point42  0.26961305  0.49985148
point43  0.15128587  0.17416945
point44  0.33063773  0.31690605
point45  0.32208696  0.96397664
point46  0.99360221  0.36990306
point47  0.37288857  0.77197833
point48  0.39668414  0.91309632
point49  0.11957773  0.73547889
point50  0.05541847  0.57629980
mean     0.39042709  0.47299042

The last entry labeled mean is just the average of the \(x\)- and \(y\)-coordinates.

NLP Model

A straightforward non-linear programming model can look like:

Unconstrained NLP Model
\[\min \sum_i \sqrt{\sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 }\]


We use \(c = \{x,y\}\), i.e. we have \(x\) and \(y\)-coordinates. Note that we use \(x\) in two different contexts: element of set \(c\), being the \(x\)-coordinate, and the decision variable \(x_c\).

We can use the mean as a very good starting point to help the NLP solver. I.e. \[x_c := \frac{\displaystyle\sum_{i=1}^n p_{i,c}}{n}\]

The picture below shows why the mean is such a good starting point:

Optimal center point is close to mean point

The numeric values are here:

----     45 PARAMETER results  x(center) vs mean

               x           y     sumdist

mean  0.39042709  0.47299042 18.02414861
x     0.37048298  0.43857572 17.96891985

The sumdist column shows the objective values for these two points.

This is an easy NLP problem. Most NLP solvers need just a few iterations. With a system like GAMS or AMPL we get exact gradients automatically. That is much preferable to finite differences, which seems to be the prevalent method in R or Python environments.
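As an illustration, here is a small SciPy sketch of this approach (with freshly generated random points rather than the data above); the exact gradient of the sum of norms is easy to supply by hand:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
p = rng.random((50, 2))   # random points in the unit square (not the data above)

def sumdist(x):
    # objective: sum of Euclidean distances from x to all points
    return np.linalg.norm(p - x, axis=1).sum()

def grad(x):
    # exact gradient: sum of unit vectors (x - p_i)/||x - p_i||
    # (undefined if x coincides exactly with a data point)
    d = x - p
    return (d / np.linalg.norm(d, axis=1, keepdims=True)).sum(axis=0)

x0 = p.mean(axis=0)       # the mean is a very good starting point
res = minimize(sumdist, x0, jac=grad, method="BFGS")
print(res.x, res.fun)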

Cone programming I

The above problem can also be written as a cone programming problem. This will allow us to use a different class of solvers to work on this problem. Here we use CVXPY [3] to express the model. The Python code can look like:

import cvxpy as cp
import numpy as np

N = 50                    # number of points
p = np.random.rand(N, 2)  # point coordinates (random, standing in for the data above)

x = cp.Variable(2)        # center point

obj = cp.Minimize( cp.sum( [ cp.norm(x - p[i,:]) for i in range(N) ] ) )
prob = cp.Problem(obj)

objval = prob.solve(solver=cp.SCS, verbose=True)

This is very much a straight translation of our unconstrained NLP model. Although we only declared two \(x\) variables, behind the scenes the model is expanded into a rather large one, as we can see from the log:

----------------------------------------------------------------------------
    SCS v2.1.1 - Splitting Conic Solver
    (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 150
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 52, constraints m = 150
Cones:  soc vars: 150, soc blks: 50

The generated SOCP (second-order cone programming) model is larger, but also very sparse. The solver has no problem solving it very quickly. With SOCP solvers we usually don't need to worry about an initial point, as we did in the NLP model.

Cone programming II

If we don't use a modeling tool that can do these transformations automatically, we can use a DIY approach. Second-order cone constraints can be stated as: \[||A_i x+b_i||_2 \le c_i^Tx + d_i \>\>\forall i\] This would imply that we can write our model as:

SOCP Model Attempt
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 && \forall i \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}\end{align} \]

Unfortunately this will yield a message like: CPLEX Error  5002: 'e(point1)' is not convex or Constraint 'e(point1)'(0) is not convex. Q should be positive semidefinite in a constraint with finite upper bound.  We can repair this as follows:

Repaired SOCP Model
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c \color{darkred}y_{i,c}^2 && \forall i \\ & \color{darkred} y_{i,c} = \color{darkred} x_c -\color{darkblue}p_{i,c} && \forall i,c \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}, \color{darkred} y_{i,c} \text{ free}\end{align} \]

This now solves quickly. We can now see that CVXPY did quite a few steps before passing the model to the solver. As argued in [4], it is much better if the modeling system takes care of these reformulations: some of them are not immediately obvious, and hand-crafted reformulations can be error-prone.
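If we want to see what such a DIY reformulation looks like, we can also write it out explicitly in CVXPY. This is a sketch with random points; CVXPY would normally generate this form for us:

import cvxpy as cp
import numpy as np

N = 50
p = np.random.rand(N, 2)            # random points (standing in for the data above)

x = cp.Variable(2)                  # center point
d = cp.Variable(N)                  # distance from x to each point
y = cp.Variable((N, 2))             # y[i] = x - p[i]

cons = [y[i] == x - p[i] for i in range(N)]
cons += [cp.SOC(d[i], y[i]) for i in range(N)]   # explicit cones ||y[i]|| <= d[i]

prob = cp.Problem(cp.Minimize(cp.sum(d)), cons)
prob.solve(solver=cp.SCS)
print(x.value, prob.value)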

The NLP model has just 2 variables \(x_c\). This SOCP model has 152 variables. Using \(n=50\) data points, we added \(3n\) variables. When looking at the number of constraints we see a similar thing. The NLP model has no constraints, but this SOCP model has \(3n=150\) constraints (\(n\) of them cones).

Conclusion

The min sum distance problem has a simple NLP formulation which can be improved by using a good initial point. It can also be formulated as a SOCP problem. Using high-level modeling tools this is not difficult. Without automatic reformulations things become a bit less obvious.


References


  1. The point that minimizes the sum of euclidean distances to a set of n points, https://stackoverflow.com/questions/57277247/the-point-that-minimizes-the-sum-of-euclidean-distances-to-a-set-of-n-points
  2. Geometric median, https://en.wikipedia.org/wiki/Geometric_median
  3. https://www.cvxpy.org/
  4. Victor Zverovich, Robert Fourer, Automatic Reformulation of Second-Order Cone Programming Problems, https://ampl.com/MEETINGS/TALKS/2015_01_Richmond_2E.2.pdf

Tuesday, July 30, 2019

Advanced R (second edition)


The second edition is out. My version has color pictures (so it must be good!). This book is about the R programming language (not about statistical techniques). Even if you know R really well, you will learn something from this book. Hadley Wickham is the author of many extremely useful R packages, including ggplot2.

Contents


  1. Introduction

    I Foundation
  2. Names and values
  3. Vectors
  4. Subsetting
  5. Control flow
  6. Functions
  7. Environments
  8. Conditions

    II Functional Programming
  9. Functionals
  10. Function factories
  11. Function operators

    III Object-oriented Programming
  12. Base types
  13. S3
  14. R6
  15. S4
  16. Trade-offs

    IV Metaprogramming
  17. Big picture
  18. Expressions
  19. Quasiquotation
  20. Evaluation
  21. Translating R code

    V Techniques
  22. Debugging
  23. Measuring performance
  24. Improving performance
  25. Rewriting R code in C++

References


  1. https://adv-r.hadley.nz/  Online version of book
  2. https://github.com/hadley/adv-r The source code for the book. The book is written in bookdown.
  3. https://bookdown.org/

Tuesday, July 16, 2019

Quadratic Indicator Constraint?

I was pondering a constraint of the form:\[x_{i,k}=1 \Rightarrow d_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2\] where \(x_{i,k}\in \{0,1\}\). Obviously, this could be implemented easily using indicator constraints, which are supported by all major solvers. These constraints have the form: \[\begin{align} &x_i = 0 \Rightarrow \text{constraint} \\ &\text{or} \\ & x_i = 1 \Rightarrow \text{constraint}\end{align}\] where \(x_i \in \{0,1\}\).

Unfortunately, indicator constraints are only supported for linear constraints. The Cplex docs say:

  • The constraint must be linear; a quadratic constraint is not allowed to have an indicator constraint.
  • A lazy constraint cannot have an indicator constraint.
  • A user-defined cut cannot have an indicator constraint.

Gurobi mentions:

An indicator constraint \(y=f  \rightarrow a^Tx \le b\) states that if the binary variable \(y\) has the value \(f\in \{0,1\}\) in a given solution then the linear constraint \(a^Tx \le b\) has to be satisfied.

Xpress and SCIP say similar things: indicator constraints are only for linear constraints.

Somehow, I expected that I could use a (convex) quadratic constraint with an indicator. As none of the solvers has this implemented, I suspect there is a good reason why this is not possible.

Big-M and SOS1


So, as we cannot use indicators, we need to use more traditional implementation techniques. The most obvious is to use a big-\(M\) formulation: \[  d_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2 - M(1-x_{i,k})\] The disadvantage of this formulation is that we need to establish a good value for \(M\), and that this value cannot be too large: large values of \(M\) tend to confuse MIP solvers.
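To make this concrete, here is a minimal CVXPY sketch of the big-M construction. The data, dimensions, and the \(\min \sum_{i,k} d_{i,k}\) objective are all made up for this example; in practice the constraint would live inside a larger model. The convex quadratic stays on the convex side of the inequality, so the constraint remains convex:

import cvxpy as cp
import numpy as np

# toy instance: n points, K centers (all data hypothetical)
n, K, dim = 5, 2, 2
p = np.random.rand(n, dim)
M = 10.0                                # big-M: must bound the squared distances

x = cp.Variable((n, K), boolean=True)   # indicator variables x(i,k)
xbar = cp.Variable((K, dim))            # centers
d = cp.Variable((n, K), nonneg=True)

cons = [cp.sum(x, axis=1) == 1]         # each point picks one center
for i in range(n):
    for k in range(K):
        # big-M version of: x(i,k)=1 => d(i,k) >= ||xbar(k)-p(i)||^2
        cons.append(cp.sum_squares(xbar[k] - p[i]) <= d[i, k] + M * (1 - x[i, k]))

prob = cp.Problem(cp.Minimize(cp.sum(d)), cons)
# prob.solve()  # requires a mixed-integer QCP/conic-capable solver to be installed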

If we have no good (and small) value for \(M\), we can use SOS1 sets. I.e. we write \[\begin{align} &d_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2 - s_{i,k} \\ & s_{i,k}\ge 0\\ & \{s_{i,k},x_{i,k}\} \in SOS1\end{align}\] This says: either \(s_{i,k}\) or \(x_{i,k}\) can be non-zero, but not both. To be precise: both can be zero. Most MIP solvers support SOS1 sets.

Finally, as stated in the comments, an alternative formulation would be \[\begin{align} & d'_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2 \\ & x_{i,k}=1 \Rightarrow d_{i,k} \ge  d'_{i,k} \end{align}\] This comes at the expense of extra variables and constraints. Basically we added one indirection. With this formulation, we have a linear indicator constraint and a separate convex quadratic inequality. I assume the presolver will not presolve this away.

Error Messages


In my opinion, well-formulated error messages are very important. As users, we are already confused that the system is not doing what we want, so a confusing error message just piles on. Let's see what happens when we use a quadratic indicator constraint:

  • GAMS has poor support for indicators in general, but is doing really poorly here. It does not give an error message, but just gives a wrong solution. This is really bad (I reported this, so it will be fixed).
  • AMPL gives an error message that seems to come from outer space.
    CPLEX 12.9.0.0: logical constraint _slogcon[1] is not an indicator constraint. Let's blame the slogcons (???). Obviously, I have no constraint (or any other symbol) called slogcon.
  • Cplex's interactive optimizer is doing a better job:
    CPLEX Error  1605: Line 5: Illegal quadratic term in a constraint.
  • Cplex's OPL has a difficult time understanding the (convex) quadratic indicator constraint. It says:
    *** FATAL[ENGINE_002]: Exception from IBM ILOG CPLEX: CPLEX Error  5002: 'q1' is not convex. Obviously the constraint is convex, so there is a problem in how OPL generates the constraint (i.e. this is a bug). The error message just does not make sense. The message is also bad: I don't have a `q1` in the model. 

A better message would be something like:
Equation `e2`: quadratic terms in an indicator constraint are not supported. Please reformulate this constraint.

Basically you want an error message to describe the problem in a way that is understandable (even when the user is slightly panicked). And secondly, it is a good idea to tell the user what to do now. In this case you need to reformulate (this tells the user there is no need to search for some Cplex option that can help here). And, never, never mention the slogcons.

Of course, it would be even better if solvers would support quadratic indicator constraints. In design there is this concept of orthogonality: it is best to have as few exceptions as possible.

Sunday, July 7, 2019

Binary multiplication

We want to express \[\begin{align} & z = x \cdot y \\ & x,y,z \in \{0,1\}\end{align}\] in a linear fashion. There are two good reasons to do this. This linearization will allow us to use linear Mixed Integer Programming solvers instead of quadratic or MINLP solvers. In addition, the linearization can bypass any non-convexity in the original quadratic formulation. The linear formulation will buy us more accessible solvers, and possibly better performance.

As we are talking about binary variables, \(z = x \cdot y\) can be interpreted as \(z = x \textbf{ and } y\). We can resort to two standard formulations:

Formulation 1
\[\begin{align} & \color{darkred} z \le \color{darkred}x\\ & \color{darkred}z \le \color{darkred}y\\ & \color{darkred}z \ge \color{darkred}x+\color{darkred}y-1\\ & \color{darkred}x,\color{darkred}y \in \{0,1\} \\ & \color{darkred}z \in [0,1] \end{align}\]

Formulation 2
\[\begin{align} & \color{darkred} z \le \frac{\color{darkred}x+\color{darkred}y}{2}\\ & \color{darkred} z \ge \color{darkred}x+\color{darkred}y-1\\ & \color{darkred}x,\color{darkred}y,\color{darkred}z \in \{0,1\} \\ \end{align}\]

I always use the first form. But I quite often see the second formulation being used (e.g. [1]).

The second form can be interpreted as an aggregation of the first one by adding up \(z\le x\) and \(z \le y\).

There is a good reason to use the first form: it is tighter. For instance, the fractional value \((x,y) = (0, 1/2)\) yields \(z=0\) in the first formulation, but would allow \(z=0.25\) in the second formulation. The first formulation is so tight, we can relax \(z\in\{0,1\}\) to \(z\in[0,1]\).
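A quick sanity check of this fractional point in plain Python (no solver needed); we compute the largest feasible \(z\) in each LP relaxation by hand:

# largest feasible z in each LP relaxation at the fractional point (x, y) = (0, 0.5)
x, y = 0.0, 0.5
z1 = min(x, y, 1.0)          # formulation 1: z <= x and z <= y
z2 = min((x + y) / 2, 1.0)   # formulation 2: z <= (x+y)/2
print(z1, z2)                # 0.0 vs 0.25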

My attempt to show graphically the difference:

Formulation 1
Formulation 2


Obviously, formulation 1 seems to be the winner. But when I try this out on a model [2], I see:

Results for Cplex, default settings, 1 thread

Size             Model 1 (seconds / nodes)   Model 2 (seconds / nodes)
\(n=20,k=10\)    2.9 / 3117                  1.8 / 3018
\(n=25,k=5\)     12 / 6245                   11 / 7043
\(n=30,k=4\)     15 / 5263                   13 / 4471

Well, this does not seem to prove my point. My preferred formulation is actually slightly underperforming.

Let's look at a different solver to see if that proves my point.


Results for CBC, 1 thread

Size             Model 1 (seconds / nodes)   Model 2 (seconds / nodes)
\(n=20,k=10\)    46 / 3082                   1623 / 279364
\(n=25,k=5\)     105 / 6860                  1823 / 87396
\(n=30,k=4\)     144 / 12008                 213 / 11336

These results look quite different. First, obviously, CBC is slower than Cplex; no surprise there. But we also see that formulation 1 is much better than formulation 2. This is more like what I expected beforehand. I suspect the cuts produced by Cplex eliminated the advantage of formulation 1. We see this more often: some modeling tricks become less important as solvers get smarter.

Notes:

  1. The meaning of \(n\) and \(k\) for the size of the problem is explained in [2].
  2. The variables \(z\) were declared as binary both in models 1 and 2.


References



Sunday, June 30, 2019

Maximum Dispersion


Problem statement


Given \(n\) points with their distances  \(d_{i,j}\), select \(k\) points such that the sum of the distances between the selected points is maximized [1].

A simple MIQP (Mixed-Integer Quadratic Programming) model is:


Non-convex MIQP Model
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}x_{i} \color{darkred}x_{j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \end{align}\]

Notice we only need to look at distances \(d_{i,j}\) with \(i\lt j\) as we can assume symmetry. If not, just make \(d_{i,j}\) symmetric by: \[d_{i,j} = \frac{d_{i,j} + d_{j,i}}{2}\]

As we shall see this is not such an easy problem to solve to optimality.

This problem is called the \(p\)-dispersion-sum problem.

Example data


I generated \(n=50\) random 2D points:


----     24 PARAMETER coord  random coordinates

              x           y

i1        1.717       8.433
i2        5.504       3.011
i3        2.922       2.241
i4        3.498       8.563
i5        0.671       5.002
i6        9.981       5.787
i7        9.911       7.623
i8        1.307       6.397
i9        1.595       2.501
i10       6.689       4.354
i11       3.597       3.514
i12       1.315       1.501
i13       5.891       8.309
i14       2.308       6.657
i15       7.759       3.037
i16       1.105       5.024
i17       1.602       8.725
i18       2.651       2.858
i19       5.940       7.227
i20       6.282       4.638
i21       4.133       1.177
i22       3.142       0.466
i23       3.386       1.821
i24       6.457       5.607
i25       7.700       2.978
i26       6.611       7.558
i27       6.274       2.839
i28       0.864       1.025
i29       6.413       5.453
i30       0.315       7.924
i31       0.728       1.757
i32       5.256       7.502
i33       1.781       0.341
i34       5.851       6.212
i35       3.894       3.587
i36       2.430       2.464
i37       1.305       9.334
i38       3.799       7.834
i39       3.000       1.255
i40       7.489       0.692
i41       2.020       0.051
i42       2.696       4.999
i43       1.513       1.742
i44       3.306       3.169
i45       3.221       9.640
i46       9.936       3.699
i47       3.729       7.720
i48       3.967       9.131
i49       1.196       7.355
i50       0.554       5.763


Let's try to find \(k=10\) points that are most dispersed. The distance matrix is formed by calculating Euclidean distances.

10 out of 50 most dispersed points


Solve non-convex problem


The above MIQP problem is non-convex. We can solve the non-convex MIQP problem in different ways:

  • Throw this into a global solver such as Baron, Couenne or Antigone
  • Use a solver like Cplex (option qtolin) or Gurobi (option preQlinearize) that can linearize the problem automatically for us. See further down how we can do this ourselves.
  • Instruct Cplex to use a QP formulation (option qtolin=0) and tell it to use the global QP solver (option optimalitytarget=3)

Convexification of the quadratic model


There is a trick to make this problem convex. For maximization, we need the \(Q\) matrix to be negative semidefinite. In our case, the \(Q\) matrix is our distance matrix. The following algorithm makes the problem convex:

  1. Calculate the largest eigenvalue \(\lambda_{max}\) of the distance matrix \(0.5 D\). We multiply by 0.5 because we only use half of the matrix in the objective to prevent double counting.
  2. If \(\lambda_{max} \le 0 \): we are done (the matrix is already negative semidefinite)
  3. Form the objective: \[\max \sum_{i \lt j}  d_{i,j} x_{i} x_{j} - \lambda_{max} \sum_i x_i^2 + \lambda_{max} \sum_i x_i\] 
This essentially imposes a possibly large diagonal perturbation of the \(Q\) matrix. We compensate with a linear term, using the fact that \(x_i = x_i^2\) when \(x_i \in \{0,1\}\).  
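A small NumPy sketch of this convexification step (with made-up points):

import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((10, 2)) * 10                 # made-up points
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)  # distance matrix

lam_max = np.linalg.eigvalsh(0.5 * D).max()    # largest eigenvalue of 0.5*D
print(lam_max)
# If lam_max > 0, the perturbed matrix 0.5*D - lam_max*I is negative semidefinite;
# the linear term lam_max * sum(x_i) compensates, since x_i = x_i^2 for binaries.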

Linearization 


The quadratic model is easily linearized: the well-known trick to handle \(y_{i,j} = x_i x_j\) is to write \[\begin{align} & y_{i,j} \le x_i \\ & y_{i,j} \le x_j \\ & y_{i,j} \ge x_i +x_j-1\\& y_{i,j}, x_i \in \{0,1\}\end{align}\] A complete model can look like:


Linearized Model 1
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in [0,1] \end{align}\]

There are two things that may need some attention:

  • The inequality \(y_{i,j}\ge x_i +x_j -1\) was dropped. The objective will take care of this.
  • The variables \(y_{i,j}\) were relaxed to be continuous between 0 and 1. The \(y\) variables will be automatically integer (well, where it matters). Often models with fewer integer variables are easier to solve. Modern solvers, however, may reintroduce binary variables for this particular model. In Cplex you can see a message like shown below.


Tried aggregator 1 time.
MIP Presolve eliminated 1 rows and 1 columns.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 50 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (2.79 ticks)
Found incumbent of value 0.000000 after 0.06 sec. (4.54 ticks)
Probing time = 0.02 sec. (0.51 ticks)
Tried aggregator 1 time.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 1275 binaries, 0 generals, 0 SOSs, and 0 indicators.
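To make Linearized Model 1 concrete, here is a PuLP sketch on a small random instance (toy data and sizes, not the \(n=50\), \(k=10\) instance above; PuLP with its bundled CBC is just one convenient choice, not what I used):

import itertools
import numpy as np
import pulp

rng = np.random.default_rng(1)
n, k = 15, 4                              # toy size
pts = rng.random((n, 2)) * 10
pairs = list(itertools.combinations(range(n), 2))
d = {ij: float(np.linalg.norm(pts[ij[0]] - pts[ij[1]])) for ij in pairs}

m = pulp.LpProblem("p_dispersion_sum", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", range(n), cat="Binary")
# y relaxed to [0,1]; y >= x_i + x_j - 1 is dropped (the objective pushes y up)
y = pulp.LpVariable.dicts("y", pairs, lowBound=0, upBound=1)

m += pulp.lpSum(d[ij] * y[ij] for ij in pairs)
m += pulp.lpSum(x.values()) == k
for i, j in pairs:
    m += y[(i, j)] <= x[i]
    m += y[(i, j)] <= x[j]

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected:", [i for i in range(n) if x[i].value() > 0.5])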


A tighter linearization


In [2] a tighter formulation is proposed:


Linearized Model 2
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ &  \sum_{i\lt j} \color{darkred}y_{i,j} + \sum_{i\gt j} \color{darkred}y_{j,i} = (\color{darkblue} k-1) \color{darkred}x_j && \forall j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\]

Essentially, we multiplied the constraint \(\sum_i x_{i} = k\) by \(x_j\), and added these as constraints. The derivation is as follows: \[\begin{align} & \left( \sum_i x_i\right) x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} x_i x_j +  x_j^2 + \sum_{i\gt j} x_i x_j =  k x_j\\ \Rightarrow & \sum_{i\lt j} y_{i,j} + \sum_{i\gt j} y_{j,i} =  (k-1) x_j \end{align}\] We used here that \(x_j^2 = x_j\).

My intuition is as follows. If \(x_j=1\) then exactly \(k-1\) other \(x_i\)'s should be 1. That means \(k-1\) \(y_{i,j}\)'s should be 1. As we only use the upper-triangular part, the picture becomes:

Picture of y layout
Given that \(x_j=1\), we need to have \(k-1\) ones in the orange region.


Aggregation


We can aggregate the previous cut, which leads to: \[\sum_{i\lt j} y_{i,j} = \frac{k(k-1)}{2}\] The advantage of this version is that we only need one extra constraint [4]:


Linearized Model 3
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ &  \sum_{i\lt j} \color{darkred}y_{i,j} = \frac{ \color{darkblue}k (\color{darkblue} k-1)}{2} \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\]

The intuition is easy: if we have \(k\) \(x_j\)'s equal to one, we need to have \(k(k-1)/2\) \(y_{i,j}\)'s equal to one.

Numerical results


I tried these formulations using Cplex with a time limit of 1800 seconds (half an hour).


Model             Time (s)   Obj        Gap       Notes
Non-convex MIQP   1800       332.1393   87.64%    Solved as quadratic model; options qtolin=0, optimalitytarget=3
Non-convex MIQP   1800       332.9129   57.65%    Automatically converted to linear model; option qtolin=1
Convex MIQP       1800       332.9129   92.22%    Option qtolin=0
MIP 1             1800       332.9129   57.65%    Defaults
MIP 2             8          332.9129   optimal   Defaults (roughly the same performance with or without \(y_{i,j}\ge x_i+x_j-1\))
MIP 3             400        332.9129   optimal   Defaults; excludes \(y_{i,j}\ge x_i+x_j-1\)
MIP 3             11         332.9129   optimal   Defaults; includes \(y_{i,j}\ge x_i+x_j-1\)


Just one observation, but I think these results are representative for other (small) instances.

We dropped the constraint \(y_{i,j}\ge x_i+x_j-1\) from MIP model 1: the objective will push \(y\) upwards on its own. For models MIP 2 and 3 it is wise to reintroduce these constraints.

Although the gaps are terrible for all models without the extra cuts, some of these methods find the best solution very quickly. They are just not able to prove optimality in a reasonable time. Here is an example:


        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

      0     0     1235.6817    50                   1235.6817        7         
*     0+    0                          332.9129     1235.6817           271.17%
Found incumbent of value 332.912894 after 0.02 sec. (4.67 ticks)
      0     2     1235.6817    50      332.9129     1162.0829        7  249.07%
Elapsed time = 0.03 sec. (11.72 ticks, tree = 0.02 MB, solutions = 1)


Better dispersion


From the picture of the solution (earlier in this post), we see that this model does not prevent selected points from being close to each other. A better model may be to maximize the minimum distance between the selected points. This can be modeled as:


Maximize Minimum Distance
\[\begin{align} \max\> & \color{darkred} {\Delta} \\ & \color{darkred} \Delta \le \color{darkblue} d_{i,j} + \color{darkblue} M (1- \color{darkred}x_{i} \color{darkred}x_{j})  && \forall i\lt j \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \\ \end{align}\]


Here \(M\) is a large enough constant, e.g. \[M = \max_{i\lt j} d_{i,j}\] The model can be easily linearized by formulating the distance constraint as  \[ \Delta \le  d_{i,j} +  M (1- x_{i}) + M(1- x_{j})  \] This model quickly solves our 10-out-of-50 problem. It gives this solution:

Maximization of minimum distance

This model has the disadvantage that it does not care about points being close to each other, as long as they are farther apart than the minimum distance. For this example, visual inspection seems to indicate the model does a good job.
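A PuLP sketch of this linearized maximin model, again on made-up data:

import itertools
import numpy as np
import pulp

rng = np.random.default_rng(1)
n, k = 20, 5                              # toy size
pts = rng.random((n, 2)) * 10
pairs = list(itertools.combinations(range(n), 2))
d = {ij: float(np.linalg.norm(pts[ij[0]] - pts[ij[1]])) for ij in pairs}
M = max(d.values())                       # big-M: the largest distance

m = pulp.LpProblem("maximin_dispersion", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", range(n), cat="Binary")
delta = pulp.LpVariable("delta")          # minimum distance between selected points
m += delta
m += pulp.lpSum(x.values()) == k
for i, j in pairs:
    # linearized: delta <= d(i,j) + M(1-x_i) + M(1-x_j)
    m += delta <= d[(i, j)] + M * (1 - x[i]) + M * (1 - x[j])

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected:", [i for i in range(n) if x[i].value() > 0.5])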


Conclusion


The \(p\)-dispersion-sum problem is very difficult to solve to optimality, even for very small data sets. Extra cuts can help the linearized version of the model enormously. A main drawback is that the optimal solutions do not provide well-dispersed points (selected points can still be very close to each other).

The maximin model is much easier to solve, and it gives better dispersion: selected points are never close to each other.

References


  1. How to select n objects from a set of N objects, maximizing the sum of pairwise distances between them, https://stackoverflow.com/questions/56761115/how-to-select-n-objects-from-a-set-of-n-objects-maximizing-the-sum-of-pairwise
  2. Warren P. Adams and Hanif D. Sherali, A Tight Linearization and an Algorithm for Zero-One Quadratic Programming Problems, Management Science, Vol. 32, No. 10 (Oct., 1986), pp. 1274-1290
  3. Michael J. Kuby, Programming Models for Facility Dispersion: The p‐Dispersion and Maxisum Dispersion Problems, Geographical Analysis, vol. 19, pp.315-329, 1987.
  4. Ed Klotz, Performance Tuning for Cplex’s Spatial Branch-and-Bound Solver for Global Nonconvex (Mixed Integer) Quadratic Programs, http://orwe-conference.mines.edu/files/IOS2018SpatialPerfTuning.pdf

Friday, June 21, 2019

Special assignment problem


Problem Description


This is adapted from [1].

Assume we have data \(a_i\) and \(b_j\):


----     11 PARAMETER a  

i1  0.172,    i2  0.843,    i3  0.550,    i4  0.301,    i5  0.292,    i6  0.224,    i7  0.350,    i8  0.856,    i9  0.067
i10 0.500


----     11 PARAMETER b  

j1  0.998,    j2  0.621,    j3  0.992,    j4  0.786,    j5  0.218,    j6  0.676,    j7  0.244,    j8  0.325,    j9  0.702
j10 0.492,    j11 0.424,    j12 0.416,    j13 0.218,    j14 0.235,    j15 0.630


Try to find an assignment of each \(a_i\) to a unique \(b_j\) such that the quantities \[ \frac{a_i}{b_j}\] are as close to each other as possible.

This is more or less an assignment problem. The objective, making all ratios \(a_i/b_j\) as equal as possible, is what makes it interesting. I'll discuss some formulations for this problem.

A first model


We first introduce assignment variables: \[x_{i,j} = \begin{cases} 1 & \text{if $a_i$ is assigned to $b_j$} \\ 0 & \text{otherwise}\end{cases}\] The standard assignment constraints apply \[\begin{align}&\sum_i x_{i,j}\le 1&&\forall j \\&\sum_j x_{i,j} = 1&&\forall i  \end{align}\] This is an "unbalanced" assignment: we have more \(j\)'s than \(i\)'s.

To model "all about equal", we introduce a continuous variable \(c\) that represents the common value. I.e. \[\frac{a_i}{b_j} \approx c \>\>\text{for all  $(i,j)$ such that $x_{i,j}=1$}\]

The objective is \[\min \sum_{i,j}  \left[ x_{i,j} \left( \frac{a_i}{b_j} - c\right) \right]^2 \] or  \[\min \sum_{i,j} x_{i,j} \left( \frac{a_i}{b_j} - c\right)^2 \]

A complete model can look like:

MINLP Model
\[\begin{align} \min & \sum_{i,j} \color{darkred}x_{i,j} \left( \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - \color{darkred}c\right)^2 \\ & \sum_j \color{darkred} x_{i,j} = 1 &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 &&\forall j\\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \text{ free} \end{align}\]


When we run this model, using an MINLP solver, we see:


----     33 VARIABLE x.L  assignment

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000


----     33 VARIABLE c.L                   =        0.695  common value
            VARIABLE z.L                   =        0.201  objective


The assignment can be depicted as follows.

Solution. Matrix values are a(i)/b(j).
In the above matrix the cell values are \[\text{cell}_{i,j} = \frac{a_i}{b_j}\]

A quadratic model


We can reformulate this MINLP model into a quadratic model by introducing variables  \[y_{i,j} = x_{i,j} c\] This non-linear constraint can be linearized by stating \[\begin{align} & x_{i,j} = 0 \Rightarrow y_{i,j} = 0\\ &x_{i,j} = 1 \Rightarrow y_{i,j} = c\end{align}\] With these \(y\) variables we can form the objective \[\sum_{i,j}  \left( x_{i,j}  \frac{a_i}{b_j} - y_{i,j} \right)^2  \] Unfortunately, this is a non-convex objective, and there are not many non-convex MIQP solvers around (Cplex has one). This model gives the same solution as the MINLP model above; I am not showing the results here, as they are identical to the ones in the previous section.


Non-convex MIQP Model
\[\begin{align} \min & \sum_{i,j} \left( \color{darkred}x_{i,j} \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - \color{darkred}y_{i,j}\right)^2 \\ & \sum_j \color{darkred} x_{i,j} = 1 &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 &&\forall j\\ & \color{darkred} x_{i,j} = 0 \Rightarrow \color{darkred}y_{i,j} = 0 &&\forall i,j\\ & \color{darkred} x_{i,j} = 1 \Rightarrow \color{darkred}y_{i,j} = c &&\forall i,j\\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}y_{i,j} \ge 0 \\ & \color{darkred}c \ge 0 \end{align}\]

We assumed here positive fractions \(a_i/b_j \gt 0\). This allows us to say \(c\ge 0\) and \(y_{i,j}\ge 0\).

A linear model


We can approximate the quadratic weighting of the residuals by absolute values: \[\sum_{i,j}  \left| x_{i,j} \left( \frac{a_i}{b_j} - c \right) \right| \] The first thing to do is to linearize the term \(x_{i,j} c\). As in the quadratic model, we introduce variables \[y_{i,j} = x_{i,j} c\] We can form the implications: \[\begin{align} & x_{i,j} = 0 \Rightarrow y_{i,j} = 0\\ &x_{i,j} = 1 \Rightarrow y_{i,j} = c\end{align}\] Most solvers support indicator constraints, which allow us to implement these implications as is. If we have a solver without them, we can formulate this as a bunch of big-\(M\) constraints: \[\begin{align} & - M x_{i,j} \le y_{i,j} \le M x_{i,j} \\ & c-M(1-x_{i,j}) \le y_{i,j} \le c+M(1-x_{i,j})\end{align}\] From the data we can assume \(a_{i}\ge 0\) and \(b_{j}\gt 0\). We can exploit this: \[\begin{align} & 0 \le y_{i,j} \le M_1 x_{i,j} \\ & c-M_2(1-x_{i,j}) \le y_{i,j} \le c+M_2(1-x_{i,j})\end{align}\] We can think a bit about good values for \(M_1\) and \(M_2\). I suggest: \[M_1 = M_2 = \max c = \max_{i,j} \frac{a_i}{b_j}\]

The complete model can look like:

MIP Model
\[\begin{align}
\min & \sum_{i,j} \color{darkred}r_{i,j} \\
        & \sum_j \color{darkred} x_{i,j} = 1 && \forall i\\
        & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j\\
        &  -\color{darkred}r_{i,j} \le \color{darkred} x_{i,j} \frac{\color{darkblue} a_i}{\color{darkblue} b_j} - \color{darkred} y_{i,j} \le \color{darkred}r_{i,j} &&\forall i,j\\
       &  \color{darkred} y_{i,j} \le \color{darkblue} M_1 \color{darkred} x_{i,j}  &&\forall i,j\\ & \color{darkred} c - \color{darkblue} M_2 (1- \color{darkred} x_{i,j}) \le \color{darkred} y_{i,j} \le \color{darkred} c + \color{darkblue} M_2 (1- \color{darkred} x_{i,j})&&\forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \ge 0 \\ & \color{darkred}y_{i,j} \ge 0\\ & \color{darkred}r_{i,j} \ge 0\\ \end{align}\]
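For illustration, a PuLP sketch of this MIP on a small random instance could look as follows (toy data and sizes; the results shown below come from my original implementation, not from this sketch):

import numpy as np
import pulp

rng = np.random.default_rng(3)
ni, nj = 5, 8                                  # toy sizes (the post uses 10 and 15)
a, b = rng.random(ni), rng.random(nj) + 0.1
ratio = {(i, j): a[i] / b[j] for i in range(ni) for j in range(nj)}
keys = list(ratio.keys())
M = max(ratio.values())                        # M1 = M2 = max a_i/b_j

m = pulp.LpProblem("equal_ratios", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", keys, cat="Binary")
y = pulp.LpVariable.dicts("y", keys, lowBound=0)   # y = x*c
r = pulp.LpVariable.dicts("r", keys, lowBound=0)   # absolute residuals
c = pulp.LpVariable("c", lowBound=0)

m += pulp.lpSum(r.values())
for i in range(ni):
    m += pulp.lpSum(x[(i, j)] for j in range(nj)) == 1
for j in range(nj):
    m += pulp.lpSum(x[(i, j)] for i in range(ni)) <= 1
for ij in keys:
    m += ratio[ij] * x[ij] - y[ij] <= r[ij]        # |residual| bounds
    m += y[ij] - ratio[ij] * x[ij] <= r[ij]
    m += y[ij] <= M * x[ij]                        # x=0 => y=0
    m += y[ij] >= c - M * (1 - x[ij])              # x=1 => y=c
    m += y[ij] <= c + M * (1 - x[ij])

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("c =", c.value())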


The results look like:


----     72 VARIABLE x.L  assignment

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000


----     72 VARIABLE c.L                   =        0.711  common value

----     72 VARIABLE y.L  c*x(i,j)

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        0.711
i2                    0.711
i3                                0.711
i4                                                                                                        0.711
i5                                                                                                                    0.711
i6                                                                    0.711
i7                                                                                            0.711
i8        0.711
i9                                            0.711
i10                                                                               0.711


----     72 VARIABLE r.L  abs(residuals)

             j1          j3          j4          j5          j7          j8          j9         j10         j12

i1                                                        0.006
i2                    0.139
i3                                0.010
i5                                                                                                        0.009
i6                                                                    0.021
i7                                                                                      6.137042E-4
i8        0.147
i9                                            0.402
i10                                                                               0.002

The model results are very close. The assignments are the same. Just the \(c\) value and resulting sum of squared or absolute residuals are somewhat different.


----     77 PARAMETER sumdev  sum of squared/absolute deviations

                  dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711


A different approach


We can look at the problem in a different way. Instead of minimizing the spread around a central value \(c\), we can minimize the bandwidth directly. I.e. we can write: \[\begin{align} \min\> & \mathit{maxv} - \mathit{minv} \\ & \mathit{minv} \le \frac{a_i}{b_j} && \forall x_{i,j}=1\\& \mathit{maxv} \ge \frac{a_i}{b_j} && \forall x_{i,j}=1 \end{align}\] These constraints can be implemented as indicator constraints \[\begin{align} & x_{i,j}=1 \Rightarrow \mathit{minv} \le \frac{a_i}{b_j}\\ & x_{i,j}=1 \Rightarrow \mathit{maxv} \ge \frac{a_i}{b_j}\end{align}\] If we don't have indicator constraints available, we need to resort to big-\(M\) constraints:

Alternative MIP Model
\[\begin{align} \min\> & \color{darkred}{\mathit{maxv}} - \color{darkred}{\mathit{minv}} \\ & \sum_j \color{darkred} x_{i,j} = 1 && \forall i \\ & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j \\ & \color{darkred}{\mathit{minv}} \le \frac{\color{darkblue}a_i}{\color{darkblue}b_j} + M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}{\mathit{maxv}} \ge \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\} \\ & \color{darkred}{\mathit{maxv}},\color{darkred}{\mathit{minv}} \text{ free} \end{align}\]


The solution looks like:


----    103 VARIABLE x.L  assignment

             j1          j2          j3          j4          j5          j6          j9         j11         j14         j15

i1                                                                                                        1.000
i2                                1.000
i3                                                                                1.000
i4                                                                                                                    1.000
i5                                                                    1.000
i6                                                                                            1.000
i7                    1.000
i8        1.000
i9                                                        1.000
i10                                           1.000


----    103 VARIABLE minv.L                =        0.308  minimum value
            VARIABLE maxv.L                =        0.858  maximum value


A picture of this solution:

Minimize bandwidth solution

The disadvantage of this method is that it does not care about the assignments as long as they are inside the optimal bandwidth. We can see this if we recalculate an optimal central \(c\) value afterwards. The results with this reconstructed \(c\) value:



----    114 PARAMETER sumdev  sum of squared/absolute deviations

                  dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711
mip model 2       0.313       1.548       0.617

Indeed, we see that the total sum of squared or absolute deviations is not as good as we saw before. The same thing can be seen by just calculating the standard deviation for the selected ratios \(a_i/b_j\). This gives:


----    130 PARAMETER stdev  

minlp model 0.149,    mip model   0.149,    mip model 2 0.186

Again we see this last approach leads to more variability.

Distribution of selected a(i)/b(j)

In the picture above, we see that this second MIP model has no incentive to keep values close to each other, other than requiring them to be within \([\mathit{minv},\mathit{maxv}]\). This model is simpler than the earlier MIP model, but we pay a price: the solution is not as highly concentrated around the central point.

Non-unique \(b_j\)


If we allow the \(b_j\) to be reused, only a small change in the models is needed. We just need to drop the constraint \[\sum_i x_{i,j} \le 1 \] When we do this, we may see a solution like:

Solution. Allow a column to be selected multiple times.

Conclusion


This turned out to be an interesting problem. A high-level MINLP formulation was presented first. We can linearize the model into a MIP, but this becomes a bit messy; modern MIP solvers offer tools (indicator constraints, built-in absolute value functions) that can help here. Finally, an alternative, simpler MIP model was proposed. However, this model produces solutions that are more dispersed. 

References