Yet Another Math Programming Consultant: August 2019

Wednesday, August 28, 2019

Octeract

I don't know what or who this is.

This seems to be a parallel deterministic solver for non-convex MINLPs. Some things I noticed:

!np.easy : cute (but nonsense of course: some problems just remain difficult).
"The first massively parallel Deterministic Global Optimization solver for general non-convex MINLPs."
Symbolic manipulation: like some other global solvers they need the symbolic form of the problem so they can reason about this. I.e. no black-box problems.
Support for AMPL and Pyomo
"Octeract Engine has implementations of algorithms that guarantee convergence to a global optimum in finite time. The correctness of the majority of the calculations are ensured through a combination of interval arithmetic and infinite precision arithmetic."
It looks like the benchmarks [3] are against itself (so always a winner).
"Massively parallel" makes me think about thousands of cores. The benchmarks do not seem to report anything like this (mostly 1,4 and 8 cores).
I don't see any names on the web site. The About Company section is unusually vague. I received some more links with additional background info [4,5].

Some of the competing solvers are Baron, Couenne, and Antigone.

References

https://octeract.com/
Manual: https://octeract.com/wp-content/uploads/2019/08/user_manual.pdf
Benchmarks: https://octeract.com/benchmarks/
Presentation: https://www.youtube.com/watch?v=ayZ3iHmQUak
PhD thesis: https://spiral.imperial.ac.uk/bitstream/10044/1/45359/1/Kazazakis-N-2017-PhD-Thesis.pdf

Monday, August 26, 2019

Python-MIP

This is another modeling tool for Python.

There are quite a few modeling tools available for Python: Pyomo, PuLP, and most commercial LP/MIP solvers come with some Python modeling layer.

This is what caught my eye when reading about Python-MIP:

The name is rather unimaginative.
Looks like the authors are from Brazil.
Supported solvers are CBC and Gurobi.
The Python-MIP is compatible with the just-in-time compiler PyPy, which can lead to substantial performance improvements.
It is claimed that with PyPy jit, python-MIP can be 25 times as fast than the Gurobi modeling tool.
There are some interesting facilities supported by Python-MIP:

Cuts can be provided using a call-back mechanism
Support for MIPSTART (initial integer solution)
Solution pool

Interesting question

Related to Python-MIP, in [3] an interesting question came up. The value of an integer variable is often slightly non-integer. E.g. something like 0.0000011625. This is the result of the integer feasibility tolerance that a solver applies.

Background: during the MIP search quite a lot of integer variables are automatically (or by accident) integer valued. To exploit this, and not wasting time branching on these variables, a MIP solver wants to consider variables that are close to being integer as integer. Basically, skipping these variables when branching.

In the discussion [3] the remark is made:

Is rounding integer solutions automatically a good idea? No MIP solver or modeling system is doing this AFAIK The exceptions are modeling tools mainly targeting Constraint Programming solvers. Here are some arguments against this rounding strategy.

Rounding integer solutions can lead to larger infeasibilities. With some aggressive presolve/scaling these infeasibilities can sometimes be large after postsolve/unscaling. Also: some equations may have long summations of binary variables. This would accumulate a lot of rounding errors. And then there are these big-M constraints....

It also means that using the steps:

solve
fix solution (or fix integers) to optimal solution
solve

may lead to "feasible" for the first solve, but "infeasible" for the second solve. E.g. when we want duals for the fixed LP we use this "fix integers" step.

Safer would be to tighten the integer feasibility tolerance. Cplex even allows epint=0 (epint is Cplex's integer feasibility tolerance). Of course tightening the integer feasibility tolerance will likely lead to longer solution times.

Indeed, modeling systems and solvers currently handle this by offloading the problem to the user. The solver is probably the right place to deal with this. Not sure developers are eager to work on this. Taking responsibility by simple-minded rounding may be asking for more problems than it solves.

On the other hand, these slightly fractional values certainly cause confusion. Especially for beginners in MIP modeling. This is the main argument in favor of automatic rounding.

So the question remains: is rounding integer variables a good idea?

References

Wednesday, August 21, 2019

R/Python + C++

In some recent projects, I was working on using algorithms implemented in C++ from R and Python. Basically the idea is: Python and R are great languages for scripting but they are slow as molasses. So, it may make sense to develop the time consuming algorithms in C++ while driving the algorithm from R or Python.

R and C++: Rcpp.

The standard way to build interfaces between R and C++ code is to use Rcpp [1,2].

It is possible to directly interface R with low-level C code, This will require a lot of code and knowledge of R internals. Rcpp automates a lot of this. E.g. Rcpp will take care of translating an R vector into a C++ vector.

Rcpp supports both small fragments of C++ code passed on as an R string to more coarse-grained file based approach [3]. For Windows, you need to download the GNU compilers [4].

If you are new to both Rcpp and to building your own R packages [5] things may be a bit overwhelming.

Rstudio can help a lot. It supports a lot of very useful tasks:

Syntax coloring for C++ code.
Building projects.
Git version control.
Documentation tools (rmarkdown and bookdown). My documentation is also a good test: it executes almost all of the code when building the document.

Editing C++ code in RStudio

Basically I never have to leave RStudio.

I have added an alternative driver file for my C++ code so I can debug it in Visual Studio. I used it only a few times: most of the time I just used RStudio.

Python and C++: pybind11

pybind11 [6] is in many respects similar to Rcpp, although it requires a little bit more programming to bridge the gap between Python and C++.

In the beginning of the above Youtube video [7], the presenter compares pybind11 with some of the alternatives:

SWIG: author of SWIG says: don't use it
ctypes: call c functions but not c++
CFFI: call c functions
Boost.Python: support for older C++ standards, but not much maintained
pybind11: modern

As with Rcpp, calling the compiler is done through running a build or setup script. For Rcpp I used the GNU compilers, while pybind11/pip install supports the Visual Studio C++ compiler. This also means that if you have little experience with both pybind11 and creating packages, the learning curve may be steep.

References

http://www.rcpp.org
Dirk Eddelbuettel, Seamless R and C++ Integration with Rcpp, Springer, 2013
Chapter: Rewriting R code in C++, https://adv-r.hadley.nz/rcpp.html
Hadley Wickham, R packages, O'Reilly, 2015
https://cran.r-project.org/bin/windows/Rtools/
https://pybind11.readthedocs.io/en/master/
Robert Smallshire, Integrate Python and C++ with pybind11, https://www.youtube.com/watch?v=YReJ3pSnNDo

Thursday, August 15, 2019

Scheduling teachers

From [1]:

I am trying to find a solution to the following problem:

There are 6 teachers

Each teacher can only work for 8 hours a day

Each teacher must have a 30 minute break ideally near the middle of his shift

The following table shows the number of teacher needed in the room at certain times:

7am - 8am 1 teacher

8am - 9am 2 teacher

10am - 11am 5 teacher

11am - 5pm 6 teacher

5pm - 6pm 2 teacher

What is a good way to solve this (ideally in python and Google OR-Tools) ?

Thank you

Initial analysis

From the demand data we see that we need all 6 teachers from 11am-5pm working without lunch break. This is 6 hours. So there is no way to allow a lunch break near the middle of the shift.

We have more problems. The picture of the demand data indicates we have no demand between 9am and 10am. That does not look right. Bad data is fact of life. Optimization models are especially sensitive to bad data.

I believe that looking critically at your data is essential for successful optimization applications. You can learn a lot from just a bit of staring.

Alternative problem 1

We can ask a few different questions. If we only allow shifts of the form: 4 hours work, 0.5 hour lunch, 4 hours work (i.e. lunch perfectly in the middle of the shift), how many teachers do we need to meet the demand. To model this, we can assume time periods of half an hour. The number of possible shifts is small:

With an enumeration of the shifts, we can model this as a covering problem:

Covering Model
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_{s\|\color{darkblue}{\mathit cover}(s,t)} \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_i \in \{0,1,2,\dots\}\end{align} \]

Here $\mathit{cover}(s,t)=\text{True}$ if shift $s$ covers time period $t$. If we would interpret $\mathit cover(s,t)$ as a binary (data) matrix \[\mathit cover_{s,t} = \begin{cases} 1 & \text{if shift $s$ covers time period $t$}\\ 0 & \text{otherwise}\end{cases}\] we can also write:

Covering Model (alternative interpretation)
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_s \color{darkblue}{\mathit cover}_{s,t} \cdot \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_i \in \{0,1,2,\dots\}\end{align} \]

Our new assumptions are:

we only allow shifts with a lunch break in the middle
we add to our demand data: 4 teachers needed between 9am and 10am

After solving this easy MIP model we see:

----     55 VARIABLE x.L  number of shifts needed

shift1 2.000,    shift4 2.000,    shift5 2.000,    shift6 2.000


----     55 VARIABLE z.L                   =        8.000  total number of shifts

I.e. we need 8 teachers to handle this workload.

The picture shows we are overshooting demand in quite a few periods. Note: this picture illustrates the left-hand side (orange bars) and right-hand side (blue line) of the demand equation in the Covering Model.

Alternative problem 2

Lets make things a bit more complicated. We allow now the following rules for a single shift:

The work period before the lunch bread is between 3 and 5 hours (or between 6 and 10 time periods)
There is a lunch break of 0.5 or 1 hour.
After lunch there is another work period of between 3 and 5 hours. The total number of working hours is 8.

When we enumerate the shifts, we see:

We now have 55 different shifts to consider.

The results look like:

----     72 VARIABLE x.L  number of shifts needed

shift1  1.000,    shift22 1.000,    shift26 1.000,    shift39 1.000,    shift49 1.000,    shift52 1.000
shift55 1.000


----     72 VARIABLE z.L                   =        7.000  total number of shifts

We see the number of teachers needed is 7. We are closer to the demand curve:

We see that if we add more flexibility we can do a bit better. Achieving 6 teachers is almost impossible. We would need to introduce shifts like: work 2 hours, lunch, work 6 hours. The teachers union would object.

It is noted there are quite a few other methods to solve models like this where we don't need to enumerate all possible shifts [2,3]. For larger problems it may not be feasible to employ the shift enumeration scheme we used here.

References

Constraint programming scheduling problem, https://stackoverflow.com/questions/57473085/constraint-programming-scheduling-problem
Employee Scheduling II: Column Generation, http://yetanothermathprogrammingconsultant.blogspot.com/2017/01/employee-scheduling-ii-column-generation.html
Employee Scheduling IV: Direct Optimization, http://yetanothermathprogrammingconsultant.blogspot.com/2017/03/employee-scheduling-iv-direct.html

Wednesday, August 7, 2019

Finding the central point in a point cloud

Problem

The problem is easy to describe: given the coordinates of points $p_i$, find the central point $x$ that minimizes the sum of the distances from $x$ to $p_i$ [1]. This point is sometimes called the geometric median [2].

Data

We just use some random data on the unit square $[0,1]\times[0,1]$:

----     13 PARAMETER p  points (randomly generated)

                  x           y

point1   0.17174713  0.84326671
point2   0.55037536  0.30113790
point3   0.29221212  0.22405287
point4   0.34983050  0.85627035
point5   0.06711372  0.50021067
point6   0.99811763  0.57873338
point7   0.99113304  0.76225047
point8   0.13069248  0.63971876
point9   0.15951786  0.25008053
point10  0.66892861  0.43535638
point11  0.35970027  0.35144137
point12  0.13149159  0.15010179
point13  0.58911365  0.83089281
point14  0.23081574  0.66573446
point15  0.77585761  0.30365848
point16  0.11049229  0.50238487
point17  0.16017276  0.87246231
point18  0.26511455  0.28581432
point19  0.59395592  0.72271907
point20  0.62824868  0.46379787
point21  0.41330699  0.11769536
point22  0.31421227  0.04655151
point23  0.33855027  0.18209959
point24  0.64572713  0.56074555
point25  0.76996172  0.29780586
point26  0.66110626  0.75582167
point27  0.62744750  0.28386420
point28  0.08642462  0.10251467
point29  0.64125115  0.54530950
point30  0.03152485  0.79236064
point31  0.07276700  0.17566105
point32  0.52563261  0.75020767
point33  0.17812371  0.03414099
point34  0.58513117  0.62122998
point35  0.38936190  0.35871415
point36  0.24303462  0.24642154
point37  0.13050280  0.93344972
point38  0.37993791  0.78340046
point39  0.30003426  0.12548322
point40  0.74887411  0.06923246
point41  0.20201556  0.00506586
point42  0.26961305  0.49985148
point43  0.15128587  0.17416945
point44  0.33063773  0.31690605
point45  0.32208696  0.96397664
point46  0.99360221  0.36990306
point47  0.37288857  0.77197833
point48  0.39668414  0.91309632
point49  0.11957773  0.73547889
point50  0.05541847  0.57629980
mean     0.39042709  0.47299042

The last entry labeled mean is just the average of the $x$- and $y$-coordinates.

NLP Model

A straightforward non-linear programming model can look like:

Unconstrained NLP Model
\[\min \sum_i \sqrt{\sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 }\]

We use $c = \{x,y\}$, i.e. we have $x$ and $y$-coordinates. Note that we use $x$ in two different contexts: element of set $c$, being the $x$-coordinate, and the decision variable $x_c$.

We can use the mean as a very good starting point to help the NLP solver. I.e. \[x_c := \frac{\displaystyle\sum_{i=1}^n p_{i,c}}{n}\]

The picture below shows why the mean is such a good starting point:

Optimal center point is close to mean point

The numeric values are here:

----     45 PARAMETER results  x(center) vs mean

               x           y     sumdist

mean  0.39042709  0.47299042 18.02414861
x     0.37048298  0.43857572 17.96891985

The sumdist column shows the objective values for these two points.

This is an easy NLP problem. Most NLP solvers just need a few iterations. With a system like GAMS or AMPL we get exact gradients automatically. That is much preferable to finite differences which seems the prevalent method people use in an R or Python environment.

It was pointed out to me that the problem is not smooth: the gradient of $\sqrt{x}$ is not defined at zero, This can only happen when our point $x$ exactly coincides with a data point (i.e. all coordinates are exactly the same). We can protect against this by using: \[\min \sum_i \sqrt{\varepsilon+\sum_c (x_c-p_{i,c})^2 }\] for a very small constant $\varepsilon\gt 0$.

Cone programming I

The above problem can also be written as a cone programming problem. This will allow us to use a different class of solvers to work on this problem. Here we use CVXPY [3] to express the model. The Python code can look like:

import cvxpy as cp  

x = cp.Variable(2)   # center point

obj = cp.Minimize( cp.sum( [ cp.norm(x-p[i,:]) for i in range(N) ] ) )
prob = cp.Problem(obj)

objval = prob.solve(solver=cp.SCS, verbose=True)

This is very much a straight translation of our unconstrained NLP model. Although we only declared two $x$ variables, behind the scenes the model is blown up to a rather large one. We can see from the log:

----------------------------------------------------------------------------
    SCS v2.1.1 - Splitting Conic Solver
    (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 150
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 52, constraints m = 150
Cones:  soc vars: 150, soc blks: 50

The generated SOCP (second order cone programming) model is larger, but also very sparse. The solver has no problem solving it very quickly. With SOCP solvers we usually don't worry about an initial point like we used in the NLP model.

Cone programming II

If we don't use a modeling tool that can do these transformations automatically, we can use a DIY approach. Second-order cone constraints can be stated as: \[||A_i x+b_i||_2 \le c_i^Tx + d_i \>\>\forall i\] This would imply that we can write our model as:

SOCP Model Attempt
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 && \forall i \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}\end{align} \]

Unfortunately this will yield a message like: CPLEX Error 5002: 'e(point1)' is not convex or Constraint 'e(point1)'(0) is not convex. Q should be positive semidefinite in a constraint with finite upper bound. We can repair this as follows:

Repaired SOCP Model
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c \color{darkred}y_{i,c}^2 && \forall i \\ & \color{darkred} y_{i,c} = \color{darkred} x_c -\color{darkblue}p_{i,c} && \forall i,c \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}, \color{darkred} y_{i,c} \text{ free}\end{align} \]

This now solves quickly. We can now understand that CVXPY did quite a few steps before passing the model to the solver. As argued in [4], it is much better if the modeling system takes care of these reformulations. Some of them are not immediate obvious, and hand-crafted reformulations can be error-prone.

The NLP model has just 2 variables $x_c$. This SOCP model has 152 variables. Using $n=50$ data points, we added $3n$ variables. When looking at the number of constraints we see a similar thing. The NLP model has no constraints, but this SOCP model has $3n=150$ constraints ($n$ of them cones).

Conclusion

The min sum distance problem has a simple NLP formulation which can be improved by using a good initial point. It can also be formulated as a SOCP problem. Using high-level modeling tools this is not difficult. Without automatic reformulations things become a bit less obvious.

References

The point that minimizes the sum of euclidean distances to a set of n points, https://stackoverflow.com/questions/57277247/the-point-that-minimizes-the-sum-of-euclidean-distances-to-a-set-of-n-points
Geometric median, https://en.wikipedia.org/wiki/Geometric_median
https://www.cvxpy.org/
Victor Zverovich, Robert Fourer, Automatic Reformulation of Second-Order Cone Programming Problems, https://ampl.com/MEETINGS/TALKS/2015_01_Richmond_2E.2.pdf