Monday, January 2, 2023

High Level MIP Modeling


Most LP/MIP modeling tools stay close to the mathematical underpinnings of a Mixed-Integer Programming problem: a system of linear constraints (equalities and inequalities) plus a linear objective. Examples are PuLP and GAMS. In Constraint Programming (CP), modeling tools need to provide access to higher-level global constraints. Without these global constraints (such as the famous all-different constraint), CP solvers would not perform very well.

In some modeling tools for convex optimization models, we also encounter high-level constructs. One reason is that this makes it easier for the system to establish whether the problem is indeed convex. The underlying approach is called Disciplined Convex Programming (DCP). Tools like CVXPY have a wide array of functions for absolute values, norms, quadratic forms, etc.
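
To illustrate, here is a minimal CVXPY sketch (the model itself is just a made-up example on my part) showing these high-level convex constructs in action:

import cvxpy as cp

x = cp.Variable(3)

# High-level constructs: a 1-norm and an absolute value.
# CVXPY checks the DCP rules to verify the problem is convex.
objective = cp.Minimize(cp.norm(x, 1) + cp.abs(x[0] - 1))
constraints = [cp.sum(x) == 1]

prob = cp.Problem(objective, constraints)
print(prob.is_dcp())  # True: the problem passes the DCP rules
prob.solve()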

Newer versions of AMPL also allow the use of high-level modeling constructs for MIP models. Here the goal is to provide more convenience to the modeler: write things in a more natural way, and let the modeling system take care of the translation into linear constraints. Of course, you can still do the reformulations yourself at the model level and write proper linear constraints directly. (As a side note, some constructs are not so easy to state as linear constraints. Indicator constraints are a good example. These need syntax support - something that is sorely missing in PuLP and GAMS.)

The automatic translation can be a bit of a black box. It may or may not be easy to see what is actually passed on to the solver. Of course, some may argue that you don't inspect the assembly code generated by compilers either.

Let's have a look at a small example I discussed earlier in [1].


Problem


We want to form an integer sequence \(\color{darkred}x_i\) for \(i=0,\dots,n\). This sequence should obey the rule:

High-level Constraint
\[\begin{align}& \color{darkred}x_i = \left| \{ j: \color{darkred}x_j=i\} \right| \\ & \color{darkred}x_i \in \{0,1,2,\dots\} \end{align}\]

Here \(|S|\) indicates the cardinality of set \(S\) (i.e., the number of elements). A slight rewrite yields: \[\color{darkred}x_i = \sum_j \left(\color{darkred}x_j=i\right)\] This is a bit difficult to understand at first sight, due to its recursive definition. Here is a solution for \(n=4\): 

x = [2,1,2,0,0]
I.e., we have 2 zeros, 1 one, 2 twos. The values 3 and 4 don't occur.
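
A quick way to convince yourself this is a valid solution is a small Python sanity check (this snippet is mine and is not part of any optimization model):

# Verify that x[i] equals the number of entries of x with value i.
def is_valid(x):
    return all(x[i] == sum(1 for v in x if v == i) for i in range(len(x)))

print(is_valid([2, 1, 2, 0, 0]))  # True
print(is_valid([1, 2, 1, 0, 0]))  # False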


Traditional MIP formulation


A traditional way to implement this in a MIP model is to introduce binary variables. This should not come as a surprise. In a MIP, when we want to do counting, we need binary variables. So we use \(\color{darkred}y_{i,j}\in \{0,1\}\), defined by: \[\color{darkred}y_{i,j}=1 \iff \color{darkred}x_i=j\] This leads to the linear model:

 
Linear MIP constraints
\[\begin{align}& \sum_j j\cdot\color{darkred}y_{i,j} = \sum_j \color{darkred}y_{j,i} && \forall i \\ & \sum_j \color{darkred}y_{i,j} = 1 && \forall i \\ & \color{darkred}y_{i,j}\in \{0,1\} \end{align}\]


We can recover \[\color{darkred}x_i := \sum_j j\cdot  \color{darkred}y_{i,j}\] This can be implemented in post-processing (outside the optimization model), or as accounting constraints inside the model. We also need to add a dummy objective function.

We can just consider \(\color{darkred}x_i\) as non-negative integers, or we can be more specific and use \(\color{darkred}x_i \in \{0,\dots,n\}\). Note that the binary variable formulation implicitly uses the latter, as we have to provide a domain for the index \(j\).
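
As an illustration, here is how this linear formulation could look in PuLP (a sketch on my part, using PuLP's default CBC solver; the model and variable names are mine):

import pulp

n = 4
I = range(n + 1)

prob = pulp.LpProblem("strange_series", pulp.LpMinimize)

# y[i][j] = 1  <=>  x[i] = j
y = pulp.LpVariable.dicts("y", (I, I), cat="Binary")

prob += 0  # dummy objective: we only need a feasible solution

for i in I:
    # each x[i] takes exactly one value j
    prob += pulp.lpSum(y[i][j] for j in I) == 1
    # x[i] = |{j : x[j] = i}|
    prob += pulp.lpSum(j * y[i][j] for j in I) == pulp.lpSum(y[j][i] for j in I)

prob.solve()

# recover x in post-processing
x = [round(sum(j * y[i][j].value() for j in I)) for i in I]
print(x)  # [2, 1, 2, 0, 0]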

The comments in [1] provide a good impetus to circle back, and look a bit more carefully at how this high-level constraint can be written down in different languages.


Constraint Programming Environments


In CP (Constraint Programming), it is quite common to operate on integer variables directly. The high-level constraint can be implemented almost directly:

OPL CP:
forall (i in I) x[i] == sum(j in I) (x[j]==i);

MiniZinc:
forall (i in I) (x[i] = sum(j in I) (x[j] = i));

In the comment discussing the MiniZinc model in [1], an alternative, better-performing constraint is proposed:

array[I] of I: index = array1d(I, [i | i in I]);
include "globals.mzn";
constraint global_cardinality(x, index, x);

It performs better, but at the expense of readability [2]. I can easily understand the mathematical formulation, even without much knowledge of MiniZinc or Constraint Programming, but I would have to look up what this global constraint is doing exactly.

Models have two different audiences. The first is obviously the modeling tool and solver. The second, at least as important, is the human reader. I think the mathematical notation using the sum is way easier to understand. Obviously, the solver could analyze the model and replace the summation with the cardinality constraint. We see this more and more in high-end tools and MIP solvers: the solver tries to understand the model structure and may apply automatic reformulations. Another argument for this is that there are just way too many global constraints to expect the modeler to remember them all. I count 121 different global constraints listed in the MiniZinc documentation.


Automatic reformulations


AMPL can do some really interesting reformulations. The high-level constraint can be implemented as:

con{i in I}: x[i] = count{j in I} (x[j] = i);

Interestingly, a new count operator is used here, instead of extending the sum.
Nerdy side note: if you use sum anyway, you will get a syntax error with the unhelpful message "syntax error." Such poor error messages are sometimes the result of using certain parser generators. Many compilers implement hand-crafted parsers to improve error messaging.

If we use a MIP solver, AMPL cannot just pass this on to the solver. Rather, it will need to apply some reformulation. Interestingly, what AMPL generates depends on the exact declaration of the integer variable. If we specify \(\color{darkred}x_i \in \{0,1,\dots\}\), then AMPL will generate something like: \[\begin{align}&\color{darkred}x_i = \sum_j \color{darkred}\delta^=_{i,j} \\& \color{darkred}\delta^{\lt}_{i,j} + \color{darkred}\delta^=_{i,j} + \color{darkred}\delta^{\gt}_{i,j} \ge 1 \\ & \color{darkred}\delta^{\lt}_{i,j}=1 \Rightarrow  \color{darkred}x_j \le i-1 \\& \color{darkred}\delta^{=}_{i,j}=1 \Rightarrow  \color{darkred}x_j = i \\ & \color{darkred}\delta^{\gt}_{i,j}=1 \Rightarrow \color{darkred}x_j \ge i+1 \\ & \color{darkred}\delta^{\lt}_{i,j},\color{darkred}\delta^=_{i,j},\color{darkred}\delta^{\gt}_{i,j} \in \{0,1\}\end{align}\] I think I would use \( \color{darkred}\delta^{\lt}_{i,j} + \color{darkred}\delta^=_{i,j} + \color{darkred}\delta^{\gt}_{i,j} = 1\), but the \(\ge 1\) is also fine, as the indicator constraints make sure only one of them can be 1.
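
To make this concrete, here is how such an indicator-based decomposition could be written directly in gurobipy (my own reconstruction of the scheme above, not AMPL's actual output; all names are mine):

import gurobipy as gp
from gurobipy import GRB

n = 4
I = range(n + 1)

m = gp.Model("strange_series_indicator")
x = m.addVars(I, vtype=GRB.INTEGER, name="x")        # x[i] in {0,1,...}
dlt = m.addVars(I, I, vtype=GRB.BINARY, name="dlt")  # indicates x[j] <= i-1
deq = m.addVars(I, I, vtype=GRB.BINARY, name="deq")  # indicates x[j] == i
dgt = m.addVars(I, I, vtype=GRB.BINARY, name="dgt")  # indicates x[j] >= i+1

for i in I:
    # x[i] counts how often the value i occurs
    m.addConstr(x[i] == gp.quicksum(deq[i, j] for j in I))
    for j in I:
        # at least one case must apply; they are mutually exclusive anyway
        m.addConstr(dlt[i, j] + deq[i, j] + dgt[i, j] >= 1)
        # Gurobi's native indicator constraints
        m.addConstr((dlt[i, j] == 1) >> (x[j] <= i - 1))
        m.addConstr((deq[i, j] == 1) >> (x[j] == i))
        m.addConstr((dgt[i, j] == 1) >> (x[j] >= i + 1))

m.optimize()
print([int(x[i].X) for i in I])  # [2, 1, 2, 0, 0]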

However, when we declare \(\color{darkred}x_i \in \{0,\dots,n\}\), AMPL will generate something similar to our linear MIP constraints. That is actually pretty cool. In this case, the difference in the generated MIP model does not really matter (Gurobi solves both cases in presolve). But, in general, it may be better to be as tight as possible. So instead of

var x{I} integer >= 0;  (or even var x{I} integer;)

use

var x{I} in I;

The generated MIP model will be different and most likely more efficient if we use the latter declaration.

This is a good example where it can help to look a bit under the hood. Of course, this leads to the observation that these transformations are somewhat of a black box, and it is not always easy to assess whether the modeler provided the best information. A slight difference in some detail of the AMPL code may cause very different models to be generated, without the modeler being aware of it. When specifying the linear MIP model with binary variables myself, I know better what the solver will receive.

One other remark: I noticed that the MIP model generated by AMPL relies on further presolving by the solver. AMPL also has its own presolver; I think the translation is done after AMPL's own presolving.

Not all solvers support indicator constraints. I have not checked what AMPL does when that support is missing. I assume some big-M constraints will be generated. It would be interesting to see what values for the big-M constants are used. (Solvers can be very sensitive to these values.)
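
For example, assuming the bounds \(0 \le \color{darkred}x_j \le n\) are known, the indicator constraints above could be replaced by big-M constraints such as (my sketch, not verified against AMPL's actual output): \[\begin{align} & \color{darkred}x_j \le i-1+(n-i+1)\cdot(1-\color{darkred}\delta^{\lt}_{i,j})\\ & \color{darkred}x_j \ge (i+1)\cdot\color{darkred}\delta^{\gt}_{i,j} \\ & \color{darkred}x_j \le i + (n-i)\cdot (1-\color{darkred}\delta^{=}_{i,j}) \\ & \color{darkred}x_j \ge i\cdot \color{darkred}\delta^{=}_{i,j}\end{align}\] Here the big-M values follow directly from the variable bounds, so they are as tight as possible.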

To see what AMPL generated, I let Gurobi write an LP file. Digesting the LP file is a bit difficult; I don't know if there is a better way to get this kind of feedback.


Conclusion


I think there are several good reasons to use high-level MIP formulations:
  • Improve readability,
  • Novice modelers can implement constructs that were otherwise beyond their reach,
  • Experienced modelers may use it for fast prototyping.

References


  1. A strange series, http://yetanothermathprogrammingconsultant.blogspot.com/2022/11/a-strange-series.html
  2. Global_cardinality, https://www.minizinc.org/doc-2.6.4/en/lib-globals-counting.html#mzn-ref-globals-counting-global-cardinality


Appendix: models


OPL CP Optimizer

using CP;

int n = 100;
range I = 0..n;

// dvar int x[I];  (slight performance difference between the two declarations)
dvar int x[I] in I;

constraints {
  forall (i in I) x[i] == sum(j in I) (x[j]==i);
}

 

MiniZinc/Gecode

int: n = 4;
set of int: I = 0..n;

array[I] of var I: x;

% Constraint variant 1: individual sums for each value
constraint
  forall (i in I) (x[i] = sum(j in I) (x[j] = i));

% Constraint variant 2: global cardinality constraint
% include "globals.mzn";
% array[I] of I: index = array1d(I, [i | i in I]);
% constraint global_cardinality(x, index, x);

solve satisfy;

output [show(x)];

 

AMPL

param n integer = 4;
set I = {0..n};

var x{I} integer;
# var x{I} in I;

subject to con{i in I}:
  x[i] = count{j in I} (x[j] = i);

option solver gurobi;
option gurobi_options 'writeprob=x.lp outlev=1';
solve;
2 comments:

  1. CVX, and even CVXPY, are "primitive" compared to what they could be. http://ask.cvxr.com/t/ph-d-thesis-idea-next-generation-of-cvx-or-cvxpy-which-automatically-figues-out-reformulations-using-computer-algebra-techniques/10515

  2. Interesting post! Coming from a CP background, I generally have a much easier time reading models that use global constraints. While there are many different ones proposed in the literature, in most cases only a handful of global constraints are used in most models.

    I typically compare the use of global constraints in modelling to programming with a standard library. Sure, I could implement a sorting routine every time I write a program that needs to sort values. But it is easier to read the code and be confident that it does what I think it does if the most common operations are named entities in the library.

    Sometimes the library code will be good and sometimes it will be bad. The same holds true for a global constraint, especially if it is decomposed for a solver that does not support it. I prefer using the global constraint first as a convenient short-hand, and only when a model needs to be fast will I look into whether there is anything smarter that could be done. I think that systems like MiniZinc, with their focus on translating into low-level constraints, have a lot of potential in making this a structured process, such as automating the trying-out of different decompositions and comparing them across solving technologies.

    For the problem in question, when I read the original description it naturally read as a global cardinality problem to me (counting occurrences of multiple values). Reading your original model, the best way for me to understand the constraints was to mentally map it to the corresponding global cardinality, with the slightly non-standard construction of the same variables doing both the counting and being counted. Given that, I really just wanted to try it out in MiniZinc to see if it made any difference in solving performance for a CP solver.

    I find it interesting just how much the modelling tradition one is used to influences how we think about models!
