Yet Another Math Programming Consultant: MINLP instead of indicator constraints?

In this post, I want to discuss indicator constraints and how to replace them with simple multiplications. As we shall see, this is a somewhat harebrained but still interesting idea.

Indicator constraints are implications of the form: \[\delta=0 \implies \text{linear constraint}\] or \[\delta=1 \implies \text{linear constraint}\] where $\delta \in \{0,1\}$ is a binary decision variable.

There are two aspects of indicator constraints:

Indicator constraints help with MIP models where otherwise we would use big-M constraints. This will help address the numerical issues that result from using big-M constraints. These include small (and sometimes not-so-small) solution values where you expect zero, and poor solver performance. With big-M constraints, we need to pay much attention to the size of the big-M constants. This is the algorithmic aspect.
Indicator constraints provide a convenient modeling construct. They form a useful abstraction that makes MIP modeling easier and more straightforward. This is the modeling aspect.

Another example where a solver feature has such a dual benefit (algorithmic and modeling) are SOS2 sets for piecewise linear functions.

The two advantages reinforce each other, making MIP modeling safer and easier. It is no surprise that indicator constraints are a popular modeling tool. There are some obvious limitations:

Only linear constraints are supported. I don't think there is any solver that has generalized indicator constraints to handle quadratic or other nonlinear constraints.
Only a single binary variable can be on the left of the implication. Forms that are not supported include: \[{\color{darkred}\delta}_i=1 \>\mathbf{and}\> {\color{darkred}\delta}_j=1 \implies {\color{darkred}z} \le {\color{darkred}x}_{i,j}\] and \[ {\color{darkred}z}\ge 3.5 \implies {\color{darkred}x} \le {\color{darkred}y}\]
Sometimes using old-fashioned big-M constraints gives better performance than using indicator constraints.
All in all, to me, indicator constraints feel a bit like a somewhat unfinished project. Only the easy cases are implemented. I may be a bit too harsh here.

Idea

Here, I propose replacing indicator constraints with simple multiplication. E.g., the indicator constraint \[{\color{darkred}\delta}=1 \implies {\color{darkred}x} \le {\color{darkred}y}\] can be stated as: \[{\color{darkred}x}\cdot {\color{darkred}\delta} \le {\color{darkred}y} \cdot {\color{darkred}\delta}\] Similarly, \[{\color{darkred}\delta}=0 \implies {\color{darkred}x} \le {\color{darkred}y}\] can be written as: \[{\color{darkred}x}\cdot (1-{\color{darkred}\delta}) \le {\color{darkred}y} \cdot (1-{\color{darkred}\delta})\] Would this actually work? Obvious advantages are: no big-M's needed and no new syntax to learn. Disadvantage: We are looking at non-linearities. We need to do some experiments to see how difficult the resulting MINLP models are.

Motivating example

Suppose we have $n$ locations. We want to select $k\lt n$ locations that are spread out a bit. We do this by maximizing the minimum distance between these $k$ locations. In the picture above, we have $n=50$, $k=10$. The minimum distance between the selected (orange) points is shown in red. All other orange points are farther apart.

The model is simple:

maximin model
\[\begin{align}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} && \forall i \lt j \| {\color{darkred}s}_i={\color{darkred}s}_j=1 \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

When using indicator constraints, the inequality could be written as: \[{\color{darkred}s}_i=1 \>\mathbf{and}\>{\color{darkred}s}_j=1 \implies {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j}\] Unfortunately, we can't use this directly. We would need to introduce binary variables ${\color{darkred}t}_{i,j} = {\color{darkred}s}_i \cdot {\color{darkred}s}_j$. This would require extra constraints, especially if we want to linearize this. This is a good example where using indicator constraints requires a bit of work.

model 1: MIP/indicators
\[\begin{align}\max\>&{\color{darkred}z}\\ &{\color{darkred}t}_{i,j}=1 \implies {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} && \forall i \lt j \\ &{\color{darkred}t}_{i,j} \ge{\color{darkred}s}_i +{\color{darkred}s}_j - 1 &&\forall i \lt j\\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i, {\color{darkred}t}_{i,j} \in \{0,1\} \end{align}\]

A standard linearization for ${\color{darkred}t}_{i,j} = {\color{darkred}s}_i \cdot {\color{darkred}s}_j$ is: \[\begin{align}& {\color{darkred}t}_{i,j} \le {\color{darkred}s}_i \\ & {\color{darkred}t}_{i,j} \le {\color{darkred}s}_j \\ & {\color{darkred}t}_{i,j} \ge {\color{darkred}s}_i + {\color{darkred}s}_j - 1 \end{align}\] We only need to keep the last inequality. The reason is that we just require \[{\color{darkred}s}_i = {\color{darkred}s}_j = 1 \implies {\color{darkred}t}_{i,j} =1\] and this is exactly what ${\color{darkred}t}_{i,j} \ge {\color{darkred}s}_i + {\color{darkred}s}_j - 1$ accomplishes.

Now we focus on alternatives for this indicator-based model. Instead of indicator constraints, we can also use just straightforward multiplications:

model 2: MINLP/left
\[\begin{align}\max\>&{\color{darkred}z}\\ &{\color{darkred}z}\cdot {\color{darkred}s}_i \cdot {\color{darkred}s}_j\le {\color{darkblue}{\mathit{dist}}}_{i,j} && \forall i\lt j \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

If one of the binary variables ${\color{darkred}s}_i$, ${\color{darkred}s}_j$ is zero, ${\color{darkred}z}$ will not be restricted. This is exactly what we would like to see when using indicator constraints. We (or the solver) may be reformulating this into: \[\begin{align} &{\color{darkred}z}\cdot {\color{darkred}t}_{i,j} \le {\color{darkblue}{\mathit{dist}}}_{i,j} \\ & {\color{darkred}t}_{i,j} \ge {\color{darkred}s}_i + {\color{darkred}s}_j - 1 \end{align}\] A slightly better, and more general approach is to multiply both sides by ${\color{darkred}s}_i \cdot {\color{darkred}s}_j$:

model 3: MINLP/both
\[\begin{align}\max\>&{\color{darkred}z}\\ &{\color{darkred}z}\cdot {\color{darkred}s}_i \cdot {\color{darkred}s}_j\le {\color{darkblue}{\mathit{dist}}}_{i,j}\cdot {\color{darkred}s}_i \cdot {\color{darkred}s}_j && \forall i\lt j \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

In these two MINLP models, we don't use special concepts like indicator constraints: we only use standard multiplications. Obviously, this must be a rather brain-dead idea. Given the current state of technology, these endogenous multiplications lead to somewhat complicated MINLPs that may not easy to solve. The structure is not recognized as an implication, something the solver could exploit. The second version (multiply both sides) may actually be faster as it is tighter for fractional values of ${\color{darkred}s}$. E.g. if ${\color{darkred}s}_i =0.5$ then we have ${\color{darkred}z} \le 4 {\color{darkblue}{\mathit{dist}}}_{i,j}$ in the previous model, but ${\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j}$ here. It is also more general: that approach always works. We can apply this blindly. The first version made the assumption that ${\color{darkblue}{\mathit{dist}}}_{i,j} \ge 0$; the second version makes no such assumptions.

Again, we can reformulate this into \[\begin{align} &{\color{darkred}z}\cdot {\color{darkred}t}_{i,j} \le {\color{darkblue}{\mathit{dist}}}_{i,j} \cdot {\color{darkred}t}_{i,j}\\ & {\color{darkred}t}_{i,j} \ge {\color{darkred}s}_i + {\color{darkred}s}_j - 1 \end{align}\]

In practice, we may use a big-M constraint, which yields a linear MIP model. We could try to linearize \[{\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} + {\color{darkblue}M} (1-{\color{darkred}s}_i \cdot{\color{darkred}s}_j)\] using the same linearization techniques discussed earlier. However, here is a simpler approach:

model 4: MIP/big-M
\[\begin{align}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} + {\color{darkblue}M} (1-{\color{darkred}s}_i) +{\color{darkblue}M} (1-{\color{darkred}s}_j)&& \forall i \lt j \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

We can tighten the big-M values by using a more specifically tailored ${\color{darkblue}M}_{i,j}$: \[\begin{align} &{\color{darkblue}{\mathit{maxdist}}} := \max_{i\lt j} {\color{darkblue}{\mathit{dist}}}_{i,j}\\ &{\color{darkblue}M}_{i,j} := {\color{darkblue}{\mathit{maxdist}}}- {\color{darkblue}{\mathit{dist}}}_{i,j}\end{align} \] It is always a good idea to spend some time to derive the smallest big-M constants. These smaller ${\color{darkblue}M}_{i,j}$'s were used in the experiments.

Just for completeness, I also want to mention SOS1 (Special Ordered Sets of Type 1). This can be used instead of binary variables as it does not need big-M's. If good bounds for the big-M's are available, binary variables are usually better, but if you can't get good values for the big-M's, an SOS1 approach is an alternative. The variables that are part of a SOS1 set, have the following property: only one of them can be nonzero. We could reformulate \[{\color{darkred}t}_{i,j}=1 \implies {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j}\] as \[\begin{align} & {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} + {\color{darkred}v}_{i,j} \\ & ({\color{darkred}t}_{i,j},{\color{darkred}v}_{i,j}) \in \mathbf{SOS1} \\ & {\color{darkred}v}_{i,j} \ge 0\end{align}\] So we have a bunch of SOS1 sets with just two members. The complete model looks like:

model 5: MIP/sos1
\[\begin{align}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} +{\color{darkred}v}_{i,j}&& \forall i \lt j \\ &{\color{darkred}t}_{i,j} \ge{\color{darkred}s}_i +{\color{darkred}s}_j - 1 &&\forall i \lt j \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\& ({\color{darkred}t}_{i,j},{\color{darkred}v}_{i,j}) \in \mathbf{SOS1}\\&{\color{darkred}t}_{i,j} \in [0,1] \\&{\color{darkred}v}_{i,j} \ge 0 \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

model 5: MIP/sos1

\[\begin{align}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} +{\color{darkred}v}_{i,j}&& \forall i \lt j \\ &{\color{darkred}t}_{i,j} \ge{\color{darkred}s}_i +{\color{darkred}s}_j - 1 &&\forall i \lt j \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\& ({\color{darkred}t}_{i,j},{\color{darkred}v}_{i,j}) \in \mathbf{SOS1}\\&{\color{darkred}t}_{i,j} \in [0,1] \\&{\color{darkred}v}_{i,j} \ge 0 \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

Here are results with Baron (MINLP) and Cplex (MIP).

----    171 PARAMETER size  

n 50.000,    k 10.000


----    171 PARAMETER results  

                          vars        equs         obj        time       nodes

model 1.mip/indic     1276.000    2451.000      30.431       1.344    6520.000
model 2.minlp/left      51.000    1226.000      30.431       0.920       1.000
model 3.minlp/both      51.000    1226.000      30.431       3.220       1.000
model 4.mip/bigM        51.000    1226.000      30.431       0.406    7372.000
model 5.mip/sos1      2501.000    2451.000      30.431      75.125   87277.000

Notes:

Cplex with indicator variables is slower than with binary variables and bigM's. Often the node count would agree with this, but here it is reversed.
The MINLP models are not doing so poorly here! Sometimes they are even beating Cplex. The node counts are surprising. I expected model 3 to be faster than model model 2, but with a node count of 1, my argument for that is not very important.
The number of ways we can select $k=10$ out of $n=50$ is \[\binom{n}{k}=\binom{50}{10} \approx 10.2 \text{ billion}\]
The SOS1 model is doing very poorly. For more combinatorial problems, binary variables are much preferred: the solver can do all kinds of tricks with those, but not with SOS1 sets.

I also ran a larger problem:

----    171 PARAMETER size  

n 100.000,    k  15.000


----    171 PARAMETER results  

                          vars        equs         obj        time       nodes      status

model 1.mip/indic     5051.000    9901.000      24.589      60.484   46932.000
model 2.minlp/left     101.000    4951.000      24.589      25.290       1.000
model 3.minlp/both     101.000    4951.000      24.589      18.370       1.000
model 4.mip/bigM       101.000    4951.000      24.589       1.734    7819.000
model 5.mip/sos1     10001.000    9901.000      24.589    1000.375   61474.000   TimeLimit

Notes:

Cplex MIP with indicators is the slowest here, by some large margin.
So, the MINLP models do quite well. Again, needing just one node.
But the winner is again the MIP with small big M's.

The log of Baron looks like:

===========================================================================
 Doing local search
 Preprocessing found feasible solution with value 11.3746
 Solving bounding LP
 Starting multi-start local search
 Preprocessing found feasible solution with value 11.5341
 Preprocessing found feasible solution with value 11.9417
 Preprocessing found feasible solution with value 14.7888
 Done with local search
===========================================================================
  Iteration       Time (s)     Mem   Lower bound     Upper bound   Progress
*         1          18.06   267MB     24.5888         119.859        0.93%
          1          18.08   267MB     24.5888         24.5888      100.00%

                         *** Normal completion ***            

 Wall clock time:                    18.24
 Total CPU time used:                18.08

 Total no. of BaR iterations:       1
 Best solution found at node:       1
 Max. no. of nodes in memory:       1
 
 All done
===========================================================================

This confirms the need for just a single node.

Conclusion

We can implement indicator constraints simply by multiplying both sides of the constraint by the Boolean variables involved. When they are zero, the constraint vanishes.

Depending on the solver, the above MINLP models solve rather quickly. Both Baron and Antigone do a great job on this. All in all, they do worse than the best linear models and better than the worst. Somewhat in the middle.

It is still a really silly idea to use nonlinear modeling to formulate an indicator constraint. But it behaved much better than I expected. There is much progress in MINLP solvers, so we may need to revisit this in the future.

I sometimes use MINLP models when developing and prototyping models to get the logic right and produce a correct model. Only after that, I work on making things faster and look into linearizations. MINLP offers convenient and expressive modeling. This was a bit of a strange post, but I think the underlying idea is interesting: use MINLP as a modeling device.

References

Günlük, O., Linderoth, J. Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. 124, 183–205 (2010)

GAMS Models

$onText

Find dispersed k out-of n locations by maximizing the minimum distance.

Experiment with alternatives for indicator constraints.

See:

https://yetanothermathprogrammingconsultant.blogspot.com/2026/06/minlp-instead-of-indicator-constraints.html

$offText

option seed = 12345;

option minlp=baron,mip=cplex;

* select k out of n points

$set n 50

$set k 10

*$set n 100

*$set k 15

* 0: no plot

$set htmlplot 1

*-----------------------------------------------------------------

* data

*-----------------------------------------------------------------

Scalars

n 'total number of locations' /%n%/

k 'number of locations to choose' /%k%/

;

set

i 'possibe locations' /location1*location%n%/

c 'coordinates' /x,y/

;

parameter

p(i,c) 'possible locations'

;

p(i,c) = uniform(0,100);

display p;

*-----------------------------------------------------------------

* derived data: distances

*-----------------------------------------------------------------

alias(i,j);

set lt(i,j) 'upper triangular part';

lt(i,j) = ord(i)<ord(j);

parameter dist(i,j) 'distance between locations';

dist(lt(i,j)) = sqrt(sum(c, sqr(p(i,c)-p(j,c))));

display dist;

scalars

mind 'minimum distance'

maxd 'maximum distance'

;

mind = smin(lt(i,j),dist(i,j));

maxd = smax(lt(i,j),dist(i,j));

display mind,maxd;

*-----------------------------------------------------------------

* models

*-----------------------------------------------------------------

parameter M(i,j) 'big-M';

M(lt(i,j)) = maxd-dist(i,j);

set s1 'for sos1' /sos1-1,sos1-2/;

binary variables

s(i) 'location is selected'

t(i,j) 's(i)*s(j)'

;

variable z 'objective: max min dist';

sos1 variable slack(i,j,s1);

Equations

count 'select n locations'

mindist1 'using indicator constraints'

mindist2 'min distance (bound): MINLP formulation of indicator constraint'

mindist3 'min distance (bound): MINLP formulation of indicator constraint'

mindist4 'min distance (bound): Linearized big-M formulation'

tbound 's(i)=s(j)=1 => t(i,j)=1'

sosbound "s(i)=s(j)=1 => slack(i,j,'sos1-1')=1"

sos_1

sos_2

;

mindist1(lt(i,j)).. z =l= dist(i,j);

mindist2(lt(i,j)).. z*s(i)*s(j) =l= dist(i,j);

mindist3(lt(i,j)).. z*s(i)*s(j) =l= dist(i,j)*s(i)*s(j);

mindist4(lt(i,j)).. z =l= dist(i,j) + M(i,j)*(1-s(i)) + M(i,j)*(1-s(j));

tbound(lt(i,j)).. t(i,j) =g= s(i)+s(j)-1;

count.. sum(i, s(i)) =e= k;

slack.up(lt,'sos1-1') = 1;

sos_1(lt(i,j)).. slack(i,j,'sos1-1') =g= s(i)+s(j)-1;

sos_2(lt(i,j)).. z =l= dist(i,j) + slack(i,j,'sos1-2');

* gams indicator constraints are nuts

$onecho > cplex.opt

indic mindist1(i,j)$t(i,j) 1

$offecho

* bounds for the benefit of Baron

z.lo = 0;

z.up = maxd;

model m1 /mindist1, tbound, count/;

model m2 /mindist2, count/;

model m3 /mindist3, count/;

model m4 /mindist4, count/;

model m5 /sos_1,sos_2, count/;

option reslim = 1000;

m1.optfile=1;

solve m1 maximizing z using mip;

s.l(i) = 0;

solve m2 maximizing z using minlp;

s.l(i) = 0;

solve m3 maximizing z using minlp;

s.l(i) = 0;

solve m4 maximizing z using mip;

s.l(i) = 0;

solve m5 maximizing z using mip;

*-----------------------------------------------------------------

* reporting

*-----------------------------------------------------------------

parameter d(i,j) 'distance of used points';

d(lt(i,j))$(s.l(i)>0.5 and s.l(j)>0.5) = dist(i,j);

display z.l,s.l,d;

scalar xmind 'z.l is not exact';

xmind = smin((i,j)$d(i,j),d(i,j));

set mindist_id(i,j) 'min distance nodes';

mindist_id(i,j)$d(i,j) = d(i,j) = xmind;

display mindist_id;

parameter results,size;

size('n') = n;

size('k') = k;

acronym TimeLimit;

$macro modelresult(name1,name2,modelid) \

results(name1,name2,'vars') = modelid.numvar; \

results(name1,name2,'equs') = modelid.numequ; \

results(name1,name2,'obj') = modelid.objval; \

results(name1,name2,'time') = modelid.resusd; \

results(name1,name2,'nodes') = modelid.nodusd; \

results(name1,name2,'status')$(modelid.solvestat=3) = timelimit;

modelresult('model 1','mip/indic',m1)

modelresult('model 2','minlp/left',m2)

modelresult('model 3','minlp/both',m3)

modelresult('model 4','mip/bigM',m4)

modelresult('model 5','mip/sos1',m5)

display size,results;

*---------------------------------------------------------------------

* visualization

* takes last solution

*---------------------------------------------------------------------

$set html plot.html

$set data data.js

$if %htmlplot%==0 $goto skipplot

file fdata /%data%/; put fdata;

* points

put "px=["/;

loop(i$(s.l(i)<0.5),

put p(i,'x'):0:4,","/;

);

put "]"/;

put "py=["/;

loop(i$(s.l(i)<0.5),

put p(i,'y'):0:4,","/;

);

put "]"/;

put "sx=["/;

loop(i$(s.l(i)>0.5),

put p(i,'x'):0:4,","/;

);

put "]"/;

put "sy=["/;

loop(i$(s.l(i)>0.5),

put p(i,'y'):0:4,","/;

);

put "]"/;

$onecho > %html%

<html>

<p>

var data1 = {

x: px,

y: py,

size: 2,

mode: 'markers',

type: 'scatter',

color: 'blue',

name: 'locations'

};

var data2 = {

x: sx,

y: sy,

size: 2,

mode: 'markers',

type: 'scatter',

color: 'orange',

name: 'selected locations'

};

n = sy.length;

shortest = 1.0e6

ishort = -1

jshort = -1

for (i=0; i<n; ++i) {

for (j=0; j<i; ++j) {

d = (sx[i]-sx[j])**2 + (sy[i]-sy[j])**2

if (d<shortest) {

ishort=i;

jshort=j;

shortest=d;

}

shapes = []

for (i=0; i<n; ++i) {

for (j=0; j<i; ++j) {

col = 'orange';

w = 1;

if ((i==ishort) && (j==jshort)) {

col = 'red';

w = 2;

}

sh = {

type : 'line',

x0 : sx[i],

y0 : sy[i],

x1 : sx[j],

y1 : sy[j],

line : {

color: col,

width: w

}

shapes.push(sh);

}

var layout1 = {

autosize: false,

width: 650,

height: 600,

showlegend: true,

title: {text:'maximin distance',font: {size: 24}},

shapes: shapes,

}

var trc1 = [data1,data2];

var options1 = {staticPlot: true, displayModeBar: false, responsive: false};

Plotly.newPlot('plotDiv0', trc1, layout1, options1);

document.getElementById("expl").innerHTML = "We find k="+sx.length+

" points out of n="+(sx.length+px.length)+

". The length of the shortest line between selected points is maximized. "+

"This shortest line is colored red and has length "+

Math.sqrt(shortest).toFixed(3) + ".";

</script>

$offecho

executetool 'win32.ShellExecute "%html%"';

$label skipplot

Yet Another Math Programming Consultant

Tuesday, June 2, 2026

MINLP instead of indicator constraints?

Idea

Motivating example

Conclusion

References

1 comment: