Tuesday, June 2, 2026

MINLP instead of indicator constraints?

In this post, I want to discuss indicator constraints and how to replace them with simple multiplications. As we shall see, this is a somewhat harebrained but still interesting idea. 

Indicator constraints are implications of the form: \[\delta=0 \implies \text{linear constraint}\] or \[\delta=1 \implies \text{linear constraint}\] where \(\delta \in \{0,1\}\) is a binary decision variable.

There are two aspects of indicator constraints: 
  • Indicator constraints help with MIP models where otherwise we would use big-M constraints. This will help address the numerical issues that result from using big-M constraints. These include small (and sometimes not-so-small) solution values where you expect zero, and poor solver performance. With big-M constraints, we need to pay much attention to the size of the big-M constants.
  • Indicator constraints provide a convenient modeling construct. They form a useful abstraction that makes MIP modeling easier and more straightforward.
The two advantages reinforce each other, making MIP modeling safer and easier. It is no surprise that indicator constraints are a popular modeling tool. There are some obvious limitations:
  • Only linear constraints are supported. I don't think there is any solver that has generalized indicator constraints to handle quadratic or other nonlinear constraints. 
  • Only a single binary variable can be on the left of the implication. Forms that are not supported include: \[{\color{darkred}\delta}_i=1 \>\mathbf{and}\> {\color{darkred}\delta}_j=1 \implies {\color{darkred}z} \le {\color{darkred}x}_{i,j}\] and \[ {\color{darkred}z}\ge 3.5 \implies {\color{darkred}x} \le {\color{darkred}y}\]  
  • Sometimes using old-fashioned big-M constraints gives better performance than using indicator constraints. 

Idea


Here, I want to propose that we can replace indicator constraints with simple multiplication. E.g., the indicator constraint \[{\color{darkred}\delta}=1 \implies {\color{darkred}x} \le {\color{darkred}y}\] can be stated as: \[{\color{darkred}x}\cdot {\color{darkred}\delta} \le {\color{darkred}y} \cdot {\color{darkred}\delta}\] Similarly, \[{\color{darkred}\delta}=0 \implies {\color{darkred}x} \le {\color{darkred}y}\] can be written as: \[{\color{darkred}x}\cdot (1-{\color{darkred}\delta}) \le {\color{darkred}y} \cdot (1-{\color{darkred}\delta})\] Would this actually work? Obvious advantages are: no big-M's needed and no new syntax to learn. Disadvantage: We are looking at non-linearities. We need to do some experiments to see how difficult the resulting MINLP models are.
 

Motivating example



Suppose we have \(n\) locations. We want to select \(k\lt n\) locations that are spread out a bit. We do this by maximizing the minimum distance between these \(k\) locations. In the picture above, we have \(n=50\), \(k=10\). The minimum distance between the selected (orange) points is shown in red. All other orange points are farther apart.

The model is simple:

maximin model
\[\begin{align}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} && \forall i \lt j | {\color{darkred}s}_i={\color{darkred}s}_j=1 \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

When using indicator constraints, the inequality could be written as: \[{\color{darkred}s}_i=1 \>\mathbf{and}\>{\color{darkred}s}_j=1 \implies {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j}\] Unfortunately, we can't use this directly. We would need to introduce binary variables \({\color{darkred}t}_{i,j} = {\color{darkred}s}_i \cdot {\color{darkred}s}_j\). This would require extra constraints, especially if we want to linearize this. This is a good example where using indicator constraints requires a bit of work. 

model 1: MIP/indicators
\[\begin{align}\max\>&{\color{darkred}z}\\ &{\color{darkred}t}_{i,j}=1 \implies {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} && \forall i \lt j \\ &{\color{darkred}t}_{i,j} \ge{\color{darkred}s}_i +{\color{darkred}s}_j - 1 &&\forall i \lt j\\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i, {\color{darkred}t}_{i,j} \in \{0,1\} \end{align}\]

A standard linearization for \({\color{darkred}t}_{i,j} = {\color{darkred}s}_i \cdot {\color{darkred}s}_j\) is: \[\begin{align}& {\color{darkred}t}_{i,j} \le {\color{darkred}s}_i  \\ & {\color{darkred}t}_{i,j} \le {\color{darkred}s}_j   \\ & {\color{darkred}t}_{i,j} \ge {\color{darkred}s}_i + {\color{darkred}s}_j - 1  \end{align}\] We only need to keep the last inequality. The reason is that we just require \[{\color{darkred}s}_i = {\color{darkred}s}_j = 1 \implies {\color{darkred}t}_{i,j} =1\] and this is exactly what \({\color{darkred}t}_{i,j} \ge {\color{darkred}s}_i + {\color{darkred}s}_j - 1\) accomplishes.

Now we focus on alternatives for this indicator-based model. Instead of indicator constraints, we can also use just straightforward multiplications: 


model 2: MINLP/left
\[\begin{align}\max\>&{\color{darkred}z}\\ &{\color{darkred}z}\cdot {\color{darkred}s}_i \cdot {\color{darkred}s}_j\le {\color{darkblue}{\mathit{dist}}}_{i,j} && \forall i\lt j \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

If one of the binary variables \({\color{darkred}s}_i\), \({\color{darkred}s}_j\) is zero, \({\color{darkred}z}\) will not be restricted. This is exactly what we would like to see when using indicator constraints. A slightly better, and more general approach is to multiply both sides by \({\color{darkred}s}_i \cdot {\color{darkred}s}_j\):

model 3: MINLP/both
\[\begin{align}\max\>&{\color{darkred}z}\\ &{\color{darkred}z}\cdot {\color{darkred}s}_i \cdot {\color{darkred}s}_j\le {\color{darkblue}{\mathit{dist}}}_{i,j}\cdot {\color{darkred}s}_i \cdot {\color{darkred}s}_j && \forall i\lt j \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

In these two MINLP models, we don't use special concepts like indicator constraints: we only use standard multiplications. Obviously, this must be a rather brain-dead idea. Given the current state of technology, these endogenous multiplications lead to somewhat complicated MINLPs that may not easy to solve. The structure is not recognized as an implication, something the solver could exploit. The second version (multiply both sides) may actually be faster as it is tighter for fractional values of \({\color{darkred}s}\). It is also more general: that approach always works. We can apply this blindly. The first version made the assumption that \({\color{darkblue}{\mathit{dist}}}_{i,j} \ge 0\); the second version makes no such assumptions.  

In practice, we may use a big-M constraint, which yields a linear MIP model:

model 4: MIP/big-M
\[\begin{align}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le {\color{darkblue}{\mathit{dist}}}_{i,j} + {\color{darkblue}M} (1-{\color{darkred}s}_i) +{\color{darkblue}M} (1-{\color{darkred}s}_j)&& \forall i \lt j  \\ & \sum_i {\color{darkred}s}_i = {\color{darkblue}k} \\ & {\color{darkred}s}_i \in \{0,1\} \end{align}\]

We can tighten the big-M values by using a more specifically tailored \({\color{darkblue}M}_{i,j}\): \[\begin{align} &{\color{darkblue}{\mathit{maxdist}}} := \max_{i\lt j} {\color{darkblue}{\mathit{dist}}}_{i,j}\\ &{\color{darkblue}M}_{i,j} :=  {\color{darkblue}{\mathit{maxdist}}}- {\color{darkblue}{\mathit{dist}}}_{i,j}\end{align} \] It is always a good idea to spend some time to derive the smallest big-M constants.

Here are results with Baron (MINLP) and Cplex (MIP).

----    147 PARAMETER size  

n 50.000,    k 10.000


----    147 PARAMETER results  

                          vars        equs         obj        time       nodes

model 1.mip/indic     1276.000    2451.000      30.431       1.110    6520.000
model 2.minlp/left      51.000    1226.000      30.431       0.890       1.000
model 3.minlp/both      51.000    1226.000      30.431       3.180       1.000
model 4.mip/bigM        51.000    1226.000      30.431       0.391    7372.000

Notes:
  • Cplex with indicator variables is slower than with binary variables and bigM's. Often the node count would agree with this, but here it is reversed.
  • The MINLP models are not doing so poorly here! Sometimes they are even beating Cplex. The node counts are surprising. I expected model 3 to be faster than model model 2, but with a node count of 1, my argument for that is not very important.
  • The number of ways we can select \(k=10\) out of \(n=50\) is \[\binom{n}{k}=\binom{50}{10} \approx 10.2 \text{ billion}\] 

I also ran a larger problem:

----    147 PARAMETER size  

n 100.000,    k  15.000


----    147 PARAMETER results  

                          vars        equs         obj        time       nodes

model 1.mip/indic     5051.000    9901.000      24.589      59.125   46932.000
model 2.minlp/left     101.000    4951.000      24.589      24.090       1.000
model 3.minlp/both     101.000    4951.000      24.589      18.230       1.000
model 4.mip/bigM       101.000    4951.000      24.589       1.718    7819.000


Notes:
  • Cplex MIP with indicators is the slowest here, by some large margin.
  • So, the MINLP models do quite well. Again, needing just one node.
  • But the winner is again the MIP with small big M's.
The log of Baron looks like:

===========================================================================
 Doing local search
 Preprocessing found feasible solution with value 11.3746
 Solving bounding LP
 Starting multi-start local search
 Preprocessing found feasible solution with value 11.5341
 Preprocessing found feasible solution with value 11.9417
 Preprocessing found feasible solution with value 14.7888
 Done with local search
===========================================================================
  Iteration       Time (s)     Mem   Lower bound     Upper bound   Progress
*         1          18.06   267MB     24.5888         119.859        0.93%
          1          18.08   267MB     24.5888         24.5888      100.00%

                         *** Normal completion ***            

 Wall clock time:                    18.24
 Total CPU time used:                18.08

 Total no. of BaR iterations:       1
 Best solution found at node:       1
 Max. no. of nodes in memory:       1
 
 All done
===========================================================================

This confirms the need for just a single node.

Conclusion


We can implement indicator constraints simply by multiplying both sides of the constraint by the Boolean variables involved. When they are zero, the constraint vanishes. 

Depending on the solver, the above MINLP models solve rather quickly. Both Baron and Antigone do a great job on this. 

It is still a somewhat silly idea to use nonlinear modeling to formulate an indicator constraint. But it behaved much better than I expected. There is much progress in MINLP solvers, so we may need to revisit this in the future.


References


  1. Günlük, O., Linderoth, J. Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. 124, 183–205 (2010)


GAMS Models

$onText

 

   Find dispersed k out-of n locations by maximizing the minimum distance.

  

   Experiment with alternatives for indicator constraints.

 

   See:

   https://yetanothermathprogrammingconsultant.blogspot.com/2026/06/minlp-instead-of-indicator-constraints.html

 

$offText

 

 

option seed = 12345;

option minlp=baron,mip=cplex;

 

* select k out of n points

*$set n 50

*$set k 10

$set n 100

$set k 15

 

* 0: no plot

$set htmlplot 1

 

*-----------------------------------------------------------------

* data

*-----------------------------------------------------------------

 

Scalars

  n 'total number of locations' /%n%/

  k 'number of locations to choose' /%k%/

;

 

set

    i 'possibe locations' /location1*location%n%/

    c 'coordinates' /x,y/

;

 

parameter

    p(i,c) 'possible locations'

;

p(i,c) = uniform(0,100);

display p;

 

*-----------------------------------------------------------------

* derived data: distances

*-----------------------------------------------------------------

 

alias(i,j);

set lt(i,j) 'upper triangular part';

lt(i,j) = ord(i)<ord(j);

 

parameter dist(i,j) 'distance between locations';

dist(lt(i,j)) = sqrt(sum(c, sqr(p(i,c)-p(j,c))));

display dist;

 

scalars

   mind 'minimum distance'

   maxd 'maximum distance'

;

mind = smin(lt(i,j),dist(i,j));

maxd = smax(lt(i,j),dist(i,j));

display mind,maxd;

 

*-----------------------------------------------------------------

* models

*-----------------------------------------------------------------

 

parameter M(i,j) 'big-M';

M(lt(i,j)) = maxd-dist(i,j); 

 

binary variables

    s(i)   'location is selected'

    t(i,j) 's(i)*s(j)'

;

variable z 'objective: max min dist';

 

Equations

    count     'select n locations'

    mindist1  'using indicator constraints'

    mindist2  'min distance (bound): MINLP formulation of indicator constraint'

    mindist3  'min distance (bound): MINLP formulation of indicator constraint'

    mindist4  'min distance (bound): Linearized big-M formulation'

    tbound    's(i)=s(j)=1 => t(i,j)=1'

;

 

mindist1(lt(i,j))..  z =l= dist(i,j);

mindist2(lt(i,j))..  z*s(i)*s(j) =l= dist(i,j);

mindist3(lt(i,j))..  z*s(i)*s(j) =l= dist(i,j)*s(i)*s(j);

mindist4(lt(i,j))..  z =l= dist(i,j) + M(i,j)*(1-s(i)) + M(i,j)*(1-s(j));

tbound(lt(i,j))..    t(i,j) =g= s(i)+s(j)-1;

count.. sum(i, s(i)) =e= k;

 

* gams indicator constraints are nuts

$onecho > cplex.opt

indic mindist1(i,j)$t(i,j) 1

$offecho

 

* bounds for the benefit of Baron

z.lo = 0;

z.up = maxd;

 

model m1 /mindist1, tbound, count/;

model m2 /mindist2, count/;

model m3 /mindist3, count/;

model m4 /mindist4, count/;

 

m1.optfile=1;

solve m1 maximizing z using mip;

s.l(i) = 0;

solve m2 maximizing z using minlp;

s.l(i) = 0;

solve m3 maximizing z using minlp;

s.l(i) = 0;

solve m4 maximizing z using mip;

 

*-----------------------------------------------------------------

* reporting

*-----------------------------------------------------------------

 

parameter d(i,j) 'distance of used points';

d(lt(i,j))$(s.l(i)>0.5 and s.l(j)>0.5) = dist(i,j);

display z.l,s.l,d;

 

scalar xmind 'z.l is not exact';

xmind = smin((i,j)$d(i,j),d(i,j));

 

set mindist_id(i,j) 'min distance nodes';

mindist_id(i,j)$d(i,j) = d(i,j) = xmind;

display mindist_id;

 

parameter results,size;

size('n') = n;

size('k') = k;

 

$macro modelresult(name1,name2,modelid) \

   results(name1,name2,'vars')   = modelid.numvar; \

   results(name1,name2,'equs')   = modelid.numequ; \

   results(name1,name2,'obj')    = modelid.objval; \

   results(name1,name2,'time')   = modelid.resusd; \

   results(name1,name2,'nodes')  = modelid.nodusd;

  

modelresult('model 1','mip/indic',m1)

modelresult('model 2','minlp/left',m2)

modelresult('model 3','minlp/both',m3)

modelresult('model 4','mip/bigM',m4)

 

display size,results;

 

*---------------------------------------------------------------------

* visualization

*---------------------------------------------------------------------

 

$set html  plot.html

$set data  data.js

 

$if %htmlplot%==0 $goto skipplot

 

 

 

file fdata /%data%/; put fdata;

 

*  points

put "px=["/;

loop(i$(s.l(i)<0.5),

  put p(i,'x'):0:4,","/;

);

put "]"/;

put "py=["/;

loop(i$(s.l(i)<0.5),

  put p(i,'y'):0:4,","/;

);

put "]"/;

put "sx=["/;

loop(i$(s.l(i)>0.5),

  put p(i,'x'):0:4,","/;

);

put "]"/;

put "sy=["/;

loop(i$(s.l(i)>0.5),

  put p(i,'y'):0:4,","/;

);

put "]"/;

 

$onecho > %html%

<html>

<script src="https://cdn.plot.ly/plotly-3.4.0.min.js" charset="utf-8"></script>

<script src="%data%" charset="utf-8"></script>

 

<div id="plotDiv0" style="width: 800px; height: 600px;"></div>

 

<p id="expl">

<p>

 

<script>

 

 

var data1 = {

  x: px,

  y: py,

  size: 2,

  mode: 'markers',

  type: 'scatter',

  color: 'blue',

  name: 'locations'

};

var data2 = {

  x: sx,

  y: sy,

  size: 2,

  mode: 'markers',

  type: 'scatter',

  color: 'orange',

  name: 'selected locations'

};

 

 

n = sy.length;

 

shortest = 1.0e6

ishort = -1

jshort = -1

 

for (i=0; i<n; ++i) {

   for (j=0; j<i; ++j) {

      d = (sx[i]-sx[j])**2 + (sy[i]-sy[j])**2

      if (d<shortest) {

         ishort=i;

         jshort=j;

         shortest=d;

      }

   }

}

 

shapes = []

 

for (i=0; i<n; ++i) {

   for (j=0; j<i; ++j) {

       col = 'orange';

       w = 1;

       if ((i==ishort) && (j==jshort)) {

          col = 'red';

          w = 2;

       }

       sh = {

         type : 'line',

         x0 : sx[i],

         y0 : sy[i],

         x1 : sx[j],

         y1 : sy[j],

         line : {

            color: col,

            width: w

         }

       }

       shapes.push(sh);

   }

}

 

var layout1 = {

  autosize: false,

  width: 650,

  height: 600,

  showlegend: true,

  title: {text:'maximin distance',font: {size: 24}},

  shapes: shapes,

}

 

 

 

var trc1 = [data1,data2];

var options1 = {staticPlot: true, displayModeBar: false, responsive: false};

Plotly.newPlot('plotDiv0', trc1, layout1, options1);

 

 

document.getElementById("expl").innerHTML = "We find k="+sx.length+

  " points out of n="+(sx.length+px.length)+

  ". The length of the shortest line between selected points is maximized. "+

  "This shortest line is colored red and has length "+

  Math.sqrt(shortest).toFixed(3) + ".";

 

 

</script>

$offecho

 

executetool 'win32.ShellExecute "%html%"';

 

$label skipplot

 

 

No comments:

Post a Comment