Tuesday, January 30, 2024

One nonzero in set of free variables

In [1] the following question is posed:

I have free variables \(\color{darkred}x_i\). How can I impose the constraint that at least one of the variables is nonzero, i.e., \(\color{darkred}x_i\ne 0\)?


Observations and formulations


  • \(\color{darkred}x_i\ne 0\) is somewhat difficult. We need to write this as \[\begin{align}&\color{darkred}x_i\le -\color{darkblue}\epsilon \\ & \textbf{or} \\ &  \color{darkred}x_i\ge \color{darkblue}\epsilon\end{align}\] Here \(\color{darkblue}\epsilon\gt 0\) is a small constant. This "or"-type of constraint needs a binary variable (or something similar). It can't be modeled in a pure LP setting. (We have some non-convexity here).
  • The constant \(\color{darkblue}\epsilon\) should not be too small, as the solver will use some feasibility tolerances. These tolerances are, in general, absolute (i.e., not relative). As the solver typically applies some scaling, this tolerance may be even larger than you think (in terms of the original constraints). And who knows what the impact of preprocessing is. I sometimes see modelers using something like \(\color{darkblue}\epsilon = \textrm{1e-6}\). That looks too small to me. I typically use something like 0.001 or 0.0001. Of course, this depends on the order of magnitude of the values that \(\color{darkred}x_i\) can assume. 
  • We can split each \(\color{darkred}x_i\) into three regions: \(\lt 0, = 0, \gt 0\). An intuitive way to write this down is using indicator constraints:  \[\begin{align}&\color{darkred}\delta_{i,1}=1 \implies \color{darkred}x_i \le -\color{darkblue}\epsilon\\ & \color{darkred}\delta_{i,2}=1 \implies -\color{darkblue}\epsilon \le \color{darkred}x_i \le \color{darkblue}\epsilon \\ &\color{darkred}\delta_{i,3}=1 \implies \color{darkred}x_i \ge \color{darkblue}\epsilon \\ & \sum_r \color{darkred}\delta_{i,r} = 1 &&\forall i \\ & \sum_i (\color{darkred}\delta_{i,1}+\color{darkred}\delta_{i,3})\ge 1\\ & \color{darkred}\delta_{i,r} \in \{0,1\} \end{align} \] The middle implication usually needs to be implemented as two constraints. Note that at the points \(\color{darkred}x_i = \pm \color{darkblue}\epsilon\) we have a bit of ambiguity. In practice, that is not a bad thing: let the solver determine the most profitable region for those cases. The last constraint can also be stated as \[\sum_i \color{darkred}\delta_{i,2}\le \color{darkblue}n-1\] where \(\color{darkblue}n\) is the size of set \(i \in I\), i.e., \(\color{darkblue}n=|I|\).
  • Unfortunately, not all modeling tools and solvers support indicator constraints. They really should, as this is a useful modeling concept. If we have good, tight bounds \(\color{darkred}x_i \in [\color{darkblue}L_i,\color{darkblue}U_i]\), with \(\color{darkblue}L_i \lt 0\), then we can use binary variables. Here is a possible formulation:  \[\begin{align}&\color{darkred}x_i \le -\color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,1} + \color{darkblue}U_i  \cdot (1-\color{darkred}\delta_{i,1})\\ & \color{darkred}x_i \ge -\color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,2} + \color{darkblue}L_i  \cdot (1-\color{darkred}\delta_{i,2})\\ & \color{darkred}x_i \le \color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,2}  + \color{darkblue}U_i  \cdot (1-\color{darkred}\delta_{i,2})  \\ & \color{darkred}x_i \ge \color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,3} +  \color{darkblue}L_i\cdot (1-\color{darkred}\delta_{i,3}) \\ & \sum_r \color{darkred}\delta_{i,r} = 1 &&\forall i\\ & \sum_i (\color{darkred}\delta_{i,1}+\color{darkred}\delta_{i,3})\ge 1 \\ & \color{darkred}\delta_{i,r} \in \{0,1\} \end{align} \] A quick sanity check of this formulation is worked out right after this list.
  • If we don't have good bounds on \(\color{darkred}x_i\) and indicator constraints are not available, we can use so-called Special Ordered Sets of Type 1 (a.k.a. SOS1 sets). A high-level model can look like: \[\begin{align}& |\color{darkred}x_i| \ge \color{darkblue}\epsilon \cdot \color{darkred}\gamma_i \\ & \sum_i \color{darkred}\gamma_i \ge 1 \\ & \color{darkred}\gamma_i  \in \{0,1\} \end{align}\] The non-convex absolute value constraint can be modeled as: \[\begin{align}&\color{darkred}y_i^+ - \color{darkred}y_i^- = \color{darkred}x_i \\ &\color{darkred}y_i^+ + \color{darkred}y_i^- \ge \color{darkblue}\epsilon \cdot \color{darkred}\gamma_i \\ & \color{darkred}y_i^+, \color{darkred}y_i^- \ge 0 \\ & \color{darkred}y_i^+, \color{darkred}y_i^- \in \textbf{SOS1}\end{align}\] I.e., we have \(\color{darkblue}n=|I|\) SOS1 sets with two members each. Each SOS1 set ensures that only one of \(\color{darkred}y_i^+, \color{darkred}y_i^-\) can be nonzero. This construct does the same thing as the complementarity condition \[\color{darkred}y_i^+ \cdot \color{darkred}y_i^- = 0\] We can also simulate indicator constraints directly using SOS1 sets.
  • In [1], I suggested somewhat of a hack. I should not have mentioned that. Let me explain. Instead of checking each \(\color{darkred}x_i\), we can aggregate: \[\color{darkred}y = \sum_i \color{darkred}x_i\] Now we just require \(\color{darkred}y\ne 0\). That will force some \(\color{darkred}x_i \ne 0\). Note that the reverse is not true: if \(\color{darkred}y=0\), it does not follow that all \(\color{darkred}x_i = 0\). E.g., consider: \[\begin{align}&\color{darkred}x_1 = -1\\&\color{darkred}x_2 = +1\\&\color{darkred}x_i = 0 && i\ge 3\end{align}\] This solution would be cut off, as \[\color{darkred}y = \sum_i \color{darkred}x_i = 0\] while it obeys our original requirement of at least one \(\color{darkred}x_i\ne 0\). Clearly, no free lunch here. The only reason to use this formulation is when using discrete variables for each \(\color{darkred}x_i\) is prohibitively expensive.
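
As a quick sanity check of the binary-variable (big-M) formulation above, substitute each region's indicator. For \(\color{darkred}\delta_{i,1}=1\) (and thus \(\color{darkred}\delta_{i,2}=\color{darkred}\delta_{i,3}=0\)) the four constraints reduce to \[\color{darkblue}L_i \le \color{darkred}x_i \le -\color{darkblue}\epsilon\] with the other inequalities relaxed to the variable bounds. Similarly, \[\begin{align}&\color{darkred}\delta_{i,2}=1: && -\color{darkblue}\epsilon \le \color{darkred}x_i \le \color{darkblue}\epsilon\\ &\color{darkred}\delta_{i,3}=1: && \color{darkblue}\epsilon \le \color{darkred}x_i \le \color{darkblue}U_i\end{align}\] So, in exact arithmetic, the three regions are encoded correctly. As the numerical experiment below shows, floating-point tolerances are another matter.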


Numerical experiment


Let's try out these four formulations with the following objective: \[\min \color{darkred}z=\sum_i \color{darkred}x_i^2\] Hopefully, just one \(\color{darkred}x_i\) will be nonzero.
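
Before looking at the results, note what the optimal objective value should be: any solution satisfying the requirement has at least one \(|\color{darkred}x_i|\ge\color{darkblue}\epsilon\), hence \[\color{darkred}z \ge \color{darkblue}\epsilon^2 = 0.001^2 = 10^{-6},\] and this bound is attained by setting exactly one \(\color{darkred}x_i=\pm\color{darkblue}\epsilon\) and the rest to zero. So \(\color{darkred}z=\textrm{1e-6}\) is what a correct formulation should report.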

Model 1 with indicator constraints does the right thing:


----     90 **** model 1: indicator constraints ****
            PARAMETER epsilon              =        0.001  larger than feasibility tolerance

----     90 VARIABLE x.L  free variables

i3 0.001


----     90 VARIABLE delta.L  binary indicator variable (indicates region)

        region2     region3

i1        1.000
i2        1.000
i3                    1.000
i4        1.000
i5        1.000
i6        1.000
i7        1.000
i8        1.000
i9        1.000
i10       1.000


----     90 VARIABLE z.L                   =  1.000000E-6  objective variable

The main issue with the implementation is that GAMS does not support indicator constraints very well. It is a bit of a half-baked feature. Firstly, a solver option file should not change the meaning of a model. Secondly, the option file reader is too primitive to handle practical cases without resorting to some hacks. If you take integer programming seriously, you really must take indicator constraints seriously. This will be illustrated below.

The second model with binary variables is interesting. We get incorrect results depending on the exact value of \(\color{darkblue}\epsilon\). We try the model with \(\color{darkblue}\epsilon=0.001, 0.01, 0.1\), and fixed bounds \(-1000 \le \color{darkred}x_i \le 1000\). Only for \(\color{darkblue}\epsilon=0.1\) do we get correct results! I did not expect this: I thought my values for the bounds and for \(\color{darkblue}\epsilon\) were sufficiently conservative.


----    120 **** model 2: binary variables ****
            PARAMETER epsilon              =        0.001  larger than feasibility tolerance

----    120 VARIABLE x.L  free variables

                      ( ALL       0.000 )


----    120 VARIABLE delta.L  binary indicator variable (indicates region)

        region1     region2     region3

i1  9.999990E-7                   1.000
i2        1.000 9.999990E-7
i3  9.999990E-7                   1.000
i4  9.999990E-7                   1.000
i5  9.999990E-7                   1.000
i6  9.999990E-7                   1.000
i7  9.999990E-7                   1.000
i8  9.999990E-7                   1.000
i9  9.999990E-7                   1.000
i10 9.999990E-7                   1.000


----    120 VARIABLE z.L                   =        0.000  objective variable


======================================

----    124 **** model 2: binary variables ****
            PARAMETER epsilon              =        0.010  larger than feasibility tolerance

----    124 VARIABLE x.L  free variables

                      ( ALL       0.000 )


----    124 VARIABLE delta.L  binary indicator variable (indicates region)

        region1     region3

i1  9.999900E-6       1.000
i2  9.999900E-6       1.000
i3  9.999900E-6       1.000
i4  9.999900E-6       1.000
i5  9.999900E-6       1.000
i6  9.999900E-6       1.000
i7  9.999900E-6       1.000
i8  9.999900E-6       1.000
i9  9.999900E-6       1.000
i10 9.999900E-6       1.000


----    124 VARIABLE z.L                   =        0.000  objective variable

======================================


----    128 **** model 2: binary variables ****
            PARAMETER epsilon              =        0.100  larger than feasibility tolerance

----    128 VARIABLE x.L  free variables

i10 0.100


----    128 VARIABLE delta.L  binary indicator variable (indicates region)

        region2     region3

i1        1.000
i2        1.000
i3        1.000
i4        1.000
i5        1.000
i6        1.000
i7        1.000
i8        1.000
i9        1.000
i10                   1.000


----    128 VARIABLE z.L                   =        0.010  objective variable

The bounds are quite reasonable. Nevertheless, we observe numerical issues: tiny fractional values in the binary variables \(\color{darkred}\delta_{i,r}\) lead to "feasible" solutions that actually don't make sense. We can play with the (feasibility and integrality) tolerances or with the bounds. Here, I just adjusted the size of \(\color{darkblue}\epsilon\). We get proper solutions only after increasing it to 0.1 (!!!). 
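
My reading of the \(\color{darkblue}\epsilon=0.001\) run is as follows (this is an interpretation of the output; the exact behavior depends on CPLEX's tolerances). The solver leaves an integrality violation of roughly \(10^{-6}\) in the binaries, well within a typical integrality tolerance of \(10^{-5}\). Plugging, say, \(\color{darkred}\delta_{i,3}=1-10^{-6}\) into the corresponding big-M constraint gives \[\color{darkred}x_i \ge \color{darkblue}\epsilon\cdot(1-10^{-6}) + \color{darkblue}L_i\cdot 10^{-6} = 0.001 - 10^{-9} - 1000\cdot 10^{-6}\approx -10^{-9}\] so \(\color{darkred}x_i=0\) passes as feasible. In general, a deviation of \(\color{darkblue}\epsilon/|\color{darkblue}L_i|\) in a binary is enough to defeat the formulation: with \(\color{darkblue}L_i=-1000\) that is \(10^{-6}\) for \(\color{darkblue}\epsilon=0.001\) and \(10^{-5}\) for \(\color{darkblue}\epsilon=0.01\) (precisely the tiny values displayed above), while \(\color{darkblue}\epsilon=0.1\) would require a deviation of \(10^{-4}\), which is no longer tolerated.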

This is a somewhat extreme example of how big-M constraints with binary variables can cause serious problems. Indicator constraints can really help here: they are not only a modeling convenience, they can also improve the numerical behavior. 

The SOS1 model also works correctly:

----    157 **** model 3: SOS1 sets ****
            PARAMETER epsilon              =        0.001  larger than feasibility tolerance

----    157 VARIABLE x.L  free variables

i10 0.001


----    157 VARIABLE y.L  variable splitting

           plus

i10       0.001


----    157 VARIABLE gamma.L  gamma(i)=1 ==> |x(i)|>=epsilon

i10 1.000


----    157 VARIABLE z.L                   =  1.000000E-6  objective variable


Finally, our flawed model 4 also gives a somewhat strange solution:

----    177 **** model 4: single SOS1 set ****
            PARAMETER epsilon              =        0.001  larger than feasibility tolerance

----    177 VARIABLE x.L  free variables

i1  1.000192E-4,    i2  1.000192E-4,    i3  1.000192E-4,    i4  1.000192E-4,    i5  1.000192E-4,    i6  1.000192E-4
i7  1.000192E-4,    i8  1.000192E-4,    i9  1.000192E-4,    i10 1.000192E-4


----    177 VARIABLE y2.L  

plus 0.001


----    177 VARIABLE z.L                   =  1.000384E-7  objective variable

Here the deviation 0.001 is distributed over the \(\color{darkred}x_i\). That makes sense, as it makes the objective smaller.
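
A quick calculation confirms this: minimizing \(\sum_i \color{darkred}x_i^2\) subject to \(\sum_i \color{darkred}x_i = \color{darkblue}\epsilon\) spreads the deviation evenly, giving \[\color{darkred}x_i = \frac{\color{darkblue}\epsilon}{\color{darkblue}n} = 10^{-4}, \qquad \color{darkred}z = \frac{\color{darkblue}\epsilon^2}{\color{darkblue}n} = 10^{-7},\] which is, up to tolerances, exactly what is reported.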

Conclusion


  • Indicator constraints have two important advantages:
    • they allow for more natural, intuitive formulations
    • and they can be numerically more reliable than big-M constraints using binary variables.
  • GAMS does not handle indicator constraints very well.
  • This example shows how sensitive big-M constraints can be. It should scare the xxxx out of you.
  • If your solver does not support indicator constraints but knows about SOS1 sets, formulations using SOS1 sets can help. However, check the performance: binary variables can be faster.

References


Appendix: GAMS model


option miqcp = cplex;

 

*------------------------------------------------

* x(i) are free variables

*------------------------------------------------

 

 

set

  i /i1*i10/

  r 'regions' /

   region1  'x(i) <= -epsilon'

   region2  '-epsilon <= x(i) <= epsilon'

   region3  'x(i) >= epsilon'

   /

;

 

scalar epsilon 'larger than feasibility tolerance' /0.001/;

 

variables

   z     'objective variable'

   x(i)  'free variables'

;

 

equation obj;

obj.. z =e= sum(i,sqr(x(i)));

 

*------------------------------------------------

* 1. indicator constraints

* The approach in GAMS is hardly fit for human consumption.

*------------------------------------------------

  

binary variable

  delta(i,r) 'binary indicator variable (indicates region)'

;

 

equations

   indic1(i)  "delta(i,'region1')=1 ==> x(i) <= -epsilon"

   indic2a(i"delta(i,'region2')=1 ==> x(i) >= -epsilon"

   indic2b(i"delta(i,'region2')=1 ==> x(i) <= epsilon"

   indic3(i)  "delta(i,'region3')=1 ==> x(i) >= epsilon"

   sumdelta(i"one region is active"

   atleastone  "at least one nonzero region is active"

;

indic1(i)..  x(i) =l= -epsilon; 

indic2a(i).. x(i) =g= -epsilon;

indic2b(i).. x(i) =l= epsilon;

indic3(i)..  x(i) =g= epsilon;

sumdelta(i).. sum(r,delta(i,r)) =e= 1;

atleastone.. sum(i,delta(i,'region1')+delta(i,'region3')) =g= 1;

* The last constraint can also be written as:
* atleastone.. sum(i,delta(i,'region2')) =l= card(i)-1;

 

 

* This does not work: apparently the option reader does not expand the set i,
* so we need one indicator line per individual equation/variable pair.
*$onecho > cplex.opt
*indic indic1(i)$delta(i,'region1')  1
*indic indic2a(i)$delta(i,'region2') 1
*indic indic2b(i)$delta(i,'region2') 1
*indic indic3(i)$delta(i,'region3')  1
*$offecho

* Ugly workaround: write the expanded option file ourselves with a put loop.

file opt /cplex.opt/;

put opt;

loop(i,

  put 'indic ' indic1.tn(i) '$' delta.tn(i,"region1") ' 1'/;

  put 'indic ' indic2a.tn(i) '$' delta.tn(i,"region2") ' 1'/;

  put 'indic ' indic2b.tn(i) '$' delta.tn(i,"region2") ' 1'/;

  put 'indic ' indic3.tn(i) '$' delta.tn(i,"region3") ' 1'/;

);

putclose;

 

 

model m1 /all/;

m1.optfile=1;

solve m1 minimizing z using miqcp;

display "**** model 1: indicator constraints ****",epsilon,x.l,delta.l,z.l;

 

 

*------------------------------------------------

* 2. Binary variables

* Epsilon is too small.

*------------------------------------------------

 

Scalars

   LO /-1000/

   UP /1000/

;

 

x.lo(i) = LO;

x.up(i) = UP;

 

equations

  bound1(i)

  bound2a(i)

  bound2b(i)

  bound3(i)

;

 

bound1(i)..  x(i) =l= -epsilon*delta(i,'region1') + x.up(i)*(1-delta(i,'region1'));

bound2a(i).. x(i) =g= -epsilon*delta(i,'region2') + x.lo(i)*(1-delta(i,'region2'));

bound2b(i).. x(i) =l= epsilon*delta(i,'region2') + x.up(i)*(1-delta(i,'region2'));

bound3(i)..  x(i) =g= epsilon*delta(i,'region3') + x.lo(i)*(1-delta(i,'region3'));

 

model m2 /obj,bound1,bound2a,bound2b,bound3,sumdelta,atleastone/;

solve m2 minimizing z using miqcp;

display "**** model 2: binary variables ****",epsilon,x.l,delta.l,z.l;

 

epsilon = 0.01;

solve m2 minimizing z using miqcp;

display "**** model 2: binary variables ****",epsilon,x.l,delta.l,z.l;

 

epsilon = 0.1;

solve m2 minimizing z using miqcp;

display "**** model 2: binary variables ****",epsilon,x.l,delta.l,z.l;

 

* reset epsilon to its original value

epsilon = 0.001;

 

*------------------------------------------------

* 3. SOS1 variables

*------------------------------------------------

 

set

  pm 'for variable splitting' /plus,min/

;

 

binary variable gamma(i) 'gamma(i)=1 ==> |x(i)|>=epsilon';

sos1 variable y(i,pm) 'variable splitting';

* automatically y>=0

 

equations

   def_y(i)

   abs_y(i)

   at_least_one2

;

  

def_y(i).. y(i,'plus')-y(i,'min') =e= x(i);

abs_y(i).. y(i,'plus')+y(i,'min') =g= epsilon*gamma(i);

at_least_one2.. sum(i, gamma(i)) =g= 1;

 

model m3 /obj,def_y,abs_y,at_least_one2/;

solve m3 minimizing z using miqcp;

display "**** model 3: SOS1 sets ****",epsilon,x.l,y.l,gamma.l,z.l;

 

*------------------------------------------------

* 4. One SOS1 Set

* This is a hack: it cuts off solutions.

*------------------------------------------------

 

 

sos1 variable y2(pm);

* automatically y2>=0

 

equations

  def_y2

  abs_y2

;

def_y2.. y2('plus')-y2('min') =e= sum(i,x(i));

abs_y2.. y2('plus')+y2('min') =g= epsilon;

 

model m4 /obj,def_y2,abs_y2/;

solve m4 minimizing z using miqcp;

display "**** model 4: single SOS1 set ****",epsilon,x.l,y2.l,z.l;

 

 

1 comment:

  1. Your second model can be strengthened by replacing the four big-M constraints with L[i] * delta[i,1] - epsilon * delta[i,2] + epsilon * delta[i,3] <= x[i] <= -epsilon * delta[i,1] + epsilon * delta[i,2] + U[i] * delta[i,3]
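
For concreteness, here is a sketch of how this strengthened pair of constraints could look in the GAMS model of the appendix. The equation names strong_lo and strong_up are mine, and this variant was not part of the experiments above; it would replace bound1, bound2a, bound2b, and bound3 in model m2.

equations strong_lo(i), strong_up(i);

strong_lo(i).. x.lo(i)*delta(i,'region1') - epsilon*delta(i,'region2')
               + epsilon*delta(i,'region3') =l= x(i);

strong_up(i).. x(i) =l= -epsilon*delta(i,'region1') + epsilon*delta(i,'region2')
               + x.up(i)*delta(i,'region3');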
