I have free variables \(\color{darkred}x_i\). How can I impose the constraint that at least one of the variables is nonzero: \(\color{darkred}x_i\ne 0\).
Observations and formulations
- \(\color{darkred}x_i\ne 0\) is somewhat difficult. We need to write this as \[\begin{align}&\color{darkred}x_i\le -\color{darkblue}\epsilon \\ & \textbf{or} \\ & \color{darkred}x_i\ge \color{darkblue}\epsilon\end{align}\] Here \(\color{darkblue}\epsilon\gt 0\) is a small constant. This "or"-type of constraint needs a binary variable (or something similar). It can't be modeled in a pure LP setting. (We have some non-convexity here).
- The constant \(\color{darkblue}\epsilon\) should not be too small, as the solver will use some feasibility tolerances. These tolerances are, in general, absolute (i.e. not relative). As the solver typically applies some scaling, this tolerance may be even larger than you think (in terms of the original constraints). And who knows what the impact of preprocessing is. I sometimes see modelers using using something like \(\color{darkblue}\epsilon = \textrm{1e-6}\). This looks too small for me. I typically use something like 0.001 or 0.0001. Of course, this depends on the order of magnitudes of the values that \(\color{darkred}x_i\) can assume.
- We can split each \(\color{darkred}x_i\) into three regions: \(\lt 0, = 0, \gt 0\). An intuitive way to write this down is using indicator constraints: \[\begin{align}&\color{darkred}\delta_{i,1}=1 \implies \color{darkred}x_i \le -\color{darkblue}\epsilon\\ & \color{darkred}\delta_{i,2}=1 \implies -\color{darkblue}\epsilon \le \color{darkred}x_i \le \color{darkblue}\epsilon \\ &\color{darkred}\delta_{i,3}=1 \implies \color{darkred}x_i \ge \color{darkblue}\epsilon \\ & \sum_r \color{darkred}\delta_{i,r} = 1 &&\forall i \\ & \sum_i (\color{darkred}\delta_{i,1}+\color{darkred}\delta_{i,3})\ge 1\\ & \color{darkred}\delta_{i,r} \in \{0,1\} \end{align} \] The middle implication usually needs to be implemented as two constraints. Note that at the points \(\color{darkred}x_i = \pm \color{darkblue}\epsilon\) we have a bit of ambiguity. In practice, that is not a bad thing: let the solver determine the most profitable region for those cases. The last constraint can also be stated as \[\sum_i \color{darkred}\delta_{i,2}\le \color{darkblue}n-1\] where \(\color{darkblue}n\) is the size of set \(i \in I\), i.e., \(\color{darkblue}n=|I|\).
- Unfortunately, not all modeling tools and solvers support indicator constraints. They really should, as this is a useful modeling concept. If we have good, tight bounds \(\color{darkred}x_i \in [\color{darkblue}L_i,\color{darkblue}U_i]\), with \(\color{darkblue}L_i \lt 0\), then we can use binary variables. Here is a possible formulation: \[\begin{align}&\color{darkred}x_i \le -\color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,1} + \color{darkblue}U_i \cdot (1-\color{darkred}\delta_{i,1})\\ & \color{darkred}x_i \ge -\color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,2} + \color{darkblue}L_i \cdot (1-\color{darkred}\delta_{i,2})\\ & \color{darkred}x_i \le \color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,2} + \color{darkblue}U_i \cdot (1-\color{darkred}\delta_{i,2}) \\ & \color{darkred}x_i \ge \color{darkblue}\epsilon \cdot \color{darkred}\delta_{i,3} + \color{darkblue}L_i\cdot (1-\color{darkred}\delta_{i,3}) \\ & \sum_r \color{darkred}\delta_{i,r} = 1 &&\forall i\\ & \sum_i (\color{darkred}\delta_{i,1}+\color{darkred}\delta_{i,3})\ge 1 \\ & \color{darkred}\delta_{i,r} \in \{0,1\} \end{align} \]
- If we don't have good bounds on \(\color{darkred}x_i\) and no indicator constraints, we can use so-called Special Ordered Sets of Type 1 (a.k.a. SOS1 sets). A high-level model can look like: \[\begin{align}& |\color{darkred}x_i| \ge \color{darkblue}\epsilon \cdot \color{darkred}\gamma_i \\ & \sum_i \color{darkred}\gamma_i \ge 1 & \\ & \color{darkred}\gamma_i \in \{0,1\} \end{align}\] The non-convex absolute value constraint can be modeled as: \[\begin{align}&\color{darkred}y_i^+ - \color{darkred}y_i^- = \color{darkred}x_i \\ &\color{darkred}y_i^+ + \color{darkred}y_i^- \ge \color{darkblue}\epsilon \cdot \color{darkred}\gamma_i \\ & \color{darkred}y_i^+, \color{darkred}y_i^- \ge 0 \\ & \color{darkred}y_i^+, \color{darkred}y_i^- \in \textbf{SOS1}\end{align}\] I.e., we have \(\color{darkblue}n=|I|\) SOS1 sets with two members each. The SOS1 sets ensures that only one of \(\color{darkred}y_i^+, \color{darkred}y_i^-\) can be nonzero. This construct does the same thing as a complementarity condition \[\color{darkred}y_i^+ \cdot \color{darkred}y_i^- = 0\] We can also simulate indicator constraints directly using SOS1 sets.
- In [1], I suggested somewhat of a hack. I should not have mentioned that. Let me explain. Instead of checking each \(\color{darkred}x_i\), we can try to do \[\color{darkred}y = \sum_i \color{darkred}x_i\] Now we just require: \(\color{darkred}y\ne 0\). That will force some \(\color{darkred}x_i \ne 0\). Note that the reverse is not true: if \(\color{darkred}y=0\), we not necessarily have that all \(\color{darkred}x_i = 0\). E.g. consider: \[\begin{align}&\color{darkred}x_1 = -1\\&\color{darkred}x_2 = +1\\&\color{darkred}x_i = 0 && i\ge 3\end{align}\] This solution would be cut off as \[\color{darkred}y = \sum_i \color{darkred}x_i = 0\] while it obeys our original requirement of at least one \(\color{darkred}x_i\ne 0\). Clearly, no free lunch here. The only reason to use this formulation is when using discrete variables for each \(\color{darkred}x_i\) is prohibitively expensive.
Numerical experiment
---- 90 **** model 1: indicator constraints **** PARAMETER epsilon = 0.001 larger than feasibility tolerance ---- 90 VARIABLE x.L free variables i3 0.001 ---- 90 VARIABLE delta.L binary indicator variable (indicates region) region2 region3 i1 1.000 i2 1.000 i3 1.000 i4 1.000 i5 1.000 i6 1.000 i7 1.000 i8 1.000 i9 1.000 i10 1.000 ---- 90 VARIABLE z.L = 1.000000E-6 objective variable
---- 120 **** model 2: binary variables **** PARAMETER epsilon = 0.001 larger than feasibility tolerance ---- 120 VARIABLE x.L free variables ( ALL 0.000 ) ---- 120 VARIABLE delta.L binary indicator variable (indicates region) region1 region2 region3 i1 9.999990E-7 1.000 i2 1.000 9.999990E-7 i3 9.999990E-7 1.000 i4 9.999990E-7 1.000 i5 9.999990E-7 1.000 i6 9.999990E-7 1.000 i7 9.999990E-7 1.000 i8 9.999990E-7 1.000 i9 9.999990E-7 1.000 i10 9.999990E-7 1.000 ---- 120 VARIABLE z.L = 0.000 objective variable ====================================== ---- 124 **** model 2: binary variables **** PARAMETER epsilon = 0.010 larger than feasibility tolerance ---- 124 VARIABLE x.L free variables ( ALL 0.000 ) ---- 124 VARIABLE delta.L binary indicator variable (indicates region) region1 region3 i1 9.999900E-6 1.000 i2 9.999900E-6 1.000 i3 9.999900E-6 1.000 i4 9.999900E-6 1.000 i5 9.999900E-6 1.000 i6 9.999900E-6 1.000 i7 9.999900E-6 1.000 i8 9.999900E-6 1.000 i9 9.999900E-6 1.000 i10 9.999900E-6 1.000 ---- 124 VARIABLE z.L = 0.000 objective variable ====================================== ---- 128 **** model 2: binary variables **** PARAMETER epsilon = 0.100 larger than feasibility tolerance ---- 128 VARIABLE x.L free variables i10 0.100 ---- 128 VARIABLE delta.L binary indicator variable (indicates region) region2 region3 i1 1.000 i2 1.000 i3 1.000 i4 1.000 i5 1.000 i6 1.000 i7 1.000 i8 1.000 i9 1.000 i10 1.000 ---- 128 VARIABLE z.L = 0.010 objective variable
---- 157 **** model 3: SOS1 sets **** PARAMETER epsilon = 0.001 larger than feasibility tolerance ---- 157 VARIABLE x.L free variables i10 0.001 ---- 157 VARIABLE y.L variable splitting plus i10 0.001 ---- 157 VARIABLE gamma.L gamma(i)=1 ==> |x(i)|>=epsilon i10 1.000 ---- 157 VARIABLE z.L = 1.000000E-6 objective variable
---- 177 **** model 4: single SOS1 set **** PARAMETER epsilon = 0.001 larger than feasibility tolerance ---- 177 VARIABLE x.L free variables i1 1.000192E-4, i2 1.000192E-4, i3 1.000192E-4, i4 1.000192E-4, i5 1.000192E-4, i6 1.000192E-4 i7 1.000192E-4, i8 1.000192E-4, i9 1.000192E-4, i10 1.000192E-4 ---- 177 VARIABLE y2.L plus 0.001 ---- 177 VARIABLE z.L = 1.000384E-7 objective variable
Conclusion
- Indicator constraints have two important advantages:
- they allow for more natural, intuitive formulations
- and they can be numerically more reliable than big-M constraints using binary variables.
- GAMS does not do indicator variables very well.
- This example shows how sensitive big-M constraints can be. It should scare the xxxx out of you.
- If your solver does not support indicator variables but knows about SOS1 sets, formulations using SOS1 sets can help. However, check the performance: binary variables can be faster.
References
- Forcing at least one free variable to be non-zero in LP, https://stackoverflow.com/questions/77800059/forcing-at-least-one-free-variable-to-be-non-zero-in-lp
Appendix: GAMS model
option miqcp = cplex;
*------------------------------------------------ * x(i) are free variables *------------------------------------------------
set i /i1*i10/ r 'regions' / region1 'x(i) <= -epsilon' region2 '-epsilon <= x(i) <= epsilon' region3 'x(i) >= epsilon' / ;
scalar epsilon 'larger than feasibility tolerance' /0.001/;
variables z 'objective variable' x(i) 'free variables' ;
equation obj; obj.. z =e= sum(i,sqr(x(i)));
*------------------------------------------------ * 1. indicator constraints * The approach in GAMS is hardly fit for human * consumption. *------------------------------------------------ binary variable delta(i,r) 'binary indicator variable (indicates region)' ;
equations indic1(i) "delta(i,'region1')=1 ==> x(i) <= -epsilon" indic2a(i) "delta(i,'region2')=1 ==> x(i) >= -epsilon" indic2b(i) "delta(i,'region2')=1 ==> x(i) <= epsilon" indic3(i) "delta(i,'region3')=1 ==> x(i) >= epsilon" sumdelta(i) "one region is active" atleastone "at least one nonzero region is active" ; indic1(i).. x(i) =l= -epsilon; indic2a(i).. x(i) =g= -epsilon; indic2b(i).. x(i) =l= epsilon; indic3(i).. x(i) =g= epsilon; sumdelta(i).. sum(r,delta(i,r)) =e= 1; atleastone.. sum(i,delta(i,'region1')+delta(i,'region3')) =g= 1; * the last constraint can also be written as: * atleastone.. sum(i,delta(i,'region2')) =l= card(i)-1;
* this does not work $onecho > cplex.opt indic indic1(i)$delta(i,'region1') 1 indic indic2a(i)$delta(i,'region2') 1 indic indic2b(i)$delta(i,'region2') 1 indic indic3(i)$delta(i,'region3') 1 $offecho
* ugly workaround file opt /cplex.opt/; put opt; loop(i, put 'indic ' indic1.tn(i) '$' delta.tn(i,"region1") ' 1'/; put 'indic ' indic2a.tn(i) '$' delta.tn(i,"region2") ' 1'/; put 'indic ' indic2b.tn(i) '$' delta.tn(i,"region2") ' 1'/; put 'indic ' indic3.tn(i) '$' delta.tn(i,"region3") ' 1'/; ); putclose;
model m1 /all/; m1.optfile=1; solve m1 minimizing z using miqcp; display "**** model 1: indicator constraints ****",epsilon,x.l,delta.l,z.l;
*------------------------------------------------ * 2. Binary variables * Epsilon is too small. *------------------------------------------------
Scalars LO /-1000/ UP /1000/ ;
x.lo(i) = LO; x.up(i) = UP;
equations bound1(i) bound2a(i) bound2b(i) bound3(i) ;
bound1(i).. x(i) =l= -epsilon*delta(i,'region1') + x.up(i)*(1-delta(i,'region1')); bound2a(i).. x(i) =g= -epsilon*delta(i,'region2') + x.lo(i)*(1-delta(i,'region2')); bound2b(i).. x(i) =l= epsilon*delta(i,'region2') + x.up(i)*(1-delta(i,'region2')); bound3(i).. x(i) =g= epsilon*delta(i,'region3') + x.lo(i)*(1-delta(i,'region3'));
model m2 /obj,bound1,bound2a,bound2b,bound3,sumdelta,atleastone/; solve m2 minimizing z using miqcp; display "**** model 2: binary variables ****",epsilon,x.l,delta.l,z.l;
epsilon = 0.01; solve m2 minimizing z using miqcp; display "**** model 2: binary variables ****",epsilon,x.l,delta.l,z.l;
epsilon = 0.1; solve m2 minimizing z using miqcp; display "**** model 2: binary variables ****",epsilon,x.l,delta.l,z.l;
* reset to its original value epsilon = 0.001;
*------------------------------------------------ * 3. SOS1 variables *------------------------------------------------
set pm 'for variable splitting' /plus,min/ ;
binary variable gamma(i) 'gamma(i)=1 ==> |x(i)|>=epsilon'; sos1 variable y(i,pm) 'variable splitting'; * automatically y>=0
equations def_y(i) abs_y(i) at_least_one2 ; def_y(i).. y(i,'plus')-y(i,'min') =e= x(i); abs_y(i).. y(i,'plus')+y(i,'min') =g= epsilon*gamma(i); at_least_one2.. sum(i, gamma(i)) =g= 1;
model m3 /obj,def_y,abs_y,at_least_one2/; solve m3 minimizing z using miqcp; display "**** model 3: SOS1 sets ****",epsilon,x.l,y.l,gamma.l,z.l;
*------------------------------------------------ * 4. One SOS1 Set * This is a hack: it cuts off solutions. *------------------------------------------------
sos1 variable y2(pm); * automatically y2>=0
equations def_y2 abs_y2 ; def_y2.. y2('plus')-y2('min') =e= sum(i,x(i)); abs_y2.. y2('plus')+y2('min') =g= epsilon;
model m4 /obj,def_y2,abs_y2/; solve m4 minimizing z using miqcp; display "**** model 4: single SOS1 set ****",epsilon,x.l,y2.l,z.l;
|
Your second model can be strengthened by replacing the four big-M constraints with L[i] * delta[i,1] - epsilon * delta[i,2] + epsilon * delta[i,3] <= x[i] <= -epsilon * delta[i,1] + epsilon * delta[i,2] + U[i] * delta[i,3]
ReplyDelete