For some classes of problems, it is a strategy to form explicit KKT optimality conditions and convert the resulting complementarity conditions to big-M constraints. In case the original problem was linear (or in some cases quadratic), the resulting model is a MIP model. One such application is solving non-convex QP models [1]. Another application is a bilevel LP, where we reformulate the inner problem by stating the optimality conditions for the inner-problem. In [2], such a problem is given:
bilevel LP |
\[\begin{align}\max\> &\color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2] \\ & \min \>\color{darkred}y\\ & \qquad \color{darkred}y \ge 0 \perp \color{darkred} \lambda_1 \\ & \qquad \color{darkred}x - 0.01\color{darkred}y \le 1 \perp \color{darkred}\lambda_2 \end{align} \] |
The notation \(\perp\) indicates the name of the dual variable for the inner constraints. After the inner problem is converted to optimality conditions, we have:
Inner LP as optimality conditions |
\[\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \cdot \color{darkred}y = 0 \\ & \color{darkred}\lambda_2 \cdot \left( \color{darkred}x -0.01\color{darkred}y - 1 \right) = 0 \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \end{align} \] |
The products can be interpreted as an "
or" condition. These can be linearized using binary variables and big-M constraints:
Linearized problem |
\[\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \le \color{darkblue} M_1^d \color{darkred}\delta_1 \\ & \color{darkred}y \le \color{darkblue} M_1^p (1-\color{darkred}\delta_1) \\ & \color{darkred}\lambda_2 \le \color{darkblue} M_2^d \color{darkred}\delta_2 \\ & -\color{darkred}x +0.01\color{darkred}y + 1 \le \color{darkblue} M_2^p (1-\color{darkred}\delta_2) \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \\ & \color{darkred}\delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align} \] |
The letters \(p\) and \(d\) in the big-M constants indicate primal and dual. Note that we flipped the sign of the second constraint of the inner problem. This was to make \(\color{darkred}\lambda_2\) a non-negative variable.
Choosing big-M values
One big issue is: choose appropriate values for the big-M constants. The values \(\color{darkblue} M_1^d,\color{darkblue} M_1^p,\color{darkblue} M_2^d,\color{darkblue} M_2^p = 200\) are valid bounds on \(\color{darkred}\lambda_1, \color{darkred}y, \color{darkred}\lambda_2, -\color{darkred}x +0.01\color{darkred}y + 1\). This gives the solution:
LOWER LEVEL UPPER
---- VAR x . 2.0000 2.0000
---- VAR y . 100.0000 +INF
---- VAR lambda1 . . +INF
---- VAR lambda2 . 100.0000 +INF
---- VAR delta1 . . 1.0000
---- VAR delta2 . 1.0000 1.0000
---- VAR z -INF 102.0000 +INF
The problem of choosing the right big M values is important. Choosing values that are too small, can lead to cutting off optimal values. Big-M values that are too large have other problems such as numerical issues and trickle flow. In [2] it is argued that many modelers use the following algorithm to detect if a big-M value is too small:
- Select initial values for \( \color{darkblue} M_j^d, \color{darkblue} M_j^p \).
- Solve the linearized MIP model.
- Look for binding big-M constraints that should be inactivated. If not found, stop: we have found the optimal solution.
- Increase the corresponding big-M value.
- Go to step 2.
Unfortunately, this is algorithm is not guaranteed to work. [2] gives an example: use as starting values \(\color{darkblue} M_j^d = 50, \color{darkblue} M_j^p = 200\). The solution looks like:
LOWER LEVEL UPPER
---- VAR x . 2.0000 2.0000
---- VAR y . . +INF
---- VAR lambda1 . 1.0000 +INF
---- VAR lambda2 . . +INF
---- VAR delta1 . 1.0000 1.0000
---- VAR delta2 . 1.0000 1.0000
---- VAR z -INF 2.0000 +INF
The values \(\color{darkred}\delta_j=1\) indicate we should look at the constraints: \(\color{darkred}\lambda_j \le \color{darkblue}M_j^d\). As \(\color{darkred}\lambda_1=1, \color{darkred}\lambda_2=0\) these are not binding. Hence, the
erroneous conclusion is that this solution is optimal.
Our conclusion is: we cannot reliably detect that big-M values are chosen too small by searching for binding constraints.
Alternatives
In [3], it is argued that SOS1 (special ordered sets of type 1) variables should be used instead of binary variables. For our example, this means we would replace our big-M constraints by:
SOS1 representation |
\[\begin{align} & \color{darkred}\lambda_1, \color{darkred}y \in SOS1 \\ & \color{darkred}\lambda_2, \color{darkred}s \in SOS1 \\ &\color{darkred}s = -\color{darkred}x+0.01\color{darkred}y+1 \\ & \color{darkred}s \ge 0 \end{align} \] |
Many MIP solvers (but not all) support SOS1 variables. Binary variables can sometimes be faster, but SOS1 variables have the advantage that no big-M constants are needed. The same can be said for indicator constraints. A formulation using indicator constraints can look like:
Indicator constraints |
\[\begin{align} & \color{darkred} \delta_1 = 0 \Rightarrow \color{darkred}\lambda_1 =0 \\ & \color{darkred} \delta_1 = 1\Rightarrow \color{darkred}y = 0 \\ & \color{darkred} \delta_2 = 0 \Rightarrow \color{darkred}\lambda_2 = 0 \\ & \color{darkred} \delta_2 = 1 \Rightarrow \color{darkred}x-0.01\color{darkred}y = 1 \\ & \color{darkred}x-0.01\color{darkred}y \le 1 \\ & \color{darkred} \delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align} \] |
We can also solve the quadratic model directly using
global solvers like Antigone, Baron, Couenne, or Gurobi. Sometimes reasonable bounds are needed to help globals solvers.
It is noted that the GAMS
EMP tool can generate the quadratic model for you. I still had to specify to use a global solver for it to find the optimal solution.
Conclusion
Instead of using big-M constraints to handle complementarity conditions in linear bi-level optimization problems, there are a few alternatives:
- SOS1 variables
- Indicator constraints
- Global solvers for non-convex quadratic problems
References
Appendix: GAMS Code
$ontext
different
formulations of a bilevel optimization problem.
model 1.
high level model solved as EMP model
max x +
y
st
x in [0,2]
min y
st y >= 0 : lambda1 >= 0
x-0.01y <= 1 : lambda2
>= 0
Note: in
subsequent model we flip the last constraint to make the sign of the dual
>= 0
This
model can be solved using EMP + a global solver.
$offtext
option optcr=0;
positive variables x,y;
x.up = 2;
variables
z 'outer objective'
zin 'inner objective'
;
equations
obj 'outer objective'
objin 'inner objective'
cin1 'inner constraint 1'
cin2 'inner constraint 2'
;
obj.. z =e= x+y;
objin.. zin =e= y;
cin1.. y =g= 0;
cin2.. x-0.01*y =l= 1;
model m1 /obj, objin, cin1, cin2/;
$onecho > "%emp.info%"
bilevel x
min zin y objin cin1 cin2
$offecho
$onecho > jams.opt
subsolver couenne
$offecho
m1.optfile=1;
solve m1 using EMP maximizing
z;
$ontext
model 2.
Optimality conditions for inner problem.
This is
a non-convex quadratic model.
This can be solved with Gurobi, Couenne, Baron, Antigone.
max x +
y
st
x in [0,2]
1 - lambda1 - 0.01*lambda2 = 0
lambda1*y = 0
lambda2*(-x+0.01y+1) = 0
lambda1,lambda2>=0
$offtext
positive variables lambda1,lambda2;
equations
dLdy 'derivative of lagrangean'
quad1 'complementarity constraint (quadratic)'
quad2 'complementarity constraint (quadratic)'
;
dLdy.. 1 - lambda1 - 0.01*lambda2 =e=
0;
quad1.. lambda1*y =e= 0;
quad2.. lambda2*(x-0.01*y-1) =e= 0;
model m2 /obj,dLdy,cin2,quad1,quad2/;
option minlp =
couenne;
solve m2 using minlp
maximizing z;
$ontext
model 3.
Linearized version of model 2 using big-M constraints
max x +
y
st
x in [0,2]
1 - lambda1 - 0.01*lambda2 = 0
lambda1 <= delta1*M1d
y <= (1-delta1)*M1p
lambda2 <= delta2*M2d
-x+0.01y+1 <= (1-delta2)*M2p
lambda1,lambda2>=0
delta1, delta2 binary
$offtext
binary variable delta1, delta2;
equations
comp1d 'linearized complementarity constraint (dual)'
comp1p 'linearized complementarity constraint (primal)'
comp2d 'linearized complementarity constraint (dual)'
comp2p 'linearized complementarity constraint (primal)'
;
scalars
M1p
'big-M
for primal' /200/
M1d
'big-M
for dual' /200/
M2p
'big-M
for primal' /200/
M2d
'big-M
for dual' /200/
;
comp1d.. lambda1 =l= delta1*M1d;
comp1p.. y =l= (1-delta1)*M1p;
comp2d.. lambda2 =l= delta2*M2d;
comp2p.. -x+0.01*y+1 =l= (1-delta2)*M2p;
model m3 /obj,dLdy,comp1d,comp1p,comp2d,comp2p/;
solve m3 using mip maximizing
z;
$ontext
model 4.
Linearized version of model 2 using SOS1 variables
max x +
y
st
x in [0,2]
1 - lambda1 - 0.01*lambda2 = 0
s = -x+0.01y+1
lambda1,lambda2,s >=0
lambda1, y forms a SOS1 set
lambda2, s forms a SOS1 set
In GAMS
we need to reformulate this a bit as SOS1 variables are indexed.
No need
for s in the GAMS model.
$offtext
set k 'SOS members' /k1,k2/;
sos1 variable
sos_1(k) 'for the pair lambda1, y'
sos_2(k) 'for the pair lambda2, -x+0.01y+1'
;
equations
sos_1a 'assign to sos1 member'
sos_1b 'assign to sos1 member'
sos_2a 'assign to sos1 member'
sos_2b 'assign to sos1 member'
;
sos_1a.. y =e= sos_1('k1');
sos_1b.. lambda1 =e= sos_1('k2');
sos_2a.. -x+0.01*y+1 =e= sos_2('k1');
sos_2b.. lambda2 =e= sos_2('k2');
model m4 /obj,dLdy,sos_1a,sos_1b, sos_2a, sos_2b/;
solve m4 using mip maximizing
z;
$ontext
model 5.
Linearized version of model 2 using indicator constraints
this is
very ugly in GAMS
max x +
y
st
x in [0,2]
1 - lambda1 - 0.01*lambda2 = 0
x - 0.01y <= 1
d1 = 0 ==> lambda1 = 0
d1 = 1 ==> y = 0
d2 = 0 ==> lambda2 = 0
d2 = 1 ==> x - 0.01y = 1
lambda1, lambda2 >= 0
d1, d2 in {0,1}
y >= 0
$offtext
binary variables
d1
'lambda1
= 0 OR y = 0'
d2
'lambda2
= 0 OR x - 0.01y = 1'
;
equations
d10 'part of indicator
constraint'
d11 'part of indicator
constraint'
d20 'part of indicator
constraint'
d21 'part of indicator
constraint'
dummy 'make sure d1,d2 are part of
the model'
;
d10.. lambda1 =e= 0;
d11.. y =e= 0;
d20.. lambda2 =e= 0;
d21.. x-0.01*y =e= 1;
dummy.. d1+d2 =g= 0;
$onecho > cplex.opt
indic d10$d1 0
indic d11$d1 1
indic d20$d2 0
indic d21$d2 1
$offecho
model m5 /obj,dLdy,d10,d11,d20,d21,cin2,dummy/;
m5.optfile = 1;
solve m5 using mip maximizing
z;
|
No comments:
Post a Comment