## Monday, October 26, 2020

### Solution methods for linear bilevel problems

For some classes of problems, it is a strategy to form explicit KKT optimality conditions and convert the resulting complementarity conditions to big-M constraints. In case the original problem was linear (or in some cases quadratic), the resulting model is a MIP model. One such application is solving non-convex QP models . Another application is a bilevel LP, where we reformulate the inner problem by stating the optimality conditions for the inner-problem. In , such a problem is given:

bilevel LP
\begin{align}\max\> &\color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2] \\ & \min \>\color{darkred}y\\ & \qquad \color{darkred}y \ge 0 \perp \color{darkred} \lambda_1 \\ & \qquad \color{darkred}x - 0.01\color{darkred}y \le 1 \perp \color{darkred}\lambda_2 \end{align}

The notation $$\perp$$ indicates the name of the dual variable for the inner constraints. After the inner problem is converted to optimality conditions, we have:

Inner LP as optimality conditions
\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \cdot \color{darkred}y = 0 \\ & \color{darkred}\lambda_2 \cdot \left( \color{darkred}x -0.01\color{darkred}y - 1 \right) = 0 \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \end{align}

The products can be interpreted as an "or" condition. These can be linearized using binary variables and big-M constraints:

Linearized problem
\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \le \color{darkblue} M_1^d \color{darkred}\delta_1 \\ & \color{darkred}y \le \color{darkblue} M_1^p (1-\color{darkred}\delta_1) \\ & \color{darkred}\lambda_2 \le \color{darkblue} M_2^d \color{darkred}\delta_2 \\ & -\color{darkred}x +0.01\color{darkred}y + 1 \le \color{darkblue} M_2^p (1-\color{darkred}\delta_2) \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \\ & \color{darkred}\delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align}

The letters $$p$$ and $$d$$ in the big-M constants indicate primal and dual.  Note that we flipped the sign of the second constraint of the inner problem. This was to make $$\color{darkred}\lambda_2$$ a non-negative variable.

#### Choosing big-M values

One big issue is: choose appropriate values for the big-M constants. The values $$\color{darkblue} M_1^d,\color{darkblue} M_1^p,\color{darkblue} M_2^d,\color{darkblue} M_2^p = 200$$ are valid bounds on $$\color{darkred}\lambda_1, \color{darkred}y, \color{darkred}\lambda_2, -\color{darkred}x +0.01\color{darkred}y + 1$$. This gives the solution:

                           LOWER          LEVEL          UPPER

---- VAR x                   .             2.0000         2.0000
---- VAR y                   .           100.0000        +INF
---- VAR lambda1             .              .            +INF
---- VAR lambda2             .           100.0000        +INF
---- VAR delta1              .              .             1.0000
---- VAR delta2              .             1.0000         1.0000
---- VAR z                 -INF          102.0000        +INF


The problem of choosing the right big M values is important. Choosing values that are too small, can lead to cutting off optimal values. Big-M values that are too large have other problems such as numerical issues and trickle flow. In  it is argued that many modelers use the following algorithm to detect if a big-M value is too small:

1. Select initial values for $$\color{darkblue} M_j^d, \color{darkblue} M_j^p$$.
2. Solve the linearized MIP model.
3. Look for binding big-M constraints that should be inactivated. If not found, stop: we have found the optimal solution.
4. Increase the corresponding big-M value.
5. Go to step 2.

Unfortunately, this is algorithm is not guaranteed to work.  gives an example: use as starting values $$\color{darkblue} M_j^d = 50, \color{darkblue} M_j^p = 200$$. The solution looks like:

                           LOWER          LEVEL          UPPER

---- VAR x                   .             2.0000         2.0000
---- VAR y                   .              .            +INF
---- VAR lambda1             .             1.0000        +INF
---- VAR lambda2             .              .            +INF
---- VAR delta1              .             1.0000         1.0000
---- VAR delta2              .             1.0000         1.0000
---- VAR z                 -INF            2.0000        +INF


The values $$\color{darkred}\delta_j=1$$ indicate we should look at the constraints: $$\color{darkred}\lambda_j \le \color{darkblue}M_j^d$$. As $$\color{darkred}\lambda_1=1, \color{darkred}\lambda_2=0$$ these are not binding. Hence, the erroneous conclusion is that this solution is optimal.

Our conclusion is: we cannot reliably detect that big-M values are chosen too small by searching for binding constraints.

#### Alternatives

In , it is argued that SOS1 (special ordered sets of type 1) variables should be used instead of binary variables. For our example, this means we would replace our big-M constraints by:

SOS1 representation
\begin{align} & \color{darkred}\lambda_1, \color{darkred}y \in SOS1 \\ & \color{darkred}\lambda_2, \color{darkred}s \in SOS1 \\ &\color{darkred}s = -\color{darkred}x+0.01\color{darkred}y+1 \\ & \color{darkred}s \ge 0 \end{align}

Many MIP solvers (but not all) support SOS1 variables. Binary variables can sometimes be faster, but SOS1 variables have the advantage that no big-M constants are needed. The same can be said for indicator constraints. A formulation using indicator constraints can look like:

Indicator constraints
\begin{align} & \color{darkred} \delta_1 = 0 \Rightarrow \color{darkred}\lambda_1 =0 \\ & \color{darkred} \delta_1 = 1\Rightarrow \color{darkred}y = 0 \\ & \color{darkred} \delta_2 = 0 \Rightarrow \color{darkred}\lambda_2 = 0 \\ & \color{darkred} \delta_2 = 1 \Rightarrow \color{darkred}x-0.01\color{darkred}y = 1 \\ & \color{darkred}x-0.01\color{darkred}y \le 1 \\ & \color{darkred} \delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align}

We can also solve the quadratic model directly using global solvers like Antigone, Baron, Couenne, or Gurobi. Sometimes reasonable bounds are needed to help globals solvers.

It is noted that the GAMS EMP tool can generate the quadratic model for you. I still had to specify to use a global solver for it to find the optimal solution.

#### Conclusion

Instead of using big-M constraints to handle complementarity conditions in linear bi-level optimization problems, there are a few alternatives:
• SOS1 variables
• Indicator constraints
• Global solvers for non-convex quadratic problems

#### References

1. Solving non-convex QP problems as a MIP,  https://yetanothermathprogrammingconsultant.blogspot.com/2016/06/solving-non-convex-qp-problems-as-mip.html
2. Salvador Pineda and Juan Miguel Morales, Solving Linear Bilevel Problems Using Big-Ms: Not All That Glitters Is Gold, September 2018,  https://arxiv.org/pdf/1809.10448.pdf
3. Thomas Kleinert and Martin Schmidt, Why there is no need to use a big-M in linear bilevel optimization: A computational study of two ready-to-use approaches, October 2020,  http://www.optimization-online.org/DB_FILE/2020/10/8065.pdf

#### Appendix: GAMS Code

 $ontext different formulations of a bilevel optimization problem. model 1. high level model solved as EMP model max x + y st x in [0,2] min y st y >= 0 : lambda1 >= 0 x-0.01y <= 1 : lambda2 >= 0 Note: in subsequent model we flip the last constraint to make the sign of the dual >= 0 This model can be solved using EMP + a global solver.$offtext option optcr=0; positive variables x,y; x.up = 2; variables    z 'outer objective'    zin 'inner objective' ; equations    obj     'outer objective'    objin   'inner objective'    cin1     'inner constraint 1'    cin2     'inner constraint 2' ; obj..     z =e= x+y; objin..   zin =e= y; cin1..    y =g= 0; cin2..    x-0.01*y =l= 1; model m1 /obj, objin, cin1, cin2/; $onecho > "%emp.info%" bilevel x min zin y objin cin1 cin2$offecho $onecho > jams.opt subsolver couenne$offecho m1.optfile=1; solve m1 using EMP maximizing z; $ontext model 2. Optimality conditions for inner problem. This is a non-convex quadratic model. This can be solved with Gurobi, Couenne, Baron, Antigone. max x + y st x in [0,2] 1 - lambda1 - 0.01*lambda2 = 0 lambda1*y = 0 lambda2*(-x+0.01y+1) = 0 lambda1,lambda2>=0$offtext positive variables lambda1,lambda2; equations    dLdy    'derivative of lagrangean'    quad1   'complementarity constraint (quadratic)'    quad2   'complementarity constraint (quadratic)' ; dLdy..  1 - lambda1 - 0.01*lambda2 =e= 0; quad1.. lambda1*y =e= 0; quad2.. lambda2*(x-0.01*y-1) =e= 0; model m2 /obj,dLdy,cin2,quad1,quad2/; option minlp = couenne; solve m2 using minlp maximizing z; $ontext model 3. Linearized version of model 2 using big-M constraints max x + y st x in [0,2] 1 - lambda1 - 0.01*lambda2 = 0 lambda1 <= delta1*M1d y <= (1-delta1)*M1p lambda2 <= delta2*M2d -x+0.01y+1 <= (1-delta2)*M2p lambda1,lambda2>=0 delta1, delta2 binary$offtext binary variable delta1, delta2; equations    comp1d  'linearized complementarity constraint (dual)'    comp1p  'linearized complementarity constraint (primal)'    comp2d  'linearized complementarity constraint (dual)'    comp2p  'linearized complementarity constraint (primal)' ; scalars    M1p  'big-M for primal' /200/    M1d  'big-M for dual'   /200/    M2p  'big-M for primal' /200/    M2d  'big-M for dual'   /200/ ; comp1d.. lambda1 =l= delta1*M1d; comp1p.. y =l= (1-delta1)*M1p; comp2d.. lambda2 =l= delta2*M2d; comp2p.. -x+0.01*y+1 =l= (1-delta2)*M2p; model m3 /obj,dLdy,comp1d,comp1p,comp2d,comp2p/; solve m3 using mip maximizing z; $ontext model 4. Linearized version of model 2 using SOS1 variables max x + y st x in [0,2] 1 - lambda1 - 0.01*lambda2 = 0 s = -x+0.01y+1 lambda1,lambda2,s >=0 lambda1, y forms a SOS1 set lambda2, s forms a SOS1 set In GAMS we need to reformulate this a bit as SOS1 variables are indexed. No need for s in the GAMS model.$offtext set k 'SOS members' /k1,k2/; sos1 variable    sos_1(k)  'for the pair lambda1, y'    sos_2(k)  'for the pair lambda2, -x+0.01y+1' ; equations   sos_1a  'assign to sos1 member'   sos_1b  'assign to sos1 member'   sos_2a  'assign to sos1 member'   sos_2b  'assign to sos1 member' ; sos_1a.. y =e= sos_1('k1'); sos_1b.. lambda1 =e= sos_1('k2'); sos_2a.. -x+0.01*y+1 =e= sos_2('k1'); sos_2b.. lambda2 =e= sos_2('k2'); model m4 /obj,dLdy,sos_1a,sos_1b, sos_2a, sos_2b/; solve m4 using mip maximizing z; $ontext model 5. Linearized version of model 2 using indicator constraints this is very ugly in GAMS max x + y st x in [0,2] 1 - lambda1 - 0.01*lambda2 = 0 x - 0.01y <= 1 d1 = 0 ==> lambda1 = 0 d1 = 1 ==> y = 0 d2 = 0 ==> lambda2 = 0 d2 = 1 ==> x - 0.01y = 1 lambda1, lambda2 >= 0 d1, d2 in {0,1} y >= 0$offtext binary variables    d1   'lambda1 = 0 OR y = 0'    d2   'lambda2 = 0 OR x - 0.01y = 1' ; equations    d10 'part of indicator constraint'    d11 'part of indicator constraint'    d20 'part of indicator constraint'    d21 'part of indicator constraint'    dummy 'make sure d1,d2 are part of the model' ; d10.. lambda1 =e= 0; d11.. y =e= 0; d20.. lambda2 =e= 0; d21.. x-0.01*y =e= 1; dummy.. d1+d2 =g= 0; $onecho > cplex.opt indic d10$d1 0 indic d11$d1 1 indic d20$d2 0 indic d21$d2 1$offecho model m5 /obj,dLdy,d10,d11,d20,d21,cin2,dummy/; m5.optfile = 1; solve m5 using mip maximizing z;