Monday, October 26, 2020

Solution methods for linear bilevel problems

For some classes of problems, it is a strategy to form explicit KKT optimality conditions and convert the resulting complementarity conditions to big-M constraints. In case the original problem was linear (or in some cases quadratic), the resulting model is a MIP model. One such application is solving non-convex QP models [1]. Another application is a bilevel LP, where we reformulate the inner problem by stating the optimality conditions for the inner-problem. In [2], such a problem is given: 



bilevel LP
\[\begin{align}\max\> &\color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2] \\ & \min \>\color{darkred}y\\ & \qquad \color{darkred}y \ge 0 \perp \color{darkred} \lambda_1 \\ & \qquad \color{darkred}x - 0.01\color{darkred}y \le 1 \perp \color{darkred}\lambda_2 \end{align} \]

The notation \(\perp\) indicates the name of the dual variable for the inner constraints. After the inner problem is converted to optimality conditions, we have:

Inner LP as optimality conditions
\[\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \cdot \color{darkred}y = 0 \\ & \color{darkred}\lambda_2 \cdot \left( \color{darkred}x -0.01\color{darkred}y - 1 \right) = 0 \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \end{align} \]

The products can be interpreted as an "or" condition. These can be linearized using binary variables and big-M constraints:

Linearized problem
\[\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \le \color{darkblue} M_1^d \color{darkred}\delta_1 \\ & \color{darkred}y \le \color{darkblue} M_1^p (1-\color{darkred}\delta_1) \\ & \color{darkred}\lambda_2 \le \color{darkblue} M_2^d \color{darkred}\delta_2 \\ & -\color{darkred}x +0.01\color{darkred}y + 1 \le \color{darkblue} M_2^p (1-\color{darkred}\delta_2) \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \\ & \color{darkred}\delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align} \]

The letters \(p\) and \(d\) in the big-M constants indicate primal and dual.  Note that we flipped the sign of the second constraint of the inner problem. This was to make \(\color{darkred}\lambda_2\) a non-negative variable.

Choosing big-M values


One big issue is: choose appropriate values for the big-M constants. The values \(\color{darkblue} M_1^d,\color{darkblue} M_1^p,\color{darkblue} M_2^d,\color{darkblue} M_2^p = 200\) are valid bounds on \(\color{darkred}\lambda_1, \color{darkred}y, \color{darkred}\lambda_2, -\color{darkred}x +0.01\color{darkred}y + 1\). This gives the solution:

                           LOWER          LEVEL          UPPER

---- VAR x                   .             2.0000         2.0000      
---- VAR y                   .           100.0000        +INF         
---- VAR lambda1             .              .            +INF         
---- VAR lambda2             .           100.0000        +INF         
---- VAR delta1              .              .             1.0000      
---- VAR delta2              .             1.0000         1.0000      
---- VAR z                 -INF          102.0000        +INF         


The problem of choosing the right big M values is important. Choosing values that are too small, can lead to cutting off optimal values. Big-M values that are too large have other problems such as numerical issues and trickle flow. In [2] it is argued that many modelers use the following algorithm to detect if a big-M value is too small:

  1. Select initial values for \( \color{darkblue} M_j^d,  \color{darkblue} M_j^p \).
  2. Solve the linearized MIP model.
  3. Look for binding big-M constraints that should be inactivated. If not found, stop: we have found the optimal solution.
  4. Increase the corresponding big-M value.
  5. Go to step 2.

Unfortunately, this is algorithm is not guaranteed to work. [2] gives an example: use as starting values \(\color{darkblue} M_j^d = 50, \color{darkblue} M_j^p = 200\). The solution looks like:


                           LOWER          LEVEL          UPPER

---- VAR x                   .             2.0000         2.0000      
---- VAR y                   .              .            +INF         
---- VAR lambda1             .             1.0000        +INF         
---- VAR lambda2             .              .            +INF         
---- VAR delta1              .             1.0000         1.0000      
---- VAR delta2              .             1.0000         1.0000      
---- VAR z                 -INF            2.0000        +INF         

The values \(\color{darkred}\delta_j=1\) indicate we should look at the constraints: \(\color{darkred}\lambda_j \le \color{darkblue}M_j^d\). As \(\color{darkred}\lambda_1=1, \color{darkred}\lambda_2=0\) these are not binding. Hence, the erroneous conclusion is that this solution is optimal. 

Our conclusion is: we cannot reliably detect that big-M values are chosen too small by searching for binding constraints.

Alternatives


In [3], it is argued that SOS1 (special ordered sets of type 1) variables should be used instead of binary variables. For our example, this means we would replace our big-M constraints by: 


SOS1 representation
\[\begin{align} & \color{darkred}\lambda_1, \color{darkred}y \in SOS1 \\ & \color{darkred}\lambda_2, \color{darkred}s \in SOS1 \\ &\color{darkred}s = -\color{darkred}x+0.01\color{darkred}y+1 \\ & \color{darkred}s \ge 0 \end{align} \]


Many MIP solvers (but not all) support SOS1 variables. Binary variables can sometimes be faster, but SOS1 variables have the advantage that no big-M constants are needed. The same can be said for indicator constraints. A formulation using indicator constraints can look like:
 

Indicator constraints
\[\begin{align} & \color{darkred} \delta_1 = 0 \Rightarrow \color{darkred}\lambda_1 =0 \\ & \color{darkred} \delta_1 = 1\Rightarrow \color{darkred}y = 0 \\ & \color{darkred} \delta_2 = 0 \Rightarrow \color{darkred}\lambda_2 = 0 \\ & \color{darkred} \delta_2 = 1 \Rightarrow \color{darkred}x-0.01\color{darkred}y = 1 \\ & \color{darkred}x-0.01\color{darkred}y \le 1 \\ & \color{darkred} \delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align} \]


We can also solve the quadratic model directly using global solvers like Antigone, Baron, Couenne, or Gurobi. Sometimes reasonable bounds are needed to help globals solvers.

It is noted that the GAMS EMP tool can generate the quadratic model for you. I still had to specify to use a global solver for it to find the optimal solution.

Conclusion


Instead of using big-M constraints to handle complementarity conditions in linear bi-level optimization problems, there are a few alternatives:
  • SOS1 variables
  • Indicator constraints
  • Global solvers for non-convex quadratic problems

References


  1. Solving non-convex QP problems as a MIP,  https://yetanothermathprogrammingconsultant.blogspot.com/2016/06/solving-non-convex-qp-problems-as-mip.html 
  2. Salvador Pineda and Juan Miguel Morales, Solving Linear Bilevel Problems Using Big-Ms: Not All That Glitters Is Gold, September 2018,  https://arxiv.org/pdf/1809.10448.pdf
  3. Thomas Kleinert and Martin Schmidt, Why there is no need to use a big-M in linear bilevel optimization: A computational study of two ready-to-use approaches, October 2020,  http://www.optimization-online.org/DB_FILE/2020/10/8065.pdf 

Appendix: GAMS Code


$ontext

  
different formulations of a bilevel optimization problem.

  
model 1. high level model solved as EMP model

  
max x + y
  
st
      
x in [0,2]
      
min y
      
st  y >= 0        : lambda1 >= 0
          
x-0.01y <= 1  : lambda2 >= 0

  
Note: in subsequent model we flip the last constraint to make the sign of the dual >= 0

  
This model can be solved using EMP + a global solver.

$offtext

option optcr=0;

positive variables x,y;
x.up = 2;

variables
   z
'outer objective'
   zin
'inner objective'
;

equations
   obj    
'outer objective'
   objin  
'inner objective'
   cin1    
'inner constraint 1'
   cin2    
'inner constraint 2'
;
obj..     z =e= x+y;
objin..   zin =e= y;
cin1..    y =g= 0;
cin2..    x-0.01*y =l= 1;

model m1 /obj, objin, cin1, cin2/;

$onecho > "%emp.info%"
bilevel x
min zin y objin cin1 cin2
$offecho

$onecho > jams.opt
subsolver couenne
$offecho

m1.optfile=1;
solve m1 using EMP maximizing z;


$ontext

  
model 2. Optimality conditions for inner problem.
     
This is a non-convex quadratic model.
      
This can be solved with Gurobi, Couenne, Baron, Antigone.

  
max x + y
  
st
      
x in [0,2]
      
1 - lambda1 - 0.01*lambda2 = 0
      
lambda1*y = 0
      
lambda2*(-x+0.01y+1) = 0
      
lambda1,lambda2>=0

$offtext

positive variables lambda1,lambda2;

equations
   dLdy   
'derivative of lagrangean'
   quad1  
'complementarity constraint (quadratic)'
   quad2  
'complementarity constraint (quadratic)'
;

dLdy..  1 - lambda1 - 0.01*lambda2 =e= 0;
quad1.. lambda1*y =e= 0;
quad2.. lambda2*(x-0.01*y-1) =e= 0;

model m2 /obj,dLdy,cin2,quad1,quad2/;
option minlp = couenne;
solve m2 using minlp maximizing z;

$ontext

  
model 3. Linearized version of model 2 using big-M constraints

  
max x + y
  
st
      
x in [0,2]
      
1 - lambda1 - 0.01*lambda2 = 0
      
lambda1 <= delta1*M1d
      
y <= (1-delta1)*M1p
      
lambda2 <= delta2*M2d
      
-x+0.01y+1 <= (1-delta2)*M2p
      
lambda1,lambda2>=0
      
delta1, delta2 binary

$offtext


binary variable delta1, delta2;

equations
   comp1d 
'linearized complementarity constraint (dual)'
   comp1p 
'linearized complementarity constraint (primal)'
   comp2d 
'linearized complementarity constraint (dual)'
   comp2p 
'linearized complementarity constraint (primal)'
;


scalars
   M1p 
'big-M for primal' /200/
   M1d 
'big-M for dual'   /200/
   M2p 
'big-M for primal' /200/
   M2d 
'big-M for dual'   /200/
;

comp1d.. lambda1 =l= delta1*M1d;
comp1p.. y =l= (1-delta1)*M1p;
comp2d.. lambda2 =l= delta2*M2d;
comp2p.. -x+0.01*y+1 =l= (1-delta2)*M2p;

model m3 /obj,dLdy,comp1d,comp1p,comp2d,comp2p/;
solve m3 using mip maximizing z;


$ontext

  
model 4. Linearized version of model 2 using SOS1 variables

  
max x + y
  
st
      
x in [0,2]
      
1 - lambda1 - 0.01*lambda2 = 0
      
s = -x+0.01y+1
      
lambda1,lambda2,s >=0
      
lambda1, y forms a SOS1 set
      
lambda2, s forms a SOS1 set

   
In GAMS we need to reformulate this a bit as SOS1 variables are indexed.
   
No need for s in the GAMS model.

$offtext

set k 'SOS members' /k1,k2/;
sos1 variable
   sos_1(k) 
'for the pair lambda1, y'
   sos_2(k) 
'for the pair lambda2, -x+0.01y+1'
;

equations
  sos_1a 
'assign to sos1 member'
  sos_1b 
'assign to sos1 member'
  sos_2a 
'assign to sos1 member'
  sos_2b 
'assign to sos1 member'
;


sos_1a.. y =e= sos_1(
'k1');
sos_1b.. lambda1 =e= sos_1(
'k2');

sos_2a.. -x+0.01*y+1 =e= sos_2(
'k1');
sos_2b.. lambda2 =e= sos_2(
'k2');

model m4 /obj,dLdy,sos_1a,sos_1b, sos_2a, sos_2b/;
solve m4 using mip maximizing z;

$ontext

  
model 5. Linearized version of model 2 using indicator constraints
  
this is very ugly in GAMS

  
max x + y
  
st
      
x in [0,2]
      
1 - lambda1 - 0.01*lambda2 = 0
      
x - 0.01y <= 1
      
d1 = 0 ==> lambda1 = 0
      
d1 = 1 ==> y = 0
      
d2 = 0 ==> lambda2 = 0
      
d2 = 1 ==> x - 0.01y = 1
      
lambda1, lambda2 >= 0
      
d1, d2 in {0,1}
      
y >= 0


$offtext

binary variables
   d1  
'lambda1 = 0 OR y = 0'
   d2  
'lambda2 = 0 OR x - 0.01y = 1'
;

equations
   d10
'part of indicator constraint'
   d11
'part of indicator constraint'
   d20
'part of indicator constraint'
   d21
'part of indicator constraint'
   dummy
'make sure d1,d2 are part of the model'
;
d10.. lambda1 =e= 0;
d11.. y =e= 0;
d20.. lambda2 =e= 0;
d21.. x-0.01*y =e= 1;
dummy.. d1+d2 =g= 0;

$onecho > cplex.opt
indic d10$d1 0
indic d11$d1 1
indic d20$d2 0
indic d21$d2 1
$offecho

model m5 /obj,dLdy,d10,d11,d20,d21,cin2,dummy/;
m5.optfile = 1;
solve m5 using mip maximizing z;



No comments:

Post a Comment