Sunday, February 17, 2019

Piecewise linear functions and formulations for interpolation (part 1)

Pyomo has a tool, the Piecewise component, that supports several formulations for piecewise linear functions. From the help:

>>> from pyomo.core import *
>>> help(Piecewise)
Help on class Piecewise in module pyomo.core.base.piecewise:

class Piecewise(pyomo.core.base.block.Block)
 |  Piecewise(*args, **kwds)
 |
 |      Adds piecewise constraints to a Pyomo model for functions of the
 |      form, y = f(x).
 |
 |      Usage:
 |              model.const = Piecewise(index_1,...,index_n,yvar,xvar,**Keywords)
 |              model.const = Piecewise(yvar,xvar,**Keywords)
 |
 |      Keywords:
 |
 |  -pw_pts={},[],()
 |            A dictionary of lists (keys are index set) or a single list
 |            (for the non-indexed case or when an identical set of
 |            breakpoints is used across all indices) defining the set of
 |            domain breakpoints for the piecewise linear
 |            function. **ALWAYS REQUIRED**
 |
 |  -pw_repn=''
 |            Indicates the type of piecewise representation to use. This
 |            can have a major impact on solver performance.
 |            Choices: (Default 'SOS2')
 |
 |               ~ + 'SOS2'      - Standard representation using sos2 constraints
 |               ~   'BIGM_BIN'  - BigM constraints with binary variables.
 |                                 Theoretically tightest M values are automatically
 |                                 determined.
 |               ~   'BIGM_SOS1' - BigM constraints with sos1 variables.
 |                                 Theoretically tightest M values are automatically
 |                                 determined.
 |               ~*+ 'DCC'       - Disaggregated convex combination model
 |               ~*+ 'DLOG'      - Logarithmic disaggregated convex combination model
 |               ~*+ 'CC'        - Convex combination model
 |               ~*+ 'LOG'       - Logarithmic branching convex combination
 |               ~*  'MC'        - Multiple choice model
 |               ~*+ 'INC'       - Incremental (delta) method
 |
 |             + Supports step functions
 |             * Source: "Mixed-Integer Models for Non-separable Piecewise Linear
 |                        Optimization: Unifying framework and Extensions" (Vielma,
 |                        Nemhauser 2008)
 |             ~ Refer to the optional 'force_pw' keyword.
 |
 |  -pw_constr_type=''
 |            Indicates the bound type of the piecewise function.
 |            Choices:
-- More  --

Some of these representations are not immediately obvious. Let's try to get an intuitive understanding of them. For a more formal discussion, see [1].

I will just consider the case I use most often: a one-dimensional piecewise linear function \(y = f(x)\). Note that in the context of an optimization model we need not necessarily interpret this as: given an \(x\), calculate a \(y\). Rather, it is part of a system of equations, i.e. \(x\) and \(y\) are determined at the same time (if \(y\) is known, we find \(x\); or even: we find \(x,y\) simultaneously).

The piecewise linear function is defined by its breakpoints. In the example used throughout this post, the breakpoints \((\bar{x}_k,\bar{y}_k)\) are \((1,6)\), \((3,2)\), \((6,8)\) and \((10,7)\).

Some observations:

  1. We have \(K\) breakpoints \((\bar{x}_k,\bar{y}_k)\) and \(K-1\) segments,
  2. we assume we have lower and upper bounds on \(x\) and \(y\),
  3. we assume the curve is connected (i.e. there are no forbidden regions for \(x\); those would have to be handled separately),
  4. we allow step functions by adding a vertical segment (not all formulations allow this).

Finally, we note that interpolating between two points \((\bar{x}_1,\bar{y}_1)\), \((\bar{x}_2,\bar{y}_2)\) can be stated as choosing two weights \(\lambda_1, \lambda_2\) with: \[\begin{align} & x = \lambda_1 \bar{x}_1 + \lambda_2 \bar{x}_2\\ & y = \lambda_1 \bar{y}_1 + \lambda_2 \bar{y}_2 \\ &  \lambda_1+\lambda_2 = 1 \\ & \lambda_1, \lambda_2 \ge 0 \end{align}\]
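For example, with breakpoints \((\bar{x}_1,\bar{y}_1)=(1,6)\) and \((\bar{x}_2,\bar{y}_2)=(3,2)\), the weights \(\lambda_1=\lambda_2=0.5\) yield \(x=0.5\cdot 1+0.5\cdot 3=2\) and \(y=0.5\cdot 6+0.5\cdot 2=4\): the midpoint of the segment.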



Method 1: SOS2 formulation


This one is the easiest to remember. Many of the more advanced MIP solvers have facilities for SOS2 variables. The definition is: 

Let \(x_1,\dots,x_n\) be members of a Special Ordered Set of Type 2 (SOS2). Then at most two variables \(x_i\) can be nonzero, and they have to be neighbors (consecutive). All other variables are zero.
This looks like a strange animal, but in fact this construct is especially designed for cases like our piecewise linear interpolation scheme. Using SOS2 constraints, we can easily formulate a model: 


SOS2 Formulation
\[\begin{align} & \color{DarkRed}x = \sum_k \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_k \\ & \color{DarkRed}y = \sum_k \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_k\\& \sum_k \color{DarkRed} \lambda_k = 1 \\ & \color{DarkRed}\lambda_k \ge 0 \\ & \mathit{SOS2}(\color{DarkRed}\lambda_1,\dots,\color{DarkRed}\lambda_K) \end{align}\]


In the solution, two adjacent \(\lambda_s,\lambda_{s+1}\) will be between 0 and 1, and for these we have \(\lambda_s+\lambda_{s+1}=1\). All the others will be zero. I.e. we will be interpolating between the two corresponding breakpoints \((\bar{x}_s, \bar{y}_s)\) and \((\bar{x}_{s+1}, \bar{y}_{s+1})\). This SOS2 formulation does two things in one swoop: it selects the segment \(s\) that \(x\) and \(y\) belong to, and it interpolates between the two selected breakpoints.

Notes:

  • This method is used quite a lot in practice, not least because it makes life easy for the modeler.
  • This approach handles step functions without a problem.
  • There are no issues with big-M constants.
  • However, binary variables may offer performance benefits. 
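
As an illustration, the SOS2 representation for our example data can be set up with Pyomo's Piecewise tool. This is only a sketch: 'SOS2' is the default pw_repn, and the solver must support SOS2 constraints (GLPK, for instance, does not).

# sketch: same data as the rest of this post, SOS2 representation
xdata = [1., 3., 6., 10.]
ydata = [6., 2., 8., 7.]

from pyomo.core import *

model = ConcreteModel()
model.X = Var(bounds=(1,10))
model.Y = Var()

# y = f(x), using interpolation weights tied together by an SOS2 constraint
model.con = Piecewise(model.Y, model.X,
                      pw_pts=xdata,
                      pw_constr_type='EQ',
                      f_rule=ydata,
                      pw_repn='SOS2')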


Method 2: Big-M Binary Formulation


In this formulation we turn each linear equality on or off depending on whether the corresponding segment is selected. Looking at our simple example, we can state this as: \[y = \begin{cases} -2x+8 & x \in [1,3]\\ 2x-4 & x \in [3,6]\\ -0.25x+9.5 & x \in [6,10]\end{cases}\] We can model this with binary variables as follows: define \[\delta_s = \begin{cases} 1 & \text{if segment $s$ is selected}\\ 0 & \text{otherwise}\end{cases}\] When \(\delta_s=0\) for a given segment \(s\), we need to make the corresponding equation non-binding, and also the corresponding limits on \(x\). This can be accomplished using inequalities and some big-M's. As an example, consider the second segment. We can write: \[\begin{align} &   2x-4-(1-\delta_2)M\le y \le  2x-4 +(1-\delta_2)M \\ & 3 - (1-\delta_2)M \le x \le 6+ (1-\delta_2)M\end{align}\] Finding good values for each \(M\) is a bit tedious. Basically we have to choose the smallest \(M\) such that each breakpoint remains reachable (i.e. not cut off). For the constraint \[ y \le  2x-4 +(1-\delta_2)M\] we can write: \[ \begin{align} M & = \max_k \{\bar{y}_k - 2\bar{x}_k+4\}\\ & = \max \{ 6-2\cdot 1+4,2-2\cdot 3 + 4, 8-2\cdot 6 +4 , 7-2 \cdot 10+4 \} \\ & = \max \{8,0,0,-9\}\\&=8\end{align} \] The zero values correspond to the breakpoints on the current segment.
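
This calculation is easy to automate. A small Python sketch that reproduces \(M=8\) for this particular constraint:

# breakpoints of the example
xbar = [1., 3., 6., 10.]
ybar = [6., 2., 8., 7.]

# tightest M for  y <= 2x - 4 + (1-delta_2)*M:
# the smallest M that leaves every breakpoint reachable
M = max(y - (2*x - 4) for x, y in zip(xbar, ybar))
print(M)   # 8.0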

A high-level description can look like: \[\begin{align} &\delta_s = 1 \Rightarrow \begin{cases} y = a_s x + b_s \\  \bar{x}_s \le x \le \bar{x}_{s+1} \end{cases}\\ &\sum_s \delta_s = 1\\ &\delta_s \in \{0,1\}\end{align} \] where \(a_s, b_s\) are the coefficients of the linear function on segment \(s\). These coefficients can easily be calculated from the breakpoints \((\bar{x}_s,\bar{y}_s)\) and \((\bar{x}_{s+1},\bar{y}_{s+1})\). This implication can be implemented as follows: 


Big-M Binary Formulation
\[\begin{align} & \color{DarkBlue}a_s \color{DarkRed}x + \color{DarkBlue}b_s - (1-\color{DarkRed}\delta_s) \color{DarkBlue} M \le \color{DarkRed}y \le \color{DarkBlue}a_s \color{DarkRed}x + \color{DarkBlue}b_s + (1-\color{DarkRed}\delta_s) \color{DarkBlue} M \\ & \color{DarkBlue}{\bar{x}}_s - (1-\color{DarkRed}\delta_s)\color{DarkBlue}M \le \color{DarkRed}x \le \color{DarkBlue}{\bar{x}}_{s+1} + (1-\color{DarkRed}\delta_s)\color{DarkBlue}M \\ & \sum_s \color{DarkRed} \delta_s=1\\ & \color{DarkRed}\delta_s \in \{0,1\}\\ & \color{DarkBlue}a_s = \frac{\color{DarkBlue}{\bar{y}}_{s+1}-\color{DarkBlue}{\bar{y}}_s}{\color{DarkBlue}{\bar{x}}_{s+1}-\color{DarkBlue}{\bar{x}}_s} \\ &\color{DarkBlue}b_s = \color{DarkBlue}{\bar{y}}_s - \color{DarkBlue} a_s \color{DarkBlue}{\bar{x}}_s \end{align}\]


This looks a bit more intimidating, but it is conceptually simple. Unfortunately, as we will see below, there are problems with the Pyomo implementation of this formulation.

Notes:

  • We need 4 inequalities and one binary variable per segment.
  • This method assumes we can calculate a slope \(a_s\). For step functions we have an infinite slope when there is a vertical segment. I.e. we cannot do step functions using this method.
  • It is imperative to choose reasonable values for the big-M's.
  • In the model above I just used one \(M\), but actually each of them can have a different value.
  • This is not a formulation I typically use: it is somewhat cumbersome to set up. 
  • Pyomo implements this method incorrectly, and we will get wrong results.

Method 3: Big-M SOS1 formulation


This is just a minor variant on the previous method. Instead of using binary variables, we use continuous variables that are members of a Special Ordered Set of Type 1. SOS1 sets are defined by:

Let \(x_1,\dots,x_n\) be members of a Special Ordered Set of Type 1 (SOS1). Then at most one variable \(x_i\) can be nonzero. All other variables are zero.

Basically we replace the binary variables: \[\begin{align} &\sum_s \delta_s = 1\\ & \delta_s \in \{0,1\}\end{align}\] by \[\begin{align} &\sum_s \lambda_s = 1 &\\ & \lambda_s \ge 0 \\ & \mathit{SOS1}(\lambda_1,\dots,\lambda_{K-1})\end{align}\] This does not change the model a lot:  


Big-M SOS1 Formulation
\[\begin{align} & \color{DarkBlue}a_s \color{DarkRed}x + \color{DarkBlue}b_s - (1-\color{DarkRed}\lambda_s) \color{DarkBlue} M \le \color{DarkRed}y \le \color{DarkBlue}a_s \color{DarkRed}x + \color{DarkBlue}b_s + (1-\color{DarkRed}\lambda_s) \color{DarkBlue} M \\ & \color{DarkBlue}{\bar{x}}_s - (1-\color{DarkRed}\lambda_s)\color{DarkBlue}M \le \color{DarkRed}x \le \color{DarkBlue}{\bar{x}}_{s+1} + (1-\color{DarkRed}\lambda_s)\color{DarkBlue}M \\ & \sum_s \color{DarkRed} \lambda_s=1\\ & \color{DarkRed}\lambda_s\ge 0\\& \mathit{SOS1}(\color{DarkRed}\lambda_1,\dots,\color{DarkRed}\lambda_{K-1})\\ & \color{DarkBlue}a_s = \frac{\color{DarkBlue}{\bar{y}}_{s+1}-\color{DarkBlue}{\bar{y}}_s}{\color{DarkBlue}{\bar{x}}_{s+1}-\color{DarkBlue}{\bar{x}}_s} \\ &\color{DarkBlue}b_s = \color{DarkBlue}{\bar{y}}_s - \color{DarkBlue} a_s \color{DarkBlue}{\bar{x}}_s \end{align}\]
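
In Pyomo, such SOS1 sets can be declared directly with the SOSConstraint component, independently of the Piecewise tool. A minimal sketch:

from pyomo.environ import ConcreteModel, Var, SOSConstraint, NonNegativeReals

m = ConcreteModel()
m.lam = Var(range(3), domain=NonNegativeReals)  # one variable per segment
# at most one member of the SOS1 set can be nonzero
m.sos = SOSConstraint(var=m.lam, sos=1)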

Notes:

  • We can use the SOS1 variables to avoid the need for big-M's. We can write something like \[\begin{align} & a_s x+b_s -\lambda_s \le y \le a_s x+b_s +\lambda_s\\ & \bar{x}_s - \lambda_s \le x \le \bar{x}_{s+1} + \lambda_s \\ & \sum_s \delta_s = 1\\& \lambda_s \ge 0\\& \delta_s \in \{0,1\} \\ & \mathit{SOS1}(\lambda_s,\delta_s) \end{align} \] Here each pair \((\lambda_s,\delta_s)\) forms its own SOS1 set: for the selected segment, \(\delta_s=1\) forces the slack \(\lambda_s\) to zero.
  • This method shows the same Pyomo bug as the big-M binary variable model: it is incorrectly implemented.
  • If a solver supports SOS1 variables, it will typically also support SOS2 variables, so the SOS2 approach may be easier.
  • This method is marked for removal in future versions of Pyomo (see the deprecation warning below). 

A nasty Pyomo bug


To test the BIGM_BIN method on our simple data set, I formulated the model: \[\begin{align}\max \> & y \\ & \text{piecewise on our data} \\ & x=5 \\ & x\in [1,10]\end{align}\] We expect to see the solution \(x=5, y=6\) with an objective of 6. Instead we see:

D:\Python\Python37\Scripts>pyomo solve --solver=glpk bigm.py
[    0.00] Setting up Pyomo environment
[    0.00] Applying Pyomo preprocessing actions
WARNING: DEPRECATED: The 'BIGM_BIN' and 'BIGM_SOS1' piecewise representations
    will be removed in a future version of Pyomo. They produce incorrect
    results in certain cases
[    0.60] Creating model
[    0.60] Applying solver
[    0.83] Processing results
    Number of solutions: 1
    Solution Information
      Gap: 0.0
      Status: optimal
      Function Value: 8.25
    Solver results file: results.json
[    0.86] Applying Pyomo postprocessing actions
[    0.86] Pyomo Finished

D:\Python\Python37\Scripts>

Well, this is disappointing. We get a wrong solution!

We also get a warning that the Pyomo people are aware of problems with this method.

This is more wrong than can be explained by numerical issues related to large big-M values. Very large big-M values can lead to wrong results, but I don't think that is the case here. Let's investigate.

The details of the Pyomo error


This section may become a bit dreary: we will dive into the details of this bug. Not for the faint of heart.

The model looks like:


#
# expected solution X=5, Y=6
#

xdata = [1., 3., 6., 10.]
ydata  = [6.,2.,8.,7.]

from pyomo.core import *

model = ConcreteModel()

model.X = Var(bounds=(1,10))
model.Y = Var()

model.con = Piecewise(model.Y,model.X,
                      pw_pts=xdata,
                      pw_constr_type='EQ',
                      f_rule=ydata,
                      pw_repn='BIGM_BIN')

# see what we get for Y when X=5
def con2_rule(model):
    return model.X==5

model.con2 = Constraint(rule=con2_rule)

model.obj = Objective(expr=model.Y, sense=maximize)

The solution file is:


{
    "Problem": [
        {
            "Lower bound": 8.25,
            "Name": "unknown",
            "Number of constraints": 9,
            "Number of nonzeros": 21,
            "Number of objectives": 1,
            "Number of variables": 6,
            "Sense": "maximize",
            "Upper bound": 8.25
        }
    ],
    "Solution": [
        {
            "number of solutions": 1,
            "number of solutions displayed": 1
        },
        {
            "Constraint": "No values",
            "Gap": 0.0,
            "Message": null,
            "Objective": {
                "obj": {
                    "Value": 8.25
                }
            },
            "Problem": {},
            "Status": "optimal",
            "Variable": {
                "X": {
                    "Value": 5.0
                },
                "Y": {
                    "Value": 8.25
                },
                "con.bin_y[3]": {
                    "Value": 1.0
                }
            }
        }
    ],
    "Solver": [
        {
            "Error rc": 0,
            "Statistics": {
                "Branch and bound": {
                    "Number of bounded subproblems": "1",
                    "Number of created subproblems": "1"
                }
            },
            "Status": "ok",
            "Termination condition": "optimal",
            "Time": 0.12992453575134277
        }
    ]
}

Clearly the Y value is wrong. To debug this, we can generate and inspect the LP file.
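
One way to produce this file with readable names is (a sketch, assuming the model object from the listing above):

# write an LP file using the original symbolic names
model.write("bigm.lp", io_options={"symbolic_solver_labels": True})

The resulting file is a bit hard to read: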


\* Source Pyomo model name=unknown *\

max 
obj:
+1 Y

s.t.

c_e_con2_:
+1 X
= 5

c_u_con_BIGM_constraint1(1)_:
+2 X
+1 Y
+19 con_bin_y(1)
<= 27

c_u_con_BIGM_constraint1(2)_:
-2 X
+1 Y
+8 con_bin_y(2)
<= 4

c_u_con_BIGM_constraint1(3)_:
+0.25 X
+1 Y
<= 9.5

c_e_con_BIGM_constraint2_:
+1 con_bin_y(1)
+1 con_bin_y(2)
+1 con_bin_y(3)
= 1

c_l_con_BIGM_constraint3(1)_:
+2 X
+1 Y
>= 8

c_u_con_BIGM_constraint3(2)_:
+2 X
-1 Y
+9 con_bin_y(2)
<= 13

c_u_con_BIGM_constraint3(3)_:
-0.25 X
-1 Y
+6.75 con_bin_y(3)
<= -2.75

c_e_ONE_VAR_CONSTANT: 
ONE_VAR_CONSTANT = 1.0

bounds
   1 <= X <= 10
    -inf <= Y <= +inf
   0 <= con_bin_y(1) <= 1
   0 <= con_bin_y(2) <= 1
   0 <= con_bin_y(3) <= 1
binary
  con_bin_y(1)
  con_bin_y(2)
  con_bin_y(3)
end


Let's compare carefully what we expect and what we see:

Constraint | Expected | Generated by Pyomo | Notes
Objective | \(\max\>y\) | max obj: +1 Y |
Segment 1 | \(y \le -2x+8 +(1-\delta_1)M\) | +2 X +1 Y +19 con_bin_y(1) <= 27 | \(M=19\)
Segment 1 | \(y \ge -2x+8 -(1-\delta_1)M\) | +2 X +1 Y >= 8 | \(M=0\)
Segment 1 | \(x \ge 1 -(1-\delta_1)M\) | | not needed (coincides with the lower bound on \(x\))
Segment 1 | \(x \le 3 + (1-\delta_1)M\) | MISSING | \(M=7\)
Segment 2 | \(y \le 2x-4 +(1-\delta_2)M\) | -2 X +1 Y +8 con_bin_y(2) <= 4 | \(M=8\)
Segment 2 | \(y \ge 2x-4 -(1-\delta_2)M\) | +2 X -1 Y +9 con_bin_y(2) <= 13 | \(M=9\)
Segment 2 | \(x \ge 3 -(1-\delta_2)M\) | MISSING | \(M=2\)
Segment 2 | \(x \le 6 + (1-\delta_2)M\) | MISSING | \(M=4\)
Segment 3 | \(y \le -0.25x+9.5 +(1-\delta_3)M\) | +0.25 X +1 Y <= 9.5 | \(M=0\)
Segment 3 | \(y \ge -0.25x+9.5 -(1-\delta_3)M\) | -0.25 X -1 Y +6.75 con_bin_y(3) <= -2.75 | \(M=6.75\)
Segment 3 | \(x \ge 6 -(1-\delta_3)M\) | MISSING | \(M=5\)
Segment 3 | \(x \le 10 + (1-\delta_3)M\) | | not needed (coincides with the upper bound on \(x\))
Sum | \(\sum_s \delta_s = 1\) | +1 con_bin_y(1) +1 con_bin_y(2) +1 con_bin_y(3) = 1 |
Fix | \(x=5\) | +1 X = 5 |

We see that we have missing inequalities relating \(x\) to the segments: the rows marked MISSING in the table indicate constraints Pyomo did not generate. No wonder we got the wrong results. If we look again at the Pyomo solution we see \(x=5\), but also con.bin_y[3]=1. This is indeed an indication of the problem: the third segment is selected, but \(x=5\) is outside this segment.

We cannot blame the big-M's for this problem. The Pyomo piecewise function is just forgetting to include a number of inequalities. The implication \[\delta_s = 1 \Rightarrow \bar{x}_s \le x \le \bar{x}_{s+1}\] is really important for this formulation, and ignoring it leads to erroneous results.
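
As a workaround, we can formulate the big-M binary model ourselves, with the missing \(x\) inequalities included. Below is a sketch of my own implementation of the formulation shown earlier (this is not Pyomo's code):

from pyomo.environ import (ConcreteModel, Var, Binary, Constraint,
                           Objective, SolverFactory, maximize)

xbar = [1., 3., 6., 10.]
ybar = [6., 2., 8., 7.]
K = len(xbar)                 # number of breakpoints
S = range(K - 1)              # segments

# slope and intercept of each segment
a = [(ybar[s+1] - ybar[s]) / (xbar[s+1] - xbar[s]) for s in S]
b = [ybar[s] - a[s]*xbar[s] for s in S]

# tightest M values: the smallest M such that no breakpoint is cut off
Mup  = [max(ybar[k] - a[s]*xbar[k] - b[s] for k in range(K)) for s in S]
Mlo  = [max(a[s]*xbar[k] + b[s] - ybar[k] for k in range(K)) for s in S]
Mxlo = [xbar[s] - xbar[0] for s in S]
Mxup = [xbar[-1] - xbar[s+1] for s in S]

m = ConcreteModel()
m.X = Var(bounds=(xbar[0], xbar[-1]))
m.Y = Var()
m.d = Var(S, domain=Binary)   # segment selection

m.yub = Constraint(S, rule=lambda m, s:
                   m.Y <= a[s]*m.X + b[s] + (1 - m.d[s])*Mup[s])
m.ylb = Constraint(S, rule=lambda m, s:
                   m.Y >= a[s]*m.X + b[s] - (1 - m.d[s])*Mlo[s])
# the inequalities Pyomo forgot: tie x to the selected segment
m.xlb = Constraint(S, rule=lambda m, s:
                   m.X >= xbar[s] - (1 - m.d[s])*Mxlo[s])
m.xub = Constraint(S, rule=lambda m, s:
                   m.X <= xbar[s+1] + (1 - m.d[s])*Mxup[s])
m.pick = Constraint(rule=lambda m: sum(m.d[s] for s in S) == 1)

# see what we get for Y when X=5
m.con2 = Constraint(rule=lambda m: m.X == 5)
m.obj = Objective(expr=m.Y, sense=maximize)
SolverFactory('glpk').solve(m)   # now yields X=5, Y=6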

References


  1. Juan Pablo Vielma, Shabbir Ahmed and George Nemhauser, Mixed-Integer Models for Nonseparable Piecewise Linear Optimization: Unifying Framework and Extensions, Operations Research 58(2):303-315, 2010

Tuesday, February 12, 2019

The 8-queens problem without binary variables

An eight-queens solution [3]


The 8-queens problem can be stated as:

Place \(n=8\) queens on an \(n \times n\) board such that none of the queens is attacked by another. 
This is a feasibility problem, and can be formulated as a MIP problem [1]:


n-Queens Problem
\[\begin{align}& \sum_j \color{DarkRed}x_{i,j} = 1 && \forall i\\& \sum_i \color{DarkRed}x_{i,j} = 1 && \forall j\\ & \sum_{i-j=k} \color{DarkRed}x_{i,j} \le 1 && k=-(n-2),\dots,(n-2)\\& \sum_{i+j=k} \color{DarkRed}x_{i,j} \le 1 && k=3,\dots,2n-1\\ & \color{DarkRed}x_{i,j}\in\{0,1\} \end{align}\]

Basically: place a single queen on each row and each column such that they don't attack each other along the diagonals. We need the conditions \(x_{i,j}\in \{0,1\}\) to make sure the variables only assume the values zero or one. If we drop this requirement and only assume \(x_{i,j}\in [0,1]\) (i.e. they have lower and upper bounds of 0 and 1), we may see a fractional solution like:


----     39 VARIABLE x.L  

            c1          c2          c3          c4          c5          c6          c7          c8

r1                                                                               1.000
r2                               0.200       0.800
r3       0.600       0.300                   0.100
r4                                                       0.600                               0.400
r5       0.200       0.700                   0.100
r6       0.200                                           0.400       0.400
r7                               0.800                               0.200
r8                                                                   0.400                   0.600


Nonlinear objective


We can invent a non-linear objective to make sure the variables \(x_{i,j}\) assume only the values zero or one in the optimal solution: \[\max \sum_{i,j} x^2_{i,j}\] The idea is that \[0.2^2+0.8^2=0.68 < 0^2+1^2 = 1\] So we want to solve:


n-Queens Problem as continuous NLP
\[\begin{align} \max & \sum_{i,j} \color{DarkRed}x^2_{i,j}\\ & \sum_j \color{DarkRed}x_{i,j} = 1 && \forall i\\& \sum_i \color{DarkRed}x_{i,j} = 1 && \forall j\\ & \sum_{i-j=k} \color{DarkRed}x_{i,j} \le 1 && k=-(n-2),\dots,(n-2)\\& \sum_{i+j=k} \color{DarkRed}x_{i,j} \le 1 && k=3,\dots,2n-1\\ & \color{DarkRed}x_{i,j}\in [0,1] \end{align}\]

This looks very ingenious. Unfortunately, the problem is non-convex, a complication that was forgotten in [2]. Solving a non-convex NLP problem like this to optimality will require a global solver. Indeed when solved with the solver BARON, I see:


----     39 VARIABLE x.L  

            c1          c2          c3          c4          c5          c6          c7          c8

r1                                                       1.000
r2                   1.000
r3                                                                                           1.000
r4       1.000
r5                                           1.000
r6                                                                               1.000
r7                               1.000
r8                                                                   1.000


Altogether, I believe this is not a very successful way to avoid binary variables. We do not make the problem any easier by making it non-linear and non-convex.


Nonlinear constraint


Another way to replace binary variables by a continuous construct is to impose a constraint \[x(1-x)=0\] This is also something I would not recommend. This constraint is non-convex as well, and needs a global solver unless you have a very good starting point (or unless you are extremely lucky).

Conclusion


There are ways to formulate models with binary variables as continuous models using clever objectives or constraints. These tricks do not make the model any easier: they introduce nasty non-convexities. It is better to use explicit binary variables, not least to make clear we are dealing with a combinatorial problem.

References


  1. Chess and solution pool, http://yetanothermathprogrammingconsultant.blogspot.com/2018/11/chess-and-solution-pool.html discusses \(n\)-queens and related problems, and tries the solution pool to enumerate all possible solutions.
  2. How to solve the 8 queens problem  with CVXPY? https://stackoverflow.com/questions/54652279/how-to-solve-the-8-queens-problem-with-cvxpy/54660353 proposes a novel way to prevent binary variables.
  3. Eight queens puzzle, https://en.wikipedia.org/wiki/Eight_queens_puzzle

Tuesday, February 5, 2019

Logs made linear


Logarithm table

Nonlinear model


In [1] a question is asked about a nonlinear model:

How to solve:


Nonlinear model
\[\begin{align} \min & \sum_i \color{DarkRed} x_i\\ & \sum_i \color{DarkBlue} a_i \log \color{DarkRed} x_i \ge \color{DarkBlue} c \\ & \color{DarkRed}x_i \in \{\color{DarkBlue}b_1,\color{DarkBlue}b_2,\dots,\color{DarkBlue}b_m\} \end{align} \]

where we have \(n\) variables \(x_i\), \(i=1,\dots,n\). This model looks nonlinear because of the logarithms.

Linear model


The condition \(x_i \in \{b_1,b_2,\dots,b_m \} \) is essentially a table lookup. This can be modeled using binary variables: \[y_{i,j} = \begin{cases} 1 & \text{if $x_i=b_j$}\\ 0 & \text{otherwise}\end{cases}\] With this definition we can write \[ \begin{align} & x_i = \sum_j y_{i,j} b_j \\ & \sum_j y_{i,j}=1 && \forall i\\ & y_{i,j} \in \{0,1\} \end{align}\]

Maybe somewhat surprisingly, we can also calculate \(\log x_i\) using these \(y_{i,j}\): \[\mathit{logx}_i = \sum_j y_{i,j} \log(b_j)\] This is very fortunate! We get the logs essentially for free! So our complete model can look like:

Linear model
\[\begin{align} \min & \sum_i \color{DarkRed} x_i\\ & \color{DarkRed} x_i = \sum_j \color{DarkRed} y_{i,j} \color{DarkBlue} b_j \\ & \color{DarkRed}{\mathit{logx}}_i = \sum_j \color{DarkRed} y_{i,j} \log(\color{DarkBlue} b_j) \\ & \sum_j \color{DarkRed} y_{i,j}=1 && \forall i \\ & \sum_i \color{DarkBlue} a_i \color{DarkRed}{\mathit{logx}}_i \ge \color{DarkBlue} c \\ & \color{DarkRed}y_{i,j} \in \{0,1\} \end{align} \]

This model is now completely linear and we can use a standard Mixed Integer Programming (MIP) solver to solve it.
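
To make this concrete, here is a small Pyomo sketch. The data for \(a\), \(b\) and \(c\) below is made up for illustration:

import math
from pyomo.environ import (ConcreteModel, Var, Binary, Constraint,
                           Objective, SolverFactory, minimize)

a = [1.0, 2.0, 0.5]          # hypothetical coefficients a_i
b = [1.0, 2.0, 5.0, 10.0]    # allowed values b_j (must be positive)
c = 4.0
I, J = range(len(a)), range(len(b))

m = ConcreteModel()
m.x = Var(I)
m.logx = Var(I)
m.y = Var(I, J, domain=Binary)

# x_i = sum_j y_ij * b_j  (table lookup)
m.xdef = Constraint(I, rule=lambda m, i:
                    m.x[i] == sum(b[j]*m.y[i, j] for j in J))
# logx_i = sum_j y_ij * log(b_j): the logs come for free
m.logdef = Constraint(I, rule=lambda m, i:
                      m.logx[i] == sum(math.log(b[j])*m.y[i, j] for j in J))
# pick exactly one value b_j for each x_i
m.pick = Constraint(I, rule=lambda m, i: sum(m.y[i, j] for j in J) == 1)
m.cover = Constraint(rule=lambda m: sum(a[i]*m.logx[i] for i in I) >= c)
m.obj = Objective(expr=sum(m.x[i] for i in I), sense=minimize)
SolverFactory('glpk').solve(m)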

More compact models


If we want, we can save a few variables and equations by substituting out the variable \(\mathit{logx}_i\). It would make the model a bit more difficult to read: \[\sum_{i,j}  a_i \log(b_j) y_{i,j} \ge  c \] may not be completely obvious. Furthermore, we can substitute out the variables \(x_i\) and just keep the \(y_{i,j}\) variables. Of course, after the solve we would need to recover the values of \(x_i\) from the optimal values \(y_{i,j}^*\). That model would be very compact, but would need some explanation, as it is not completely self-explanatory.

Polynomial constraints?


In [1] someone suggested forming the polynomial equations \[(x_i-b_1)(x_i-b_2)\cdots(x_i-b_m)=0\] and solving the model as a Nonlinear Programming (NLP) problem. This is surely not a good approach. If this were really a great idea, we would no longer need MIP solvers! Instead of binary variables, we could just form \[x(x-1)=0\] For an integer variable \(x \in \{0,1,\dots,n\}\) we could write \[x(x-1)(x-2)\cdots(x-n)=0\] The reason this does not work is that we create a very difficult non-convex problem. Unless you use a global solver, an NLP solver will just get stuck in a local optimum (or may not find a feasible solution). Don't throw away your MIP solver yet.

Final remarks


Using random data I was able to solve a problem with \(n=m=100\) in a few seconds. Of course this depends on the data: other instances may take more time.

This discussion reminds me a little bit of linear regression problems. In many cases, non-linearities appear in the data only, and although the regression model looks non-linear, it can actually be handled by linear regression approaches.

References


  1. How to solve the following optimization problem of mathematical programming?, https://math.stackexchange.com/questions/3090493/how-to-solve-the-following-optimization-problem-of-mathematical-programming
  2. MIP Modeling: from Sudoku to KenKen via Logarithms, https://yetanothermathprogrammingconsultant.blogspot.com/2016/10/mip-modeling-from-sudoku-to-kenken.html. The model to solve KenKen puzzles uses a similar approach.