## Tuesday, March 19, 2019

### Visualization of a large scheduling solution

One of the problems we face when developing larger scheduling problems is: how do we look at the solution? The solution data is too large to eye ball, and looking at tables with 0-1 decision variables is very difficult.

In the this model, the main variable is x which is a binary variable indicating if a person is assigned to a facility at time t. The GAMS listing file is not a very convenient way to look at this:

 ---- VAR x  assign provider to facility                                                  LOWER          LEVEL          UPPER         MARGINAL 5   .7  .Hospitalist          .04/01/2019          .              .             1.0000         3.0000      5   .7  .Hospitalist          .04/02/2019          .              .             1.0000         3.0000      5   .7  .Hospitalist          .04/03/2019          .              .             1.0000         3.0000      5   .7  .Hospitalist          .04/04/2019          .              .             1.0000         9.0000      5   .7  .Hospitalist          .04/05/2019          .              .             1.0000         9.0000      5   .7  .Hospitalist          .04/06/2019          .              .             1.0000         9.0000      5   .7  .Hospitalist          .04/07/2019          .              .             1.0000         9.0000      5   .7  .Hospitalist          .04/08/2019          .              .             1.0000         9.0000      5   .7  .Hospitalist          .04/09/2019          .              .             1.0000         9.0000      5   .7  .Hospitalist          .04/10/2019          .              .             1.0000         9.0000      5   .7  .Hospitalist          .04/11/2019          .              .             1.0000         3.0000      5   .7  .Hospitalist          .04/12/2019          .              .             1.0000         3.0000

I just showed a few records: there are more than 40,000 of them. Obviously, this is fairly useless for our application.

The GDX data viewer is showing by default:

 Default view
This is not much better. We can improve this a bit by pivoting: only look at the levels and display the date dimension as columns:

 Pivot table
These are still raw results. We see we have numeric id's instead of more meaningful names. The reason to use the id's is that they are guaranteed unique (they are keys).

To be able get a more compact and meaningful visualization of the results we need to post-process the results. E.g. we prefer names instead of id's. Here I show a visualization in Excel, where we can make create different views. Note: I have anonymized the names in this demo (in the real application we have real names to add immediate context).

 Excel based visualization

Not only is this effort useful for viewing and understanding the results (important for debugging and improving the optimization model), but it also allows me to communicate results with the client. Although I ship results in the form of CSV files (to be consumed by a database tool), this CSV file is difficult to interpret directly. This Excel spreadsheet allows us to look at our large, complex solution in much more efficient and effective way. Of course, I have automated things, so after an optimization run, I can generate this spreadsheet using a script. Well rather two scripts: one is written in GAMS and generates a large sparse data cube (with about 50k entries), the second one is written in VBA and does the coloring among other things.

The advantage of using Excel instead of using a dedicated tool, is that it makes it easier to share results. In the business world, practically everyone is familiar with Excel. Emailing spreadsheets around is much more accepted than sending say executables.

Although creating this tool is not exactly cheap, being able to inspect results in a meaningful way and communicate about them is a big pay off. It is not unusual these tools develop into a more important decision support tool than just a debugging aid for an optimization model.

As an aside: developing this optimization model turned out to be an excellent tool to improve the underlying database. Optimization models look at all data in context, instead of just looking at it record by record. This will put lots of emphasis on correct data, much more than traditional database applications. Data that initially looks quite decent, may actually contain more errors and inconsistencies than you would think. An optimization is a good tool to unearth these.

Developing a mathematical programming application is often much more than just doing mathematical programming.

## Thursday, March 14, 2019

### Piecewise linear functions and formulations for interpolation (part 3)

In previous posts I discussed several Mixed Integer Programming formulations for piecewise linear functions [1,2]. In this final third part, I want to demonstrate some newer formulations, using a logarithmic number of binary variables [6]. Finally we observe some problems with existing implementations in Pyomo.

In part 1 [1] you can find discussion and formulations for:

1. SOS2: use SOS2 variables to model the piecewise linear functions. This is an easy modeling exercise.
2. BIGM_BIN: use binary variables and big-M constraints to enable and disable constraints. This is a more complex undertaking, especially if we want to use the smallest possible values for the big-M constants. Pyomo has bug in this version.
3. BIGM_SOS1: a slight variation of the BIGM_BIN model. Pyomo has the same problem with this model.
Part 2 [2] contains the following formulations:

1. DCC: Disaggregated Convex Combination formulation is a simulation of the SOS2 model by binary variables.
2. CC:  Convex Combination formulation is a similar SOS2 like approach using binary variables only.
3. MC: Multiple Choice model uses a semi-continuous approach.
4. INCR: Incremental formulation is using all previous segments.

In this part I discuss:

1. LOG: Formulation with a logarithmic number of binary variables based on the CC model.
2. DLOG: DCC model with a logarithmic number of binary variables.

First I want to side step things, and talk about using binary variables to model integer values.

#### A side note: integer variables as a series of binary variables.

In MIP formulations for the Sudoku puzzle we expand an integer variable indicating the value of a cell $$(i,j)$$: $v_{i,j} \in \{1,\dots,9\}$ into a series of binary variables: $x_{i,j,k} = \begin{cases} 1 & \text{if cell (i,j) has value k, for k=1,\dots,9}\\ 0 & \text{otherwise}\end{cases}$ This is done to make it easier to formulate "all-different" constraints.  The value can be calculated as: $v_{i,j} = \sum_k k \cdot x_{i,j,k}$ In this expansion we use a linear number of binary variables.

A more compact transformation is to use a power expansion. Here we translate an integer variable $$v \in \{0,\dots,U\}$$ into a logarithmic number of binary variables: $v = \sum_{i=1}^{1+\lfloor \log_2(U) \rfloor} 2^{i-1}x_i$ I.e., for a variable with an upperbound of $$U=100$$, we need 7 binary variables. This idea seems to go back to [3]. I have seen this power expansion method being used to solve MIP problems with the ZOOM [4] solver, which only allowed continuous and binary variables.

We will use a different way to encode segment numbers. For this we need to know about Gray codes.

#### Another side note: binary encodings

In the previous section we used a standard binary encoding. A different encoding is using the Gray code [7]. This looks like:

Integer ValueBinary CodeGray Code
000000000
100010001
200100011
300110010
401000110
501010111
601100101
701110100
810001100

The standard binary code has the advantage there is a simple (linear) formula to go from binary to decimal which we can use in a MIP model. The Gray code has no simple formula (an algorithm must be used). The Gray code has as special property that each next code is just one bit different from the previous one. E.g. if we go from 7 to 8, the standard binary code differs in all 4 binary digits $$0111\rightarrow 1000$$. The Gray code only differs in one binary digit: $$0100\rightarrow 1100$$.

With this we have enough to attack our piecewise linear interpolation problem.

#### Recap: the Convex Combination Model

The Convex Combination Model simulates SOS2 variables using binary variables:

CC - Convex Combination Formulation
\begin{align} & \color{DarkRed}x = \sum_k \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{k} \\ & \color{DarkRed}y = \sum_k \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{k} \\ & \sum_k \color{DarkRed}\lambda_k = 1 \\& \color{DarkRed} \lambda_k \le \begin{cases} \color{DarkRed}\delta_k & \text{for k=1}\\ \color{DarkRed}\delta_{k-1} + \color{DarkRed}\delta_k & \text{for k=2,\dots,K-1} \\ \color{DarkRed}\delta_{k-1} & \text{for k=K}\end{cases} \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\lambda_k \ge 0 \\ & \color{DarkRed}\delta_s \in \{0,1\}\\ &k \in \{1,\dots,\color{DarkBlue}K\},\> s \in \{1,\dots,\color{DarkBlue}K-1\} \end{align}

This model will make sure only two consecutive $$\lambda_k$$'s can be nonzero (this is called a SOS2 constraint).

In this model we have $$K-1$$ binary variables ($$K-1$$ is the number of segments).

#### Numerical Example: what is the idea?

We can encode each segment using our Gray code $$000, 001, 011, 010, 110, \dots$$ and associate a binary variable with each digit.

The variables have the following role:

1. The $$\delta$$ variables are binary variables used to indicate which segment is selected.
2. The $$\lambda$$ variables are continuous variables used to perform interpolation along the current segment.

In in the picture above, we can see that if the first digit is 0, we can exclude the last segment, and breakpoint $$\lambda_4$$. If we do this a bit systematically, we can formulate the implications: \begin{align} & \delta_1=0 \Rightarrow \lambda_4=0\\& \delta_1=1 \Rightarrow \lambda_1=\lambda_2=0\\ & \delta_2=0 \Rightarrow \lambda_3=\lambda_4=0\\& \delta_2=1 \Rightarrow \lambda_1=0\end{align} or \begin{align} \lambda_4 &\le \delta_1 \\ \lambda_1+\lambda_2 &\le 1-\delta_1\\ \lambda_3+\lambda_4&\le \delta_2\\ \lambda_1 &\le 1-\delta_2\end{align}

Let's enumerate the results of these constraints. In the table below, we show all possible combinations for $$\delta_b, b=1,2$$. For each combination we mark with a 0 which $$\lambda_k$$'s will be fixed to zero.

Gray code implications
$$\delta_1$$$$\delta_ 2$$$$\lambda_1$$$$\lambda_2$$$$\lambda_3$$$$\lambda_4$$
0000
0100
1100
100000

We see that each combination of $$\delta$$'s selects the correct segment. E.g. $$\delta=(0,1)$$ selects the segment defined by $$\lambda_2$$ and $$\lambda_3$$. In addition $$\delta=(1,0)$$ is not feasible (the $$\lambda$$'s have to add up to one).

#### Why a Gray code?

The previous table explains the usefulness of the Gray code. If we would have used the standard binary encoding, we would not see the same table. Using the encoding: segment1 = 00, segment2 = 01, segment3 = 10, we see:

Standard binary code implications
$$\delta_1$$$$\delta_ 2$$$$\lambda_1$$$$\lambda_2$$$$\lambda_3$$$$\lambda_4$$
00 0
010 0
1000
11000

This is not useful for our purposes. We want to allow two adjacent $$\lambda$$'s to be nonzero.

#### Developing a model

To make a model out of this, we define a boolean incidence matrix $I_{b,v,k} = \begin{cases} \text{true} & \text{if breakpoint k is part of a segment s that has binary digit b equal to v}\\ \text{false} & \text{otherwise}\end{cases}$ where $$v \in \{0,1\}$$. For our example we have:

----     36 SET incident

point1      point2      point3      point4

b1.0         YES         YES         YES
b1.1                                 YES         YES
b2.0         YES         YES
b2.1                     YES         YES         YES

With this the model can look like:

LOG - Logarithmic model
\begin{align} & \color{DarkRed}x = \sum_k \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{k} \\ & \color{DarkRed}y = \sum_k \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{k} \\ & \sum_k \color{DarkRed}\lambda_k = 1 \\& \sum_{k | \text{not } I(b,0,k)} \color{DarkRed} \lambda_k \le \color{DarkRed} \delta_b && \forall b \\& \sum_{k | \text{not } I(b,1,k)} \color{DarkRed}\lambda_k\le 1-\color{DarkRed}\delta_b && \forall b\\ & \color{DarkRed}\lambda_k \ge 0 \\ & \color{DarkRed}\delta_b \in \{0,1\}\\ & k \in \{1,\dots,\color{DarkBlue}K\}\\ & s \in \{1,\dots,\color{DarkBlue}K-1\}\\ & b \in \{1,\dots,\lceil log_2(\color{DarkBlue}K-1) \rceil \} \end{align}

#### The GAMS model

The GAMS model can look like:

 set   k 'breakpoints' /point1*point4/   s 'segments' /segment1*segment3/   sk(s,k) 'mapping'   b 'number of binary variables' /b1,b2/   b01 /0,1/ ; * calculate mapping between breakpoints and segments sk(s,k) = ord(s)=ord(k) or ord(s)=ord(k)-1; display sk; table data(k,*) 'location of breakpoints'            x   y   point1   1   6   point2   3   2   point3   6   8   point4  10   7 ; table gray(s,b) 'Gray encoding of segments'              b1  b2    segment1   0   0    segment2   0   1    segment3   1   1 ; set   incident(b,b01,k)   notincident(b,b01,k) ; incident(b,'1',k) = sum(sk(s,k),gray(s,b)); incident(b,'0',k) = sum(sk(s,k),1-gray(s,b)); notincident(b,b01,k) = not incident(b,b01,k); display incident,notincident; positive variable lambda(k) 'interpolation'; binary variable delta(b) 'segment encoding'; variable x,y; equations    xdef       'x'    ydef       'y'    link0(b)   'link delta-lambda'    link1(b)   'link delta-lambda'    sumlambda  'interpolation' ; xdef.. x =e= sum(k, lambda(k)*data(k,'x')); ydef.. y =e= sum(k, lambda(k)*data(k,'y')); link0(b)..  sum(notincident(b,'0',k),lambda(k)) =l= delta(b); link1(b)..  sum(notincident(b,'1',k),lambda(k)) =l= 1-delta(b); sumlambda.. sum(k, lambda(k)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

The results look like:

----     61 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     61 VARIABLE delta.L  segment encoding

b2 1.000

----     61 VARIABLE lambda.L  interpolation

point2 0.333,    point3 0.667

This indicates we selected segment 2 with breakpoints 2 and 3. We also have $$\delta = (0,1)$$. The generated equations look like:

---- xdef  =E=  x

xdef..  - lambda(point1) - 3*lambda(point2) - 6*lambda(point3) - 10*lambda(point4) + x =E= 0 ; (LHS = 5, INFES = 5 ****)

---- ydef  =E=  y

ydef..  - 6*lambda(point1) - 2*lambda(point2) - 8*lambda(point3) - 7*lambda(point4) + y =E= 0 ; (LHS = 0)

link0(b1)..  lambda(point4) - delta(b1) =L= 0 ; (LHS = 0)

link0(b2)..  lambda(point3) + lambda(point4) - delta(b2) =L= 0 ; (LHS = 0)

link1(b1)..  lambda(point1) + lambda(point2) + delta(b1) =L= 1 ; (LHS = 0)

link1(b2)..  lambda(point1) + delta(b2) =L= 1 ; (LHS = 0)

---- sumlambda  =E=  interpolation

sumlambda..  lambda(point1) + lambda(point2) + lambda(point3) + lambda(point4) =E= 1 ; (LHS = 0, INFES = 1 ****)

Of course, a logarithmic model does not make much sense for a model with just 3 segments. We just save 1 binary variable: instead of $$K-1=3$$ binary variables, we have $$\lceil \log_2(K-1)\rceil = 2$$ binary variables.

#### Pyomo

The Pyomo model for this problem can look like:

#
# expected solution X=5, Y=6
#

xdata = [1., 3., 6., 10.]
ydata  = [6.,2.,8.,7.]

from pyomo.core import *

model = ConcreteModel()

model.X = Var(bounds=(1,10))
model.Y = Var(bounds=(0,100))

model.con = Piecewise(model.Y,model.X,
pw_pts=xdata,
pw_constr_type='EQ',
f_rule=ydata,
pw_repn='LOG')

# see what we get for Y when X=5
def con2_rule(model):
return model.X==5

model.con2 = Constraint(rule=con2_rule)

model.obj = Objective(expr=model.Y, sense=maximize)

The output is:

 [    0.00] Setting up Pyomo environment [    0.00] Applying Pyomo preprocessing actions ERROR: Constructing component 'con' from data=None failed: ValueError: 'con'     does not have a list of domain points with length (2^n)+1 [    0.02] Pyomo Finished ERROR: Unexpected exception while loading model:         'con' does not have a list of domain points with length (2^n)+1

Bummer.  You probably have to pad the breakpoints until we have a number of segments that is equal to a power of two. I don't understand this requirement. As we saw before, there really is no good reason for this restriction. As a workaround, we can do this by just repeating the last points as many times as needed:

xdata = [1., 3., 6., 10., 10.]
ydata  = [6.,2.,8.,7.,7.]

#### The DLOG formulation.

The DCC model discussed in [2] looks like:

DCC - Disaggregated Convex Combination Formulation
\begin{align} & \color{DarkRed}x = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{s,k} \\ & \color{DarkRed}y = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{s,k}\\& \color{DarkRed}\delta_s = \sum_{k|\color{DarkBlue}SK(s,k)} \color{DarkRed} \lambda_{s,k} \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\lambda_{s,k} \ge 0 \\ & \color{DarkRed}\delta_s \in \{0,1\} \end{align}

This model can be adapted to use the logarithmic encoding trick discussed before. We use a similar incidence table as before, but now it refers to $$\lambda_{s,k}$$. I.e.:  $I_{b,v,s,k} = \begin{cases} \text{true} & \text{if breakpoint k is part of a segment s that has binary digit b equal to v}\\ \text{false} & \text{otherwise}\end{cases}$ where $$v \in \{0,1\}$$.

This yields a model like:

DLOG - DCC Formulation with logarithmic number of binary variables.
\begin{align} & \color{DarkRed}x = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{s,k} \\ & \color{DarkRed}y = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{s,k} \\ & \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkRed}\lambda_{s,k} = 1 \\& \sum_{s,k | \text{not } I(b,0,s,k)} \color{DarkRed} \lambda_{s,k} \le \color{DarkRed}\delta_b && \forall b \\& \sum_{s,k | \text{not } I(b,1,s,k)} \color{DarkRed} \lambda_{s,k} \le 1- \color{DarkRed}\delta_b && \forall b \\ & \color{DarkRed}\lambda_{s,k} \ge 0 \\ & \color{DarkRed}\delta_b \in \{0,1\}\\ & k \in \{1,\dots,\color{DarkBlue}K\}\\ & s \in \{1,\dots,\color{DarkBlue}K-1\}\\ & b \in \{1,\dots,\lceil log_2(\color{DarkBlue}K-1) \rceil \} \end{align}

The GAMS model can look like:

 set   k 'breakpoints' /point1*point4/   s 'segments' /segment1*segment3/   sk(s,k) 'mapping'   b 'number of binary variables' /b1,b2/   b01 /0,1/ ; * calculate mapping between breakpoints and segments sk(s,k) = ord(s)=ord(k) or ord(s)=ord(k)-1; display sk; table data(k,*) 'location of breakpoints'            x   y   point1   1   6   point2   3   2   point3   6   8   point4  10   7 ; table gray(s,b) 'Gray encoding of segments'              b1  b2    segment1   0   0    segment2   0   1    segment3   1   1 ; set   incident(b,b01,s,k)   notincident(b,b01,s,k) ; incident(b,'1',sk(s,k)) = gray(s,b); incident(b,'0',sk(s,k)) = 1-gray(s,b); notincident(b,b01,sk) = not incident(b,b01,sk); display incident,notincident; positive variable lambda(s,k) 'interpolation'; binary variable delta(b) 'segments are Gray encoded'; variable x,y; equations    xdef       'x'    ydef       'y'    link0(b)   'link delta-lambda'    link1(b)   'link delta-lambda'    sumlambda  'select segment' ; xdef.. x =e= sum(sk(s,k), lambda(s,k)*data(k,'x')); ydef.. y =e= sum(sk(s,k), lambda(s,k)*data(k,'y')); link0(b)..  sum(notincident(b,'0',s,k),lambda(s,k)) =l= delta(b); link1(b)..  sum(notincident(b,'1',s,k),lambda(s,k)) =l= 1-delta(b); sumlambda.. sum(sk(s,k),lambda(s,k)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

The solution looks like:

----     61 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     61 VARIABLE delta.L  segments are Gray encoded

b2 1.000

----     61 VARIABLE lambda.L  interpolation

point2      point3

segment2       0.333       0.667

This means $$\delta = (0,1)$$ which gives us segment 2.

We can also show the individual equations:

---- xdef  =E=  x

xdef..  - lambda(segment1,point1) - 3*lambda(segment1,point2) - 3*lambda(segment2,point2) - 6*lambda(segment2,point3)

- 6*lambda(segment3,point3) - 10*lambda(segment3,point4) + x =E= 0 ; (LHS = 5, INFES = 5 ****)

---- ydef  =E=  y

ydef..  - 6*lambda(segment1,point1) - 2*lambda(segment1,point2) - 2*lambda(segment2,point2) - 8*lambda(segment2,point3)

- 8*lambda(segment3,point3) - 7*lambda(segment3,point4) + y =E= 0 ; (LHS = 0)

link0(b1)..  lambda(segment3,point3) + lambda(segment3,point4) - delta(b1) =L= 0 ; (LHS = 0)

link0(b2)..  lambda(segment2,point2) + lambda(segment2,point3) + lambda(segment3,point3) + lambda(segment3,point4)

- delta(b2) =L= 0 ; (LHS = 0)

link1(b1)..  lambda(segment1,point1) + lambda(segment1,point2) + lambda(segment2,point2) + lambda(segment2,point3)

+ delta(b1) =L= 1 ; (LHS = 0)

link1(b2)..  lambda(segment1,point1) + lambda(segment1,point2) + delta(b2) =L= 1 ; (LHS = 0)

---- sumlambda  =E=  select segment

sumlambda..  lambda(segment1,point1) + lambda(segment1,point2) + lambda(segment2,point2) + lambda(segment2,point3)

+ lambda(segment3,point3) + lambda(segment3,point4) =E= 1 ; (LHS = 0, INFES = 1 ****)

#### Pyomo

Again when we try Pyomo, we get into trouble:

#
# expected solution X=5, Y=6
#

xdata = [1., 3., 6., 10.]
ydata  = [6.,2.,8.,7.]

from pyomo.core import *

model = ConcreteModel()

model.X = Var(bounds=(1,10))
model.Y = Var(bounds=(0,100))

model.con = Piecewise(model.Y,model.X,
pw_pts=xdata,
pw_constr_type='EQ',
f_rule=ydata,
pw_repn='DLOG')

# see what we get for Y when X=5
def con2_rule(model):
return model.X==5

model.con2 = Constraint(rule=con2_rule)

model.obj = Objective(expr=model.Y, sense=maximize)

yields the following messages:

 [    0.00] Setting up Pyomo environment [    0.00] Applying Pyomo preprocessing actions ERROR: Constructing component 'con' from data=None failed: ValueError: 'con'     does not have a list of domain points with length (2^n)+1 [    0.02] Pyomo Finished ERROR: Unexpected exception while loading model:         'con' does not have a list of domain points with length (2^n)+1

I am not sure what the reason can be for this restriction.

#### Conclusion

This is hopefully a somewhat more accessible description of methods for piecewise linear interpolation with just a logarithmic number of binary variables. This method can be applied to many formulations. Here we just showed two: the Convex Combination (CC) and the Disaggregated Convex Combination (DCC) formulation. Finally. we demonstrated some problems in the Pyomo implementation.

#### References

1. Piecewise linear functions and formulations for interpolation (part 1), http://yetanothermathprogrammingconsultant.blogspot.com/2019/02/piecewise-linear-functions-and.html
2. Piecewise linear functions and formulations for interpolation (part 2), http://yetanothermathprogrammingconsultant.blogspot.com/2019/02/piecewise-linear-functions-and_22.html
3. L. J. Watters, Reduction of integer polynomial programming problems to zero-one linear programming problems, Operations Research 15, 1171–1174,  1967
4. Jaya Singhal, Roy E. Marsten, Thomas L. Morin, Fixed Order Branch-and-Bound Methods for Mixed-Integer Programming: The zoom System, ORSA Journal on Computing, 1989, vol. 1, issue 1, 44-51.
5. Juan Pablo Vielma, Shabbir Ahmed and George Nemhauser, Mixed-Integer Models for Nonseparable Piecewise Linear Optimization: Unifying Framework and Extensions, Operations Research 58(2):303-315, 2010
6. Juan Pablo Vielma and George L. Nemhauser, Modelling Disjunctive Constraints with a Logarithmic Number of Binary Variables and Constraints, Mathematical Programming 128 (2011), pp. 49-72.
7. Gray Code, https://en.wikipedia.org/wiki/Gray_code

## Friday, February 22, 2019

### Piecewise linear functions and formulations for interpolation (part 2)

Pyomo supports a number of methods to perform piecewise linear interpolation. In part 1 [1], the following methods for piecewise linear interpolation were discussed:

1. SOS2: use SOS2 variables to model the piecewise linear functions. This is an easy modeling exercise.
2. BIGM_BIN: use binary variables and big-M constraints to enable and disable constraints. This is a more complex undertaking, especially if we want to use the smallest possible values for the big-M constants. Pyomo has bug in this version.
3. BIGM_SOS1: a slight variation of the BIGM_BIN model. Pyomo has the same problem with this model.
Here I show a few more formulations:

1. DCC: Disaggregated Convex Combination formulation is a simulation of the SOS2 model by binary variables.
2. CC:  Convex Combination formulation is a similar SOS2 like approach using binary variables only.
3. MC: Multiple Choice model uses a semi-continuous approach.
4. INCR: Incremental formulation is using all previous segments.
I'll state the formulation in mathematical notation, hopefully a little bit more accessible than in some papers I looked at, and illustrate the formulation with GAMS code to make it more concrete.

Note that part 3 [2] will discuss some newer formulations: LOG and DLOG that only need a logarithmic number of binary variables.

For demonstration purposes we use the same small example with 4 breakpoints and 3 segments.

#### Method 4: DCC - Disaggregated Convex Combination Model

This is an impressive name for a model that is actually not very complicated.

First we define a binary variable for each segment: $\delta_s = \begin{cases} 1 & \text{if segment s is selected}\\ 0 & \text{otherwise}\end{cases}$ Then we perform interpolation on the selected segment. For this we define a weight variable: $\lambda_{s,k} \ge 0$ for the two end points belonging to segment $$s$$.  This way we achieve the "only two neighboring weights $$\lambda$$ can be nonzero" constraint from our SOS2 model. The mathematical model can look like:

DCC - Disaggregated Convex Combination Formulation
\begin{align} & \color{DarkRed}x = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{s,k} \\ & \color{DarkRed}y = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{s,k}\\& \color{DarkRed}\delta_s = \sum_{k|\color{DarkBlue}SK(s,k)} \color{DarkRed} \lambda_{s,k} \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\lambda_{s,k} \ge 0 \\ & \color{DarkRed}\delta_s \in \{0,1\} \end{align}

Here the mapping set $$SK(s,k)$$ is TRUE for the two breakpoints $$k$$ belonging to segment $$s$$.

We can also write the model more directly, without the mapping $$SK(s,k)$$:

DCC - Disaggregated Convex Combination Formulation 2
\begin{align} & \color{DarkRed}x = \sum_s \left( \color{DarkBlue}{\bar{x}}_s \color{DarkRed}\lambda_{s,s} + \color{DarkBlue}{\bar{x}}_{s+1} \color{DarkRed}\lambda_{s,s+1} \right) \\ & \color{DarkRed}y = \sum_s \left( \color{DarkBlue}{\bar{y}}_s \color{DarkRed}\lambda_{s,s}+\color{DarkBlue}{\bar{y}}_{s+1} \color{DarkRed}\lambda_{s,s+1}\right)\\& \color{DarkRed}\delta_s = \color{DarkRed} \lambda_{s,s}+ \color{DarkRed}\lambda_{s,s+1} \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\lambda_{s,k} \ge 0 \\ & \color{DarkRed}\delta_s \in \{0,1\} \end{align}

Here is the GAMS version:

 set   k 'breakpoints' /point1*point4/   s 'segments' /segment1*segment3/   sk(s,k) 'mapping' ; sk(s,k) = ord(s)=ord(k) or ord(s)=ord(k)-1; display sk; table data(k,*)            x   y   point1   1   6   point2   3   2   point3   6   8   point4  10   7 ; positive variable lambda(s,k) 'interpolation'; binary variable delta(s) 'select segment'; variable x,y; equations    xdef       'x'    ydef       'y'    link(s)    'link delta-lambda'    sumdelta   'select segment' ; xdef.. x =e= sum(sk(s,k), lambda(s,k)*data(k,'x')); ydef.. y =e= sum(sk(s,k), lambda(s,k)*data(k,'y')); link(s).. delta(s) =e= sum(sk(s,k),lambda(s,k)); sumdelta.. sum(s, delta(s)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

In this model I made a strong distinction between segments $$s$$ and breakpoints $$k$$. This will help the GAMS model to perform domain checking (type checking), so we get a bit better protection against errors in the model. Of course, if you prefer you just can indicate segments by $$1,\dots,K-1$$.

Note that the linking constraints link, perform two functions. First: they make sure that for unselected segments (with $$\delta_s=0$$), we have $$\lambda_{s,k}=0$$.  Second, for the selected segment, we automatically sum the $$\lambda$$'s to 1.

The output of the model looks like:

----     13 SET sk  mapping

point1      point2      point3      point4

segment1         YES         YES
segment2                     YES         YES
segment3                                 YES         YES

----     45 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     45 VARIABLE delta.L

segment2 1.000

----     45 VARIABLE lambda.L

point2      point3

segment2       0.333       0.667

The display of the set SK confirms we setup the topography correctly: segment $$k$$ uses points $$k, k+1$$.

We fixed $$x=5$$ which yields $$y=6$$. We see $$\delta_2=1$$ so the second segment is selected, and this is also visible from the variables $$\lambda_{s,k}$$ indicating we interpolate between points 2 and 3.

We can have a look at the generated equations:

---- xdef  =E=

xdef..  - lambda(segment1,point1) - 3*lambda(segment1,point2) - 3*lambda(segment2,point2) - 6*lambda(segment2,point3)

- 6*lambda(segment3,point3) - 10*lambda(segment3,point4) + x =E= 0 ; (LHS = 5, INFES = 5 ****)

---- ydef  =E=

ydef..  - 6*lambda(segment1,point1) - 2*lambda(segment1,point2) - 2*lambda(segment2,point2) - 8*lambda(segment2,point3)

- 8*lambda(segment3,point3) - 7*lambda(segment3,point4) + y =E= 0 ; (LHS = 0)

link(segment1)..  - lambda(segment1,point1) - lambda(segment1,point2) + delta(segment1) =E= 0 ; (LHS = 0)

link(segment2)..  - lambda(segment2,point2) - lambda(segment2,point3) + delta(segment2) =E= 0 ; (LHS = 0)

link(segment3)..  - lambda(segment3,point3) - lambda(segment3,point4) + delta(segment3) =E= 0 ; (LHS = 0)

---- sumdelta  =E=

sumdelta..  delta(segment1) + delta(segment2) + delta(segment3) =E= 1 ; (LHS = 0, INFES = 1 ****)

In a sense we simulated the SOS2 model using binary variables. More visible here is the two-step approach:

1. Select the segment $$s$$ using the binary variables $$\delta_s$$.
2. Interpolate between the breakpoints of this segment using the positive variables $$\lambda_{s,k}$$.
Of course in a MIP model we have simultaneous equations, so these things happen actually at the same time. The 2-step paradigm is more of a useful mental model.

For completeness, here is the GAMS model without set SK(s,k):

 * DCC2: Disaggregated Convex Combination formulation 2 * without mapping SK(s,k) set   k 'breakpoints' /point1*point4/   s(k) 'segments' /point1*point3/ ; table data(k,*)            x   y   point1   1   6   point2   3   2   point3   6   8   point4  10   7 ; positive variable lambda(s,k) 'interpolation'; binary variable delta(s) 'select segment'; variable x,y; equations    xdef       'x'    ydef       'y'    link(s)    'link delta-lambda'    sumdelta   'select segment' ; xdef.. x =e= sum(s(k), lambda(s,k)*data(k,'x')+lambda(s,k+1)*data(k+1,'x')); ydef.. y =e= sum(s(k), lambda(s,k)*data(k,'y')+lambda(s,k+1)*data(k+1,'y')); link(s(k)).. delta(s) =e= lambda(s,k)+lambda(s,k+1); sumdelta.. sum(s, delta(s)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

#### Method 5: CC - Convex Combination Model

This formulation is very similar to the previous one. The main difference is the structure of the $$\lambda$$ variables. We index them by $$k$$ only. This is more like the SOS2 model.  The manner in which we link $$\lambda_k$$ to $$\delta_s$$ becomes somewhat different. Instead of an equality we use an inequality $\lambda_k \le \sum_{s|SK(s,k)}\delta_s$ This make sure that only for a selected segment with $$\delta_s=1$$, the corresponding $$\lambda_k$$'s can be nonzero. We need to add explicitly that $$\sum_k \lambda_k=1$$ as this is no longer implied by the linking constraints. The model looks like:

CC - Convex Combination Formulation
\begin{align} & \color{DarkRed}x = \sum_k \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{k} \\ & \color{DarkRed}y = \sum_k \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{k} \\ & \sum_k \color{DarkRed}\lambda_k = 1 \\& \color{DarkRed} \lambda_k \le \sum_{s|\color{DarkBlue}SK(s,k)} \color{DarkRed}\delta_s \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\lambda_k \ge 0 \\ & \color{DarkRed}\delta_s \in \{0,1\} \end{align}

This model has fewer continuous variables than the DCC model. The GAMS version looks like:

 set   k 'breakpoints' /point1*point4/   s 'segments' /segment1*segment3/   sk(s,k) 'mapping' ; sk(s,k) = ord(s)=ord(k) or ord(s)=ord(k)-1; display sk; table data(k,*)            x   y   point1   1   6   point2   3   2   point3   6   8   point4  10   7 ; positive variable lambda(k) 'interpolation'; binary variable delta(s) 'select segment'; variable x,y; equations    xdef       'x'    ydef       'y'    link(k)    'link delta-lambda'    sumlambda  'interpolation'    sumdelta   'select segment' ; xdef.. x =e= sum(k, lambda(k)*data(k,'x')); ydef.. y =e= sum(k, lambda(k)*data(k,'y')); link(k).. lambda(k) =l= sum(sk(s,k),delta(s)); sumlambda.. sum(k, lambda(k)) =e= 1; sumdelta.. sum(s, delta(s)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

The output (including the generated equations) is:

Generated equations

---- xdef  =E=  x

xdef..  - lambda(point1) - 3*lambda(point2) - 6*lambda(point3) - 10*lambda(point4) + x =E= 0 ; (LHS = 5, INFES = 5 ****)

---- ydef  =E=  y

ydef..  - 6*lambda(point1) - 2*lambda(point2) - 8*lambda(point3) - 7*lambda(point4) + y =E= 0 ; (LHS = 0)

link(point1)..  lambda(point1) - delta(segment1) =L= 0 ; (LHS = 0)

link(point2)..  lambda(point2) - delta(segment1) - delta(segment2) =L= 0 ; (LHS = 0)

link(point3)..  lambda(point3) - delta(segment2) - delta(segment3) =L= 0 ; (LHS = 0)

link(point4)..  lambda(point4) - delta(segment3) =L= 0 ; (LHS = 0)

Solution
---- sumlambda  =E=  interpolation

sumlambda..  lambda(point1) + lambda(point2) + lambda(point3) + lambda(point4) =E= 1 ; (LHS = 0, INFES = 1 ****)

---- sumdelta  =E=  select segment

sumdelta..  delta(segment1) + delta(segment2) + delta(segment3) =E= 1 ; (LHS = 0, INFES = 1 ****)

----     49 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     49 VARIABLE delta.L  select segment

segment2 1.000

----     49 VARIABLE lambda.L  interpolation

point2 0.333,    point3 0.667

The methods DCC and CC simulate our SOS2 model from [1] using binary variables. This means the model can handle step functions (we don't need to form a slope). The models are not too difficult to setup, as is illustrated by the GAMS models implementing them.

An early reference to this method is [4].

#### Method 6: MC - Multiple Choice formulation

This is a well-known method. We assume we can calculate slopes and intercepts for each segment. I.e. we have $y = a_s + b_s x\>\> \text{ if \bar{x}_s \le x \le \bar{x}_{s+1}}$  with \begin{align} & a_s = \frac{\bar{y}_{s+1}-\bar{y}_s}{\bar{x}_{s+1}-\bar{x}_s} \\ & b_s = \bar{y}_s - a_s \bar{x}_s \end{align}

For the model we introduce binary variables $$\delta_s$$ to indicate which segment is selected, and so-called semi-continuous variables $$v_s \in {0} \cup [\bar{x}_s,\bar{x}_{s+1}]$$. The variable $$v_s$$ can be modeled as $\bar{x}_s \delta_s \le v_s \le \bar{x}_{s+1} \delta_s$

With this we can formulate the complete model:

MC - Multiple Choice Formulation
\begin{align} & \color{DarkRed}x = \sum_s \color{DarkRed}v_s \\ & \color{DarkRed}y = \sum_s \left( \color{DarkBlue}a_s \color{DarkRed} v_s + \color{DarkBlue}b_s \color{DarkRed} \delta_s \right) \\ &\color{DarkBlue}{\bar{x}}_s \color{DarkRed}\delta_s \le \color{DarkRed}v_s \le \color{DarkBlue}{\bar{x}}_{s+1} \color{DarkRed}\delta_s \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\delta_s \in \{0,1\}\\ & \color{DarkBlue}a_s = \frac{\color{DarkBlue}{\bar{y}}_{s+1}-\color{DarkBlue}{\bar{y}}_s}{\color{DarkBlue}{\bar{x}}_{s+1}-\color{DarkBlue}{\bar{x}}_s} \\ &\color{DarkBlue}b_s = \color{DarkBlue}{\bar{y}}_s - \color{DarkBlue} a_s \color{DarkBlue}{\bar{x}}_s \end{align}

Notes:

• The sandwich equation  $$\bar{x}_s \delta_s \le v_s \le \bar{x}_{s+1} \delta_s$$ must likely be implemented as two inequalities.
• This construct says: $$v_s=0$$ or $$v_s \in [\bar{x}_s, \bar{x}_{s+1}]$$. $$v_s$$ is sometimes called semi-continuous.
• The variables $$\delta_s$$ and $$v_s$$ are connected: \begin{align} &\delta_s = 0 \Rightarrow v_s = 0\\ & \delta_s = 1 \Rightarrow v_s = x\end{align} I.e. they operate in parallel.
• I have seen cases where this model outperformed the SOS2 formulation.

The GAMS model is simple:

 set   k 'breakpoints' /k1*k4/   s(k) 'segments' /k1*k3/ ; table data(k,*)        x   y   k1   1   6   k2   3   2   k3   6   8   k4  10   7 ; data(s(k),'dx') = data(k+1,'x')-data(k,'x'); data(s(k),'dy') = data(k+1,'y')-data(k,'y'); data(s(k),'slope') = data(k,'dy')/data(k,'dx'); data(s(k),'intercept') = data(k,'y')-data(k,'slope')*data(k,'x'); display data; variable v(s) 'equal to x or 0'; binary variable delta(s) 'select segment'; variable x,y; equations    xdef       'x'    ydef       'y'    semicont1(k)    semicont2(k)    sumdelta   'select segment' ; xdef.. x =e= sum(s, v(s)); ydef.. y =e= sum(s, data(s,'slope')*v(s)+data(s,'intercept')*delta(s)); semicont1(s(k)).. v(s) =l= data(k+1,'x')*delta(s); semicont2(s)..    v(s) =g= data(s,'y')*delta(s); sumdelta.. sum(s, delta(s)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display data,x.l,y.l,delta.l,v.l;

The results look like:

----     43 PARAMETER data

x           y          dx          dy       slope   intercept

k1       1.000       6.000       2.000      -4.000      -2.000       8.000
k2       3.000       2.000       3.000       6.000       2.000      -4.000
k3       6.000       8.000       4.000      -1.000      -0.250       9.500
k4      10.000       7.000

----     43 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     43 VARIABLE delta.L  select segment

k2 1.000

----     43 VARIABLE v.L  equal to x or 0

k2 5.000

Segment 2 is selected. We can see that directly from the binary variable $$\delta_s$$, but also from the semi-continuous variable $$v_s$$.

#### Method 7: INC - Incremental Formulation

In the incremental or delta method we add up all contributions of "earlier" segments, to find our $$x$$ and $$y$$.

Let $$s'$$ be the segment that contains our current $$x$$. We define a binary variable $$\delta_{s}$$ as $\delta_{s} = \begin{cases} 1 & \text{for s\lt s'}\\ 0 & \text{for s \ge s'}\end{cases}$ In addition we use a continuous variable $$\lambda_s \in [0,1]$$ indicating how much of each segment we "use up". The contribution for earlier segments is 1 and for the current segment we have a fractional value. So we have: $\begin{cases} \lambda_{s} = 1 & \text{for s\lt s'} \\\lambda_{s} \in [0,1] & \text{for s= s'} \\ \lambda_{s} = 0 & \text{for s\gt s'}\end{cases}$ With these definitions we can write: \begin{align} x = \bar{x}_1 + \sum_s \lambda_s (\bar{x}_{s+1} - \bar{x}_s) \\ y = \bar{y}_1 + \sum_s \lambda_s (\bar{y}_{s+1} - \bar{y}_s) \end{align}

Now we need to formulate a structure that enforce these rules on $$\delta$$ and $$\lambda$$. The following will do that: $\lambda_{s+1} \le \delta_s \le \lambda_s$ Typically you will need to implement this as two separate constraints.

The complete model looks like:

INC - Incremental Formulation
\begin{align} & \color{DarkRed}x = \color{DarkBlue}{\bar{x}}_1 + \sum_s \color{DarkRed}\lambda_s (\color{DarkBlue}{\bar{x}}_{s+1} - \color{DarkBlue}{\bar{x}}_s) \\ & \color{DarkRed}y = \color{DarkBlue}{\bar{y}}_1 + \sum_s \color{DarkRed}\lambda_s (\color{DarkBlue}{\bar{y}}_{s+1} - \color{DarkBlue}{\bar{y}}_s) \\ & \color{DarkRed} \lambda_{s+1} \le \color{DarkRed}\delta_s \le \color{DarkRed}\lambda_s \\ & \color{DarkRed} \delta_s \in \{0,1\} \\ & \color{DarkRed} \lambda_s \in [0,1] \end{align}

We can see that we can fix the last $$\delta_s=0$$. The model will behave correctly without this, but we may help the presolver a bit with this.

The GAMS model can look like:

 set   k 'breakpoints' /k1*k4/   s(k) 'segments' /k1*k3/ ; table data(k,*)        x   y   k1   1   6   k2   3   2   k3   6   8   k4  10   7 ; positive variable lambda(s) 'contribution factor of segment'; lambda.up(s) = 1; binary variable delta(s) 'previously contributing segments'; variable x,y; equations    xdef       'x'    ydef       'y'    link1(s)   'links lambda delta'    link2(s)   'links lambda delta' ; * fix last delta(s) to 0. * this is not strictly needed delta.fx(s)\$(ord(s)=card(s)) = 0; xdef.. x =e= data('k1','x')+ sum(s(k), lambda(s)*(data(k+1,'x')-data(k,'x'))); ydef.. y =e= data('k1','y')+ sum(s(k), lambda(s)*(data(k+1,'y')-data(k,'y'))); link1(s)..  lambda(s+1) =l= delta(s); link2(s)..  delta(s) =l= lambda(s); x.fx = 7; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

Note that in equation link1, we go one position too far when addressing $$\lambda_{s+1}$$ for the last $$s$$. GAMS will make that reference zero, and that is correct for this case. We see in the equation listing:

link1(k1)..  lambda(k2) - delta(k1) =L= 0 ; (LHS = 0)

link1(k2)..  lambda(k3) - delta(k2) =L= 0 ; (LHS = 0)

link1(k3)..  - delta(k3) =L= 0 ; (LHS = 0)

The solution is

----     36 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     36 VARIABLE delta.L  previously contributing segments

k1 1.000

----     36 VARIABLE lambda.L  contribution factor of segment

k1 1.000,    k2 0.667

Notes:
• A slightly different formulation is the following. Use the definition: $\delta_{s} = \begin{cases} 1 & \text{for s\le s'}\\ 0 & \text{for s \gt s'}\end{cases}$ Now we use the sandwich equation: $\delta_{s+1} \le \lambda_s \le \delta_s$ With this formulation we can fix: $$\delta_1=1$$.
• This approach goes back to [5].

#### References

1. Piecewise linear functions and formulations for interpolation (part 1). This post discusses formulations SOS2, BIGM_BIN and BIGM_SOS1. The SOS2 formulation is the most important. Also illustrated how a Pyomo implementation is not implementing BIGM_BIN and BIGM_SOS1 correctly.  http://yetanothermathprogrammingconsultant.blogspot.com/2019/02/piecewise-linear-functions-and.html
2. Piecewise linear functions and formulations for interpolation (part 3). This post describes the LOG and DLOG formulations, which have a logarithmic number of binary variables. We see also some issues with the Pyomo implementation. http://yetanothermathprogrammingconsultant.blogspot.com/2019/03/piecewise-linear-functions-and.html
3. Keely L. Croxton, Bernard Gendron, Thomas L. Magnanti, A Comparison of Mixed-Integer Programming Models for Non-Convex Piecewise Linear Cost Minimization Problems, Management Science, Volume 49, Issue 9, 2003, pages 1121-1273
4. G. Dantzig, On the significance of solving linear programming problems with some integer variables, Econometrica 28, 30-44, 1960
5. H. Markowitz, A. Manne, On the solution of discrete programming problems, Econometrica 25, 84-110, 1957