## Thursday, March 14, 2019

### Piecewise linear functions and formulations for interpolation (part 3)

In previous posts I discussed several Mixed Integer Programming formulations for piecewise linear functions [1,2]. In this final third part, I want to demonstrate some newer formulations, using a logarithmic number of binary variables [6]. Finally we observe some problems with existing implementations in Pyomo.

In part 1 [1] you can find discussion and formulations for:

1. SOS2: use SOS2 variables to model the piecewise linear functions. This is an easy modeling exercise.
2. BIGM_BIN: use binary variables and big-M constraints to enable and disable constraints. This is a more complex undertaking, especially if we want to use the smallest possible values for the big-M constants. Pyomo has bug in this version.
3. BIGM_SOS1: a slight variation of the BIGM_BIN model. Pyomo has the same problem with this model.
Part 2 [2] contains the following formulations:

1. DCC: Disaggregated Convex Combination formulation is a simulation of the SOS2 model by binary variables.
2. CC:  Convex Combination formulation is a similar SOS2 like approach using binary variables only.
3. MC: Multiple Choice model uses a semi-continuous approach.
4. INCR: Incremental formulation is using all previous segments.

In this part I discuss:

1. LOG: Formulation with a logarithmic number of binary variables based on the CC model.
2. DLOG: DCC model with a logarithmic number of binary variables.

First I want to side step things, and talk about using binary variables to model integer values.

#### A side note: integer variables as a series of binary variables.

In MIP formulations for the Sudoku puzzle we expand an integer variable indicating the value of a cell $$(i,j)$$: $v_{i,j} \in \{1,\dots,9\}$ into a series of binary variables: $x_{i,j,k} = \begin{cases} 1 & \text{if cell (i,j) has value k, for k=1,\dots,9}\\ 0 & \text{otherwise}\end{cases}$ This is done to make it easier to formulate "all-different" constraints.  The value can be calculated as: $v_{i,j} = \sum_k k \cdot x_{i,j,k}$ In this expansion we use a linear number of binary variables.

A more compact transformation is to use a power expansion. Here we translate an integer variable $$v \in \{0,\dots,U\}$$ into a logarithmic number of binary variables: $v = \sum_{i=1}^{1+\lfloor \log_2(U) \rfloor} 2^{i-1}x_i$ I.e., for a variable with an upperbound of $$U=100$$, we need 7 binary variables. This idea seems to go back to [3]. I have seen this power expansion method being used to solve MIP problems with the ZOOM [4] solver, which only allowed continuous and binary variables.

We will use a different way to encode segment numbers. For this we need to know about Gray codes.

#### Another side note: binary encodings

In the previous section we used a standard binary encoding. A different encoding is using the Gray code [7]. This looks like:

Integer ValueBinary CodeGray Code
000000000
100010001
200100011
300110010
401000110
501010111
601100101
701110100
810001100

The standard binary code has the advantage there is a simple (linear) formula to go from binary to decimal which we can use in a MIP model. The Gray code has no simple formula (an algorithm must be used). The Gray code has as special property that each next code is just one bit different from the previous one. E.g. if we go from 7 to 8, the standard binary code differs in all 4 binary digits $$0111\rightarrow 1000$$. The Gray code only differs in one binary digit: $$0100\rightarrow 1100$$.

With this we have enough to attack our piecewise linear interpolation problem.

#### Recap: the Convex Combination Model

The Convex Combination Model simulates SOS2 variables using binary variables:

CC - Convex Combination Formulation
\begin{align} & \color{DarkRed}x = \sum_k \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{k} \\ & \color{DarkRed}y = \sum_k \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{k} \\ & \sum_k \color{DarkRed}\lambda_k = 1 \\& \color{DarkRed} \lambda_k \le \begin{cases} \color{DarkRed}\delta_k & \text{for k=1}\\ \color{DarkRed}\delta_{k-1} + \color{DarkRed}\delta_k & \text{for k=2,\dots,K-1} \\ \color{DarkRed}\delta_{k-1} & \text{for k=K}\end{cases} \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\lambda_k \ge 0 \\ & \color{DarkRed}\delta_s \in \{0,1\}\\ &k \in \{1,\dots,\color{DarkBlue}K\},\> s \in \{1,\dots,\color{DarkBlue}K-1\} \end{align}

This model will make sure only two consecutive $$\lambda_k$$'s can be nonzero (this is called a SOS2 constraint).

In this model we have $$K-1$$ binary variables ($$K-1$$ is the number of segments).

#### Numerical Example: what is the idea?

We can encode each segment using our Gray code $$000, 001, 011, 010, 110, \dots$$ and associate a binary variable with each digit.

The variables have the following role:

1. The $$\delta$$ variables are binary variables used to indicate which segment is selected.
2. The $$\lambda$$ variables are continuous variables used to perform interpolation along the current segment.

In in the picture above, we can see that if the first digit is 0, we can exclude the last segment, and breakpoint $$\lambda_4$$. If we do this a bit systematically, we can formulate the implications: \begin{align} & \delta_1=0 \Rightarrow \lambda_4=0\\& \delta_1=1 \Rightarrow \lambda_1=\lambda_2=0\\ & \delta_2=0 \Rightarrow \lambda_3=\lambda_4=0\\& \delta_2=1 \Rightarrow \lambda_1=0\end{align} or \begin{align} \lambda_4 &\le \delta_1 \\ \lambda_1+\lambda_2 &\le 1-\delta_1\\ \lambda_3+\lambda_4&\le \delta_2\\ \lambda_1 &\le 1-\delta_2\end{align}

Let's enumerate the results of these constraints. In the table below, we show all possible combinations for $$\delta_b, b=1,2$$. For each combination we mark with a 0 which $$\lambda_k$$'s will be fixed to zero.

Gray code implications
$$\delta_1$$$$\delta_ 2$$$$\lambda_1$$$$\lambda_2$$$$\lambda_3$$$$\lambda_4$$
0000
0100
1100
100000

We see that each combination of $$\delta$$'s selects the correct segment. E.g. $$\delta=(0,1)$$ selects the segment defined by $$\lambda_2$$ and $$\lambda_3$$. In addition $$\delta=(1,0)$$ is not feasible (the $$\lambda$$'s have to add up to one).

#### Why a Gray code?

The previous table explains the usefulness of the Gray code. If we would have used the standard binary encoding, we would not see the same table. Using the encoding: segment1 = 00, segment2 = 01, segment3 = 10, we see:

Standard binary code implications
$$\delta_1$$$$\delta_ 2$$$$\lambda_1$$$$\lambda_2$$$$\lambda_3$$$$\lambda_4$$
00 0
010 0
1000
11000

This is not useful for our purposes. We want to allow two adjacent $$\lambda$$'s to be nonzero.

#### Developing a model

To make a model out of this, we define a boolean incidence matrix $I_{b,v,k} = \begin{cases} \text{true} & \text{if breakpoint k is part of a segment s that has binary digit b equal to v}\\ \text{false} & \text{otherwise}\end{cases}$ where $$v \in \{0,1\}$$. For our example we have:

----     36 SET incident

point1      point2      point3      point4

b1.0         YES         YES         YES
b1.1                                 YES         YES
b2.0         YES         YES
b2.1                     YES         YES         YES


With this the model can look like:

LOG - Logarithmic model
\begin{align} & \color{DarkRed}x = \sum_k \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{k} \\ & \color{DarkRed}y = \sum_k \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{k} \\ & \sum_k \color{DarkRed}\lambda_k = 1 \\& \sum_{k | \text{not } I(b,0,k)} \color{DarkRed} \lambda_k \le \color{DarkRed} \delta_b && \forall b \\& \sum_{k | \text{not } I(b,1,k)} \color{DarkRed}\lambda_k\le 1-\color{DarkRed}\delta_b && \forall b\\ & \color{DarkRed}\lambda_k \ge 0 \\ & \color{DarkRed}\delta_b \in \{0,1\}\\ & k \in \{1,\dots,\color{DarkBlue}K\}\\ & s \in \{1,\dots,\color{DarkBlue}K-1\}\\ & b \in \{1,\dots,\lceil log_2(\color{DarkBlue}K-1) \rceil \} \end{align}

#### The GAMS model

The GAMS model can look like:

 set   k 'breakpoints' /point1*point4/   s 'segments' /segment1*segment3/   sk(s,k) 'mapping'   b 'number of binary variables' /b1,b2/   b01 /0,1/ ; * calculate mapping between breakpoints and segments sk(s,k) = ord(s)=ord(k) or ord(s)=ord(k)-1; display sk; table data(k,*) 'location of breakpoints'            x   y   point1   1   6   point2   3   2   point3   6   8   point4  10   7 ; table gray(s,b) 'Gray encoding of segments'              b1  b2    segment1   0   0    segment2   0   1    segment3   1   1 ; set   incident(b,b01,k)   notincident(b,b01,k) ; incident(b,'1',k) = sum(sk(s,k),gray(s,b)); incident(b,'0',k) = sum(sk(s,k),1-gray(s,b)); notincident(b,b01,k) = not incident(b,b01,k); display incident,notincident; positive variable lambda(k) 'interpolation'; binary variable delta(b) 'segment encoding'; variable x,y; equations    xdef       'x'    ydef       'y'    link0(b)   'link delta-lambda'    link1(b)   'link delta-lambda'    sumlambda  'interpolation' ; xdef.. x =e= sum(k, lambda(k)*data(k,'x')); ydef.. y =e= sum(k, lambda(k)*data(k,'y')); link0(b)..  sum(notincident(b,'0',k),lambda(k)) =l= delta(b); link1(b)..  sum(notincident(b,'1',k),lambda(k)) =l= 1-delta(b); sumlambda.. sum(k, lambda(k)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

The results look like:

----     61 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     61 VARIABLE delta.L  segment encoding

b2 1.000

----     61 VARIABLE lambda.L  interpolation

point2 0.333,    point3 0.667


This indicates we selected segment 2 with breakpoints 2 and 3. We also have $$\delta = (0,1)$$. The generated equations look like:

---- xdef  =E=  x

xdef..  - lambda(point1) - 3*lambda(point2) - 6*lambda(point3) - 10*lambda(point4) + x =E= 0 ; (LHS = 5, INFES = 5 ****)

---- ydef  =E=  y

ydef..  - 6*lambda(point1) - 2*lambda(point2) - 8*lambda(point3) - 7*lambda(point4) + y =E= 0 ; (LHS = 0)

link0(b1)..  lambda(point4) - delta(b1) =L= 0 ; (LHS = 0)

link0(b2)..  lambda(point3) + lambda(point4) - delta(b2) =L= 0 ; (LHS = 0)

link1(b1)..  lambda(point1) + lambda(point2) + delta(b1) =L= 1 ; (LHS = 0)

link1(b2)..  lambda(point1) + delta(b2) =L= 1 ; (LHS = 0)

---- sumlambda  =E=  interpolation

sumlambda..  lambda(point1) + lambda(point2) + lambda(point3) + lambda(point4) =E= 1 ; (LHS = 0, INFES = 1 ****)


Of course, a logarithmic model does not make much sense for a model with just 3 segments. We just save 1 binary variable: instead of $$K-1=3$$ binary variables, we have $$\lceil \log_2(K-1)\rceil = 2$$ binary variables.

#### Pyomo

The Pyomo model for this problem can look like:

#
# expected solution X=5, Y=6
#

xdata = [1., 3., 6., 10.]
ydata  = [6.,2.,8.,7.]

from pyomo.core import *

model = ConcreteModel()

model.X = Var(bounds=(1,10))
model.Y = Var(bounds=(0,100))

model.con = Piecewise(model.Y,model.X,
pw_pts=xdata,
pw_constr_type='EQ',
f_rule=ydata,
pw_repn='LOG')

# see what we get for Y when X=5
def con2_rule(model):
return model.X==5

model.con2 = Constraint(rule=con2_rule)

model.obj = Objective(expr=model.Y, sense=maximize)


The output is:

 [    0.00] Setting up Pyomo environment [    0.00] Applying Pyomo preprocessing actions ERROR: Constructing component 'con' from data=None failed: ValueError: 'con'     does not have a list of domain points with length (2^n)+1 [    0.02] Pyomo Finished ERROR: Unexpected exception while loading model:         'con' does not have a list of domain points with length (2^n)+1

Bummer.  You probably have to pad the breakpoints until we have a number of segments that is equal to a power of two. I don't understand this requirement. As we saw before, there really is no good reason for this restriction. As a workaround, we can do this by just repeating the last points as many times as needed:

xdata = [1., 3., 6., 10., 10.]
ydata  = [6.,2.,8.,7.,7.]


#### The DLOG formulation.

The DCC model discussed in [2] looks like:

DCC - Disaggregated Convex Combination Formulation
\begin{align} & \color{DarkRed}x = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{s,k} \\ & \color{DarkRed}y = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{s,k}\\& \color{DarkRed}\delta_s = \sum_{k|\color{DarkBlue}SK(s,k)} \color{DarkRed} \lambda_{s,k} \\& \sum_s \color{DarkRed}\delta_s=1 \\ & \color{DarkRed}\lambda_{s,k} \ge 0 \\ & \color{DarkRed}\delta_s \in \{0,1\} \end{align}

This model can be adapted to use the logarithmic encoding trick discussed before. We use a similar incidence table as before, but now it refers to $$\lambda_{s,k}$$. I.e.:  $I_{b,v,s,k} = \begin{cases} \text{true} & \text{if breakpoint k is part of a segment s that has binary digit b equal to v}\\ \text{false} & \text{otherwise}\end{cases}$ where $$v \in \{0,1\}$$.

This yields a model like:

DLOG - DCC Formulation with logarithmic number of binary variables.
\begin{align} & \color{DarkRed}x = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{x}}_k \color{DarkRed}\lambda_{s,k} \\ & \color{DarkRed}y = \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkBlue}{\bar{y}}_k \color{DarkRed}\lambda_{s,k} \\ & \sum_{s,k|\color{DarkBlue}SK(s,k)} \color{DarkRed}\lambda_{s,k} = 1 \\& \sum_{s,k | \text{not } I(b,0,s,k)} \color{DarkRed} \lambda_{s,k} \le \color{DarkRed}\delta_b && \forall b \\& \sum_{s,k | \text{not } I(b,1,s,k)} \color{DarkRed} \lambda_{s,k} \le 1- \color{DarkRed}\delta_b && \forall b \\ & \color{DarkRed}\lambda_{s,k} \ge 0 \\ & \color{DarkRed}\delta_b \in \{0,1\}\\ & k \in \{1,\dots,\color{DarkBlue}K\}\\ & s \in \{1,\dots,\color{DarkBlue}K-1\}\\ & b \in \{1,\dots,\lceil log_2(\color{DarkBlue}K-1) \rceil \} \end{align}

The GAMS model can look like:

 set   k 'breakpoints' /point1*point4/   s 'segments' /segment1*segment3/   sk(s,k) 'mapping'   b 'number of binary variables' /b1,b2/   b01 /0,1/ ; * calculate mapping between breakpoints and segments sk(s,k) = ord(s)=ord(k) or ord(s)=ord(k)-1; display sk; table data(k,*) 'location of breakpoints'            x   y   point1   1   6   point2   3   2   point3   6   8   point4  10   7 ; table gray(s,b) 'Gray encoding of segments'              b1  b2    segment1   0   0    segment2   0   1    segment3   1   1 ; set   incident(b,b01,s,k)   notincident(b,b01,s,k) ; incident(b,'1',sk(s,k)) = gray(s,b); incident(b,'0',sk(s,k)) = 1-gray(s,b); notincident(b,b01,sk) = not incident(b,b01,sk); display incident,notincident; positive variable lambda(s,k) 'interpolation'; binary variable delta(b) 'segments are Gray encoded'; variable x,y; equations    xdef       'x'    ydef       'y'    link0(b)   'link delta-lambda'    link1(b)   'link delta-lambda'    sumlambda  'select segment' ; xdef.. x =e= sum(sk(s,k), lambda(s,k)*data(k,'x')); ydef.. y =e= sum(sk(s,k), lambda(s,k)*data(k,'y')); link0(b)..  sum(notincident(b,'0',s,k),lambda(s,k)) =l= delta(b); link1(b)..  sum(notincident(b,'1',s,k),lambda(s,k)) =l= 1-delta(b); sumlambda.. sum(sk(s,k),lambda(s,k)) =e= 1; x.fx = 5; model m /all/; option optcr=0; solve m maximizing y using mip; display x.l,y.l,delta.l,lambda.l;

The solution looks like:

----     61 VARIABLE x.L                   =        5.000
VARIABLE y.L                   =        6.000

----     61 VARIABLE delta.L  segments are Gray encoded

b2 1.000

----     61 VARIABLE lambda.L  interpolation

point2      point3

segment2       0.333       0.667


This means $$\delta = (0,1)$$ which gives us segment 2.

We can also show the individual equations:

---- xdef  =E=  x

xdef..  - lambda(segment1,point1) - 3*lambda(segment1,point2) - 3*lambda(segment2,point2) - 6*lambda(segment2,point3)

- 6*lambda(segment3,point3) - 10*lambda(segment3,point4) + x =E= 0 ; (LHS = 5, INFES = 5 ****)

---- ydef  =E=  y

ydef..  - 6*lambda(segment1,point1) - 2*lambda(segment1,point2) - 2*lambda(segment2,point2) - 8*lambda(segment2,point3)

- 8*lambda(segment3,point3) - 7*lambda(segment3,point4) + y =E= 0 ; (LHS = 0)

link0(b1)..  lambda(segment3,point3) + lambda(segment3,point4) - delta(b1) =L= 0 ; (LHS = 0)

link0(b2)..  lambda(segment2,point2) + lambda(segment2,point3) + lambda(segment3,point3) + lambda(segment3,point4)

- delta(b2) =L= 0 ; (LHS = 0)

link1(b1)..  lambda(segment1,point1) + lambda(segment1,point2) + lambda(segment2,point2) + lambda(segment2,point3)

+ delta(b1) =L= 1 ; (LHS = 0)

link1(b2)..  lambda(segment1,point1) + lambda(segment1,point2) + delta(b2) =L= 1 ; (LHS = 0)

---- sumlambda  =E=  select segment

sumlambda..  lambda(segment1,point1) + lambda(segment1,point2) + lambda(segment2,point2) + lambda(segment2,point3)

+ lambda(segment3,point3) + lambda(segment3,point4) =E= 1 ; (LHS = 0, INFES = 1 ****)


#### Pyomo

Again when we try Pyomo, we get into trouble:

#
# expected solution X=5, Y=6
#

xdata = [1., 3., 6., 10.]
ydata  = [6.,2.,8.,7.]

from pyomo.core import *

model = ConcreteModel()

model.X = Var(bounds=(1,10))
model.Y = Var(bounds=(0,100))

model.con = Piecewise(model.Y,model.X,
pw_pts=xdata,
pw_constr_type='EQ',
f_rule=ydata,
pw_repn='DLOG')

# see what we get for Y when X=5
def con2_rule(model):
return model.X==5

model.con2 = Constraint(rule=con2_rule)

model.obj = Objective(expr=model.Y, sense=maximize)


yields the following messages:

 [    0.00] Setting up Pyomo environment [    0.00] Applying Pyomo preprocessing actions ERROR: Constructing component 'con' from data=None failed: ValueError: 'con'     does not have a list of domain points with length (2^n)+1 [    0.02] Pyomo Finished ERROR: Unexpected exception while loading model:         'con' does not have a list of domain points with length (2^n)+1

I am not sure what the reason can be for this restriction.

#### Conclusion

This is hopefully a somewhat more accessible description of methods for piecewise linear interpolation with just a logarithmic number of binary variables. This method can be applied to many formulations. Here we just showed two: the Convex Combination (CC) and the Disaggregated Convex Combination (DCC) formulation. Finally. we demonstrated some problems in the Pyomo implementation.

#### References

1. Piecewise linear functions and formulations for interpolation (part 1), http://yetanothermathprogrammingconsultant.blogspot.com/2019/02/piecewise-linear-functions-and.html
2. Piecewise linear functions and formulations for interpolation (part 2), http://yetanothermathprogrammingconsultant.blogspot.com/2019/02/piecewise-linear-functions-and_22.html
3. L. J. Watters, Reduction of integer polynomial programming problems to zero-one linear programming problems, Operations Research 15, 1171–1174,  1967
4. Jaya Singhal, Roy E. Marsten, Thomas L. Morin, Fixed Order Branch-and-Bound Methods for Mixed-Integer Programming: The zoom System, ORSA Journal on Computing, 1989, vol. 1, issue 1, 44-51.
5. Juan Pablo Vielma, Shabbir Ahmed and George Nemhauser, Mixed-Integer Models for Nonseparable Piecewise Linear Optimization: Unifying Framework and Extensions, Operations Research 58(2):303-315, 2010
6. Juan Pablo Vielma and George L. Nemhauser, Modelling Disjunctive Constraints with a Logarithmic Number of Binary Variables and Constraints, Mathematical Programming 128 (2011), pp. 49-72.
7. Gray Code, https://en.wikipedia.org/wiki/Gray_code