Yet Another Math Programming Consultant: December 2017

Wednesday, December 27, 2017

Filling rectangles with polyominos

I know about dominos, but polyominos [1] are new for me. In [2] a problem is posted: fill a rectangle (or square) with a number of different polyominos. No overlap or rotation allowed. I assume there is no limitation in the supply of polyominos.

For example, consider a $11 \times 11$ grid, and choose from the following building blocks:

Five different polyominos

We can try to fill our grid, but we will not always succeed:

Fail to cover complete grid

I solved this as a MIP as follows. My decision variables are:

\[x_{i,j,k} = \begin{cases} 1 & \text{if we place polyomino $k$ at location $(i,j)$}\\ 0 & \text{otherwise} \end{cases}\]

I used as rule that the left-upper corner of each polynomino is its anchor. I.e. in the picture above we have $x_{1,1,4} = 1$, $x_{2,1,2}=1$, $x_{1,3,5}=1$ etc.

To formulate a non-overlap constraint I populated a set $cover_{i,j,k,i',j'}$, with elements that exist if cell $(i',j')$ is covered when we place polyomino $k$ in cell $(i,j)$. To require each cell $(i',j')$ is covered we can say:

\[ \forall i',j': \sum_{i,j,k|cover_{i,j,k,i',j'}} x_{i,j,k} = 1\]

This constraint is infeasible if we cannot cover each cell $(i',j')$ exactly once. In order to make sure we can show a meaningful solution when we cannot cover each cell, we formulate the following optimization model:

\[\begin{align} \max\>&\sum_{i,j} y_{i,j}\\&y_{i',j'} = \sum_{i,j,k|cover_{i,j,k,i',j'}} x_{i,j,k}\\&x_{i,j,k}\in \{0,1\}\\&y_{i,j} \in \{0,1\}\end{align}\]

Here $y_{i,j}=1$ indicates cell $(i,j)$ is covered exactly once, and $y_{i,j}=0$ says the cell is not covered.

With a little bit of effort we can produce the following:

61 x 61 board with 9 different polyominos

A SAT solver may be better suited for this [2], but MIP works for all but very large instances.

References

Polyomino, https://en.wikipedia.org/wiki/Polyomino
2D bin packing on a grid, https://stackoverflow.com/questions/47918792/2d-bin-packing-on-a-grid

Tuesday, December 19, 2017

SOS1 variables and Wikipedia

In Wikipedia [1] the following definition of a Special Ordered Set of Type 1 (SOS1) [2] is given:

Special Ordered Sets of type 1 (SOS1 or S1): are a set of variables, at most one of which can take a strictly positive value, all others being at 0.

This is not completely correct. A SOS1 member can be a negative variable or a free variable. Such a variable may take a negative value.

An example:

Model with SOS1 variables with a negative lower bound

I.e. we want to minimize $z=\sum_i x_i$ with $x_i \in [-i,\infty)$ while the $x_i$'s form a SOS1 set. The results are:

---- VAR x  

          LOWER          LEVEL          UPPER         MARGINAL

i1        -1.0000          .            +INF            1.0000      
i2        -2.0000          .            +INF            1.0000      
i3        -3.0000        -3.0000        +INF            1.0000      

                           LOWER          LEVEL          UPPER         MARGINAL

---- VAR z                 -INF           -3.0000        +INF             .

Indeed, we see a single negative value; the others remain zero.

Definition

The correct definition is found in most manuals, e.g. the docs on OSL (a discontinued IBM product) MPS format mentions [3]:

A special ordered set (SOS) defines restrictions on the integer variables in the set. During branch-and-bound processing, branching on these sets consists of setting all variables before a certain point in the set to zero on one branch and setting all of them after a certain point to one on the alternative branch.
Three different types of special ordered sets are recognized:

Type 1, where at most one variable in the set may take a nonzero value.

Type 2, where at most two variables in the set may take nonzero values. If two variables are nonzero, they must be adjacent in the set. This allows the modeling of nonconvex separable functions, such as economies of scale.

Type 3, where at most one variable in the set may take a nonzero value, and there was at least one equality row with an RHS value of 1.0 in which all the variables in the set had a coefficient of 1.0. In this case, it is possible to say that all the variables are 0-1 variables. This allows the modeling of decisions.

The SOS3 type is an extension that is not much used.

But...

Looks like there is no complete uniformity here. When doing some further tests, we found out a few solvers (GAMS/BDMLP, GAMS/LINDO) return something different:

---- VAR x  

          LOWER          LEVEL          UPPER         MARGINAL

i1        -1.0000        -1.0000        +INF            1.0000      
i2        -2.0000        -2.0000        +INF            1.0000      
i3        -3.0000        -3.0000        +INF            1.0000      

                           LOWER          LEVEL          UPPER         MARGINAL

---- VAR z                 -INF           -6.0000        +INF             .

Using the democratic rule: "one solver, one vote" they are on the losing side of this debate. Of course it would be much better if all solvers do the same thing.

Update: LINDO has confirmed this as a bug and fixed it. The behavior of BDMLP was acknowledged as "as designed".

Notes on using SOS1 sets

These days we probably should replace SOS sets by formulations with binary variables. Often binary variable formulations perform better. For this we often need good bounds, in order to make big-M constants having tight values. Some solvers support indicator constraints and general constraints (like max, min, etc.). These constructs may introduce SOS1 variables behind the scenes.

SOS2 variables

The Wikipedia entry for SOS1 [1] has been fixed by now (thank you!). However, a similar issue is present in the section on SOS2 variables. The Wikipedia text says:

Special Ordered Sets of type 2 (SOS2 or S2): an ordered set of non-negative variables, of which at most two can be non-zero, and if two are non-zero these must be consecutive in their ordering.

Using the same model as before but replacing sos1 variable by sos2 variable we should see as solution:

---- VAR x  

          LOWER          LEVEL          UPPER         MARGINAL

i1        -1.0000          .            +INF            1.0000      
i2        -2.0000        -2.0000        +INF            1.0000      
i3        -3.0000        -3.0000        +INF            1.0000      

                           LOWER          LEVEL          UPPER         MARGINAL

---- VAR z                 -INF           -5.0000        +INF             .

As we can see SOS2 variables can be negative. Of course in virtually all practical cases SOS2 variables will be non-negative, but it is not a requirement.

I am not sure if the emphasis in the Wikipedia text on the ordering is so useful. This may be read as $x_i \le x_{i+1}$, which is not what SOS2 is about. I think just saying the two non-zero values must be consecutive or adjacent should be enough. Writing is not so easy!

References

Special Ordered Set, https://en.wikipedia.org/wiki/Special_ordered_set
E. M. L. Beale and J. A. Tomlin, Special Facilities in a General Mathematical Programming System for Nonconvex Problems Using Ordered Sets of Variables, Operational Research 69, pp 447–454, 1970
IBM Optimization Subroutine Library, Guide and Reference, Release 2, 1991
Wilhelm Hummeltenberg, Implementations of special ordered sets in MP software, European Journal of Operations Research, 17 (1), 1984, pp. 1-15.

Sunday, December 17, 2017

A production scheduling problem

Problem

A post in [1] describes an interesting problem:

I have read about cutting stock problem, but this is a bit different.
We have orders in different variants, and a maximum sized machine to produce it.
Variants would be X,S,XL,L and ordered quantities are 100,40,40,80.

Say, the machine width is 6. This means, we can put 6 different variants
together and produce it.

We can put 2 X,1 S,1 XL,2 L, this means if we produce it 50 times,
output is :
X = 100 (0 waste)
S = 50 (10 waste)
XL = 50 (10 waste)
L = 100 (20 waste)
Total of 40 waste in 300 produced.

Another aproach to reduce waste would be creating 2 different variation. We
can put 4 X, 2 S and produce it 25 times, with 10 waste and make another
setup and put 2 XL,4 L and produce it 20 times with no waste. With total
10 waste we handled this production in 2 setups.

Since setup has a price, we would prefer first setup, or depending on
quantities, we may choose the second one.

I have read about cutting stock and it looks similar to this one, but
ability to divide quantities between different setups, this has more
potential to optimize and therefore more complex.

High level model

We can formulate this as a mixed-integer quadratic model. The data looks like:

The optimization model is as follows:

Notes:

The variables $run_r$ indicate if we execute a production run or not. Within a production run we use the same pattern.
We have $run_r=0 \Rightarrow pattern_{v,r}=0, runlen_r=0$. We could drop one of these constraints but not both.
Equation $edemand$ has a nasty quadratic term, It causes the model to be non-linear and non-convex. Of course we can solve this with a global solver.
The $order$ constraint makes sure all active runs (ie. $run_r=1$) are before the inactive ones ($run_r=0$). This makes the solution look nicer and will reduce some symmetry.
The funny $\forall r-1$ means we generate the $order$ constraint for all $r$ except the first one.
The model type is actually MIQCP (Mixed Integer Quadratically Constrained Program). There are commercial solvers for convex problems of this type, but for this non-convex problem I used a global MINLP solver (Baron, Couenne, Antigone).

Results

The input data can be summarized as:

----     62 PARAMETER costwaste  cost of waste

X  1.000,    S  2.000,    XL 3.000,    L  4.000


----     62 PARAMETER costsetup            =      100.000  fixed cost of a run

----     62 PARAMETER demand  

X  100,    S   40,    XL  40,    L   80

The solution looks like:

----     62 VARIABLE runlen.L  length of a run

run1 20,    run2 40


----     62 VARIABLE pattern.L  production pattern of a run

          run1        run2

X            1           2
S                        1
XL           2
L            2           1


----     62 VARIABLE waste.L  waste per variant

                      ( ALL          0. )


----     62 VARIABLE wastecost.L           =        0.000  cost of waste
            VARIABLE setupcost.L           =      200.000  cost related to setups
            VARIABLE cost.L                =      200.000  total cost

We see we use two production runs. We are under-utilizing the machine (a pattern of 5 and 4 items). It is noted that this solution is not unique. A different solver will likely give a different solution.

Shorter run lengths

I believe we are missing some objective in this model. Suppose we have a demand of X=60 only (zero demand for the other variants). We can organize in a single run in at least the following ways:

Pattern X=1. This gives a run length of 60. Total cost = 100 (cost of one setup).
Pattern X=6. This gives a run length of 10. Total cost = 100 (cost of one setup).

I would guess the second configuration is better. Probably we should add some cost for the total sum of the run lengths.

If we add a unit cost of 1 for operating the machine, the total operating cost can be expressed as $\sum_r runlen_r$. If we add this to the total cost we get the following solution:

----     68 VARIABLE runlen.L  length of a run

run1  4,    run2 40


----     68 VARIABLE pattern.L  production pattern of a run

          run1        run2

X            5           2
S                        1
XL                       1
L                        2


----     68 VARIABLE waste.L  waste per variant

                      ( ALL          0. )


----     68 VARIABLE wastecost.L           =        0.000  cost of waste
            VARIABLE setupcost.L           =      200.000  cost related to setups
            VARIABLE opcost.L              =       44.000  operating cost
            VARIABLE cost.L                =      244.000  total cost

A visualization of these two production schedules can look like:

Production Schedules

Linearization

If we want to use a high-performance MIP solver, we need to linearize equation $edemand$. I don't know how to linearize an integer variable times an integer variable. However, if we can replace an integer variable by a series of binary variables, then we are in more favorable territory.

The machine can handle up 6 variants at the time. This means $pattern_{v,r}\le 6$. This makes $pattern$ a good candidate for a binary expansion:

\[\begin{align}&pattern_{v,r} = \sum_k k \cdot \delta_{v,r,k}\\&\sum_k \delta_{v,r,k} \le 1\\& \delta_{v,r,k} \in \{0,1\}\end{align}\]

where $k$ runs from 1 to $cap=6$. Now we can write the $edemand$ equation as:

\[\sum_{r,k} k \cdot ( runlen_r \cdot \delta_{v,r,k}) = demand_v + waste_v\]

The multiplication of a binary variable and a continuous variable can be written as linear inequalities [2]. This results in:

\[\begin{align} & \sum_{r,k} k \cdot w_{v,r,k} = demand_v + waste_v\\ & w_{v,r,k} \le \delta_{v,r,k} \cdot maxrunlen\\ & w_{v,r,k} \le runlen_r\\ & w_{v,r,k} \ge runlen_r - maxrunlen \cdot (1-\delta_{v,r,k}) \\ &w_{v,r,k}\ge 0 \end{align} \]

We can also go the other route: expand $runlen_r$. This would lead to:

\[\begin{align}
&\sum_{r,\ell} \ell \cdot v_{v,r,\ell} = demand_v + waste_v\\
&\sum_{\ell} \ell \cdot \rho_{r,\ell} = runlen_r\\
&\sum_{\ell} \rho_{r,\ell} \le 1\\
& v_{v,r,\ell} \le cap \cdot \rho_{r,\ell}\\
& v_{v,r,\ell} \le pattern_{v,r}\\
& v_{v,r,\ell} \ge pattern_{v,r} - cap \cdot (1-\rho_{r,\ell})\\
& \rho_{r,\ell} \in \{0,1\}\\
& v_{v,r,\ell} \ge 0
\end{align}\]

With these lengthy reformulations we end up with a linear mixed-integer program. This means we can use standard MIP solvers instead of a global nonlinear solver. In addition, MIP formulations often perform better than their non-linear counterparts. The first MIP formulation seems to work much better than the second.

----    136 PARAMETER report  

                    obj        time       nodes

minlp.baron     244.000       2.120      58.000
mip1.cplex      244.000       1.218     673.000
mip2.cplex      244.000      12.063    3382.000

The linearization is somewhat convoluted, so having available a smaller nonlinear model can help in constructing and debugging these models. As is the case in database design (logical vs. physical design) and programming (premature optimization), starting with a simpler model that focuses on concepts is often helpful in developing optimization models.

References

Similar to cutting stock problem but not quite, https://math.stackexchange.com/questions/2557776/similar-to-cutting-stock-problem-but-not-quite
Multiplication of a continuous and a binary variable, http://yetanothermathprogrammingconsultant.blogspot.com/2008/05/multiplication-of-continuous-and-binary.html
Gislaine Mara Melega, Silvio Alexandre de Araujo and Raf Jans, Comparison of MIP Models for the Integrated Lot-Sizing and One-Dimensional Cutting Stock Problem, Pesquisa Operacional (2016) 36(1): 167-196.

Thursday, December 14, 2017

Permuted Matrix Balancing

In [1] the following problem is posed:

Problem

I am trying to generate a matrix of numbers with 7 rows and 4 columns. Each row must sum to 100 and each column must have an even spread (if permitted) between a min and max range (specified below).

Goal:

       C1      C2    C3    C4   sum   range 
 1     low                      100    ^
 2     ..                              |  
 3     ..                              |
 4     ..                              | 
 5     ..                              |
 6     ..                              |
 7     high                            _

c1_high = 98
c1_low = 75
c2_high = 15
c2_low = 6
c3_high = 8
c3_low = 2
c4_low = 0.05
c4_high =0.5

In addition to this, i need the spread of each row to be as linear as possible, though a line fitted to the data with a second order polynomial would suffice (with an r^2 value of >0.98).

Note: this description seems to hint the values in the columns must be ordered from $L_j$ to $U_j$. That is actually not the case: we don't assume any ordering.

Solution Outline

This problem somewhat resembles a matrix balancing problem. Let's see how we can model this.

One approach to attack this problem is:

Step 1: Create data with perfect spread

It is easy to generate data with perfect spread wile ignoring that the row sums are 100. We just start at $L_j$ and increase a by fixed step size:

----     40 PARAMETER L  low

c1 75.000,    c2  6.000,    c3  2.000,    c4  0.050


----     40 PARAMETER U  high

c1 98.000,    c2 15.000,    c3  8.000,    c4  0.500


----     40 PARAMETER init  initial matrix

             c1          c2          c3          c4      rowsum

r1       75.000       6.000       2.000       0.050      83.050
r2       78.833       7.500       3.000       0.125      89.458
r3       82.667       9.000       4.000       0.200      95.867
r4       86.500      10.500       5.000       0.275     102.275
r5       90.333      12.000       6.000       0.350     108.683
r6       94.167      13.500       7.000       0.425     115.092
r7       98.000      15.000       8.000       0.500     121.500

This step is just data manipulation.

Step 2: Permute values in each column

The second step is to permute the values within a column to achieve a solution that has a close match to the row sums. This gives:

----     75 VARIABLE y.L  after permutation

            c1          c2          c3          c4      rowsum

r1      86.500      10.500       5.000       0.275     102.275
r2      94.167       7.500       3.000       0.125     104.792
r3      98.000       6.000       2.000       0.050     106.050
r4      90.333       9.000       4.000       0.200     103.533
r5      82.667      12.000       6.000       0.350     101.017
r6      75.000      15.000       8.000       0.500      98.500
r7      78.833      13.500       7.000       0.425      99.758

We are closer to the row sums as expected.

One way to model a permutation of a column vector $a$ inside an optimization model is to find a square permutation matrix $P$ [2] with

\[\begin{align}&y_i = \sum_k P_{i,k}a_k\\ &\sum_k P_{i,k}=1&\forall i \\&\sum_i P_{i,k} = 1&\forall k\\& P_{i,k}\in\{0,1\}\end{align}\]

A permutation matrix has exactly one one in each row and in each column. The rest of the elements are zero. One could say a permutation matrix is a (row or column) permuted identity matrix. The constraints can also be interpreted as assignment constraints (related to the assignment problem).

Step 3: Add minimal relative deviations

The final step is to find small deviations from $y$ such that the row sums are exactly 100.

----     75 VARIABLE x.L  final values

            c1          c2          c3          c4      rowsum

r1      84.265      10.467       4.993       0.275     100.000
r2      89.460       7.431       2.984       0.125     100.000
r3      92.057       5.912       1.980       0.050     100.000
r4      86.863       8.949       3.988       0.200     100.000
r5      81.668      11.985       5.997       0.350     100.000
r6      76.473      15.022       8.005       0.500     100.000
r7      79.071      13.503       7.001       0.425     100.000


----     75 VARIABLE d.L  deviations

            c1          c2          c3          c4

r1      -2.235      -0.033      -0.007 -2.25855E-5
r2      -4.707      -0.069      -0.016 -4.75702E-5
r3      -5.943      -0.088      -0.020 -6.00626E-5
r4      -3.471      -0.051      -0.012 -3.50779E-5
r5      -0.999      -0.015      -0.003 -1.00932E-5
r6       1.473       0.022       0.005 1.489155E-5
r7       0.237       0.003 7.931220E-4 2.399194E-6

For this I used a relative measure: minimize the sum of squared relative deviations:

\[\min \sum_{i,j} \left( \frac{d_{i,j}}{\mu_j} \right)^2 \]

where $\mu_j = (L_j +U_j)/2$. This is to ensure that the deviations are distributed according to the average magnitude of a column. This is to make sure column 1 gets larger deviations than column 4.

A complete model

I believe we need to do steps 2 and 3 simultaneously to find a global optimal solution. Here is an attempt:

This is a convex MIQP model, so we can solve this with high-performance solvers like Cplex and Gurobi.

Notes

We could split the model in two phases (step 2 and step 3 in the above description). I am not sure how this would impact the optimal solution. Also it would not really simplify things.
This model can be linearized by using a different objective: minimize the sum of the absolute values of the (relative) deviations:
\[\min\>\sum_{i,j}\left| \frac{d_{i,j}}{\mu_j} \right|\]
You would end up with a linear MIP model. Possible formulations are variable splitting
\[\begin{align}\min \> & \sum_{i,j} v_{i,j}^{\oplus} + v_{i,j}^{\ominus}\\&v_{i,j}^{\oplus} - v_{i,j}^{\ominus} = \frac{d_{i,j}}{\mu_j}\\&v_{i,j}^{\oplus}, v_{i,j}^{\ominus}\ge 0 \end{align}\]
or bounding:
\[\begin{align}\min\>&\sum_{i,j} v_{i,j}\\&-v_{i,j} \le \frac{d_{i,j}}{\mu_j}\le v_{i,j}\end{align}\]
Instead of dividing by the average column value $\mu_j$ we can divide by the "real" value $y_{i,j}$. Unfortunately, that would make this a general MINLP model. Note that we should "protect" the division, e.g. by the bound $y_{i,j}\ge L_j$.

References

Algorithm for generation of number matrix with specified boundaries, https://stackoverflow.com/questions/47754210/algorithm-for-generation-of-number-matrix-with-specified-boundaries
Permutation Matrix, https://en.wikipedia.org/wiki/Permutation_matrix

Tuesday, December 12, 2017

Quantum Development Kit: Does it come with an LP solver?

Microsoft has released a preview of their Quantum Development Kit [1]. May be we can solve some interesting optimization problems with this [2,3]. One of the authors of [3], Svore, is now heading Microsofts Quantum Architectures and Computation group.

The video above is PR. Here is a video with some more meaty content:

References

Allison Linn, The future is quantum: Microsoft releases free preview of Quantum Development Kit, https://blogs.microsoft.com/ai/2017/12/11/future-quantum-microsoft-releases-free-preview-quantum-development-kit
Whitney Clavin, Developing quantum algorithms for optimization problems, https://phys.org/news/2017-07-quantum-algorithms-optimization-problems.html,
Fernando G.S.L. Brandao, Krysta Svore, Quantum Speed-ups for Semidefinite Programming, https://arxiv.org/abs/1609.05537

Sunday, December 10, 2017

SciPy 1.0 linear programming

The simplex solver in SciPy has been somewhat of a problem. It is a text-book LP solver (tablaux based), so only suited for very small problems. Even on those small problems it is just not very reliable.

Here is a small example [1]:

Min -x2 -x3
x0 + 0.33*x2 + 0.67*x3 = 0.5
x1 + 0.67*x2 + 0.33*x3 = 0.5
x0 + x1 + x2 + x3 = 1.0

We can encode this problem as follows:

from scipy.optimize import linprog
a_eq = [[1.0, 0.0, 0.33, 0.67],
        [0.0, 1.0, 0.67, 0.33], 
        [1, 1, 1, 1]]

b_eq = [0.5, 0.5, 1.0]

c = [0, 0, -1.0, -1.0]

res1 = linprog(c=c, A_eq=a_eq, b_eq=b_eq, method='simplex')
print(res1)

The simplex solver has troubles with this. It returns a sub-optimal solution with objective function value 0:

     fun: -0.0
 message: 'Optimization terminated successfully.'
     nit: 4
   slack: array([], dtype=float64)
  status: 0
 success: True
       x: array([ 0.5,  0.5,  0. ,  0. ])
     con: array([ -9.83213511e-13,  -9.83102488e-13,  -1.96620498e-12])

SciPy 1,0 has a new interior point solver [3] which seems a bit more robust:

res2 = linprog(c=c, A_eq=a_eq, b_eq=b_eq, method='interior-point')
print(res2)

     fun: -1.000000000000884
 message: 'Optimization terminated successfully.'
     nit: 4
   slack: array([], dtype=float64)
  status: 0
 success: True
       x: array([  5.41163802e-13,   5.41163802e-13,   5.00000000e-01,
         5.00000000e-01])

We see an optimal objective of -1. It also gives a warning:

D:\Anaconda3\lib\site-packages\scipy\optimize\_linprog_ip.py:623: OptimizeWarning: A_eq does not appear to be of full row rank. To improve performance, check the problem formulation for redundant equality constraints.
  warn(redundancy_warning, OptimizeWarning)

There is no good reason a Simplex solver should give incorrect results in case the A matrix is not full-rank. Usually solvers add a slack (often not physically) to each constraint, so A will never be rank deficient. The basis matrix can always be chosen such that it is both optimal and non-singular. Here is how an optimal basis (with objective -1) looks like for this problem provided by a different simplex solver:

                    LOWER          LEVEL          UPPER         MARGINAL

---- EQU e1          0.5000         0.5000         0.5000         EPS       'Non-basic'     
---- EQU e2          0.5000         0.5000         0.5000          .        'Basic'  
---- EQU e3          1.0000         1.0000         1.0000        -1.0000    'Non-basic'  

                    LOWER          LEVEL          UPPER         MARGINAL

---- VAR x0           .              .            +INF            1.0000    'Non-basic'     
---- VAR x1           .              .            +INF            1.0000    'Non-basic'  
---- VAR x2           .             0.5000        +INF             .        'Basic'  
---- VAR x3           .             0.5000        +INF             .        'Basic'

We see equation 2 (or its slack) is made basic although the value of the slack is zero. This solution is actually both primal and dual degenerate. [4] A different solver gives:

                   LOWER          LEVEL          UPPER         MARGINAL

---- EQU e1         0.5000         0.5000         0.5000        -1.0000    'Non-basic'   
---- EQU e2         0.5000         0.5000         0.5000        -1.0000    'Non-basic'   
---- EQU e3         1.0000         1.0000         1.0000          .        'Basic'   

                   LOWER          LEVEL          UPPER         MARGINAL

---- VAR x0          .              .            +INF            1.0000    'Non-basic'    
---- VAR x1          .              .            +INF            1.0000    'Non-basic'   
---- VAR x2          .             0.5000        +INF             .        'Basic'   
---- VAR x3          .             0.5000        +INF             .        'Basic'

Same primal solution, but different basis. No hint of dual degeneracy however. Bonus question: Which of these two simplex solutions is more correct?

Actually both are correct. From linear programming 101 we know $y = c_B^T B^{-1}$ where $y$ are the duals. We can calculate this quickly with some R. Suppose we have:

A
##    x0 x1        x2        x3 s1 s2 s3
## e1  1  0 0.3333333 0.6666667  1  0  0
## e2  0  1 0.6666667 0.3333333  0  1  0
## e3  1  1 1.0000000 1.0000000  0  0  1
c
## x0 x1 x2 x3 s1 s2 s3 
##  0  0 -1 -1  0  0  0

Then we can do:

bs ← c('x2','x3','s2')
t(c[bs]) %*% solve(A[,bs])
##      e1 e2 e3
## [1,]  0  0 -1
bs ← c('x2','x3','s3')
t(c[bs]) %*% solve(A[,bs])
##      e1 e2 e3
## [1,] -1 -1  0

There is even a third optimal basis. It has the following duals

bs ← c('x2','x3','s1')
t(c[bs]) %*% solve(A[,bs])
##      e1 e2 e3
## [1,]  0  0 -1

In a sense we have three optimal solutions and SciPy's simplex solver is not able to find a single one of them.

Conclusion

In general the Simplex solver from scipy.optimize.linprog should be avoided. It fails on quite a few small models. The new interior-point solver available in SciPy 1.0 seems to behave much better.

References

Linear programming using scipy.optimize.linprog returns suboptimal solution, https://stackoverflow.com/questions/46702953/linear-programming-using-scipy-optimize-linprog-returns-suboptimal-solution
linprog(method='simplex'), https://docs.scipy.org/doc/scipy/reference/optimize.linprog-simplex.html
linprog(method='interior-point'), https://docs.scipy.org/doc/scipy/reference/optimize.linprog-interior-point.html
What is this EPS in a GAMS solution?, http://yetanothermathprogrammingconsultant.blogspot.com/2017/09/what-is-this-eps-in-gams-solution.html
Finding all optimal LP solutions, http://yetanothermathprogrammingconsultant.blogspot.com/2016/01/finding-all-optimal-lp-solutions.html

Friday, December 8, 2017

Sparse Matrix export from GAMS to R

The problem

A colleague posed the question to me: how we can export a GAMS parameter to an R sparse matrix?. A sparse matrix is a matrix where most of the elements are zero. In large scale optimization, sparse matrices play an important role. Software (solvers, modeling systems) are exploiting sparsity to reduce memory requirements and to speed up calculations. In fact very large problems can only be solved by using sparse data structures. I actually don't think the user wants a sparse matrix, but let's assume we really want one. The R Matrix package [3] has facilities to create and manipulate sparse matrices.

We assume we have the following GAMS data:

set
   i /a,b,c,d,e/
   j /e,f,g,h,i,j/
;

table d(i,j)
    e f g h i j
a 1   2   4
b
c      3
d
e          5
;

execute_unload "gamsdata";

Notice that will export three items: sets i and j and the parameter d.

The big issue here is that GAMS uses a sparse storage scheme throughout. This means only the nonzero elements of matrix d are stored. Unfortunately, this also means that empty columns and empty rows are discarded: they are simply not visible. We can see this when we display the matrix:

----     17 PARAMETER d  

            e           g           i

a       1.000       2.000       4.000
c                   3.000
e                               5.000

The trick is to export also the domains i and j. With the aid of these domain sets we can reconstruct the sparse matrix with named rows and columns and with all empty rows and columns preserved.

Although this example is a small matrix, we assume the matrix is very larger and sparse. This means we don't want to use in R a pivot operation on d to create a 2D dense matrix. In all operations in this write-up we store and operate on just the non-zero elements.

Below I show three different approaches to populate a sparse matrix in R with GAMS data. There are some interesting issues in each approach.

Method 1: GDXRRW and R

We start with reading in the data from the GAMS GDX file:

library(gdxrrw)
# read d,i,j into data frames
D <- rgdx.param("gamsdata","d")
I <- rgdx.set("gamsdata","i")
J <- rgdx.set("gamsdata","j")

When we print things we see they arrived as follows:

D
##   i j d
## 1 a e 1
## 2 a g 2
## 3 a i 4
## 4 c g 3
## 5 e i 5
I
##   i
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
J
##   i
## 1 e
## 2 f
## 3 g
## 4 h
## 5 i
## 6 j

Note that all these three identifiers represent data frames. Warning: note that the columns for the tables I and J are always called i. This is a quirk of gdxrrw.

We need to create mappings from name to index position for both the rows and columns. So "a" → 1, "b" → 2, etc. R has an interesting feature: named vectors allow us to index a vector by strings. We created these named vectors as follows:

# create named vectors for index positions
# this allows us to map from name -> index position
ordi <- seq(nrow(I))
names(ordi) <- I$i
ordj <- seq(nrow(J))
names(ordj) <- J$i

When we print these mapping vectors we see:

ordi
## a b c d e 
## 1 2 3 4 5
ordj
## e f g h i j 
## 1 2 3 4 5 6

Using these vectors we can easily convert columns i and j in the D data frame from strings to numbers.

ordi[ D$i ] # row indices
## a a a c e 
## 1 1 1 3 5
ordj[ D$j ] # column indices
## e g i g i 
## 1 3 5 3 5

With this we are ready to construct the sparse matrix in R:

library(Matrix)
# create sparse matrix
M <- sparseMatrix(
      i = ordi[ D$i ],      # row indices
      j = ordj[ D$j ],      # column indices 
      x = D$d,              # nonzero values
      dims = c(nrow(I),nrow(J)),  # size of matrix
      dimnames = list(I$i,J$i)    # names of rows and columns
  )

When we print this we see:

M
## 5 x 6 sparse Matrix of class "dgCMatrix"
##   e f g h i j
## a 1 . 2 . 4 .
## b . . . . . .
## c . . 3 . . .
## d . . . . . .
## e . . . . 5 .

In this method we used named vectors to be able to index vectors by strings. This trick helped to translate the strings forming the sets i and j into their ordinal values or index positions.

Method 2: Add some GAMS code

We can simplify the R code a bit by letting GAMS do part of the work. We can augment the matrix d with extra information about the index positions for non-zero values.

set v /value,ordi,ordj/;
parameter d2(i,j,v);
d2(i,j,'value') = d(i,j);
d2(i,j,'ordi')$d(i,j) = ord(i);
d2(i,j,'ordj')$d(i,j) = ord(j);
display d2;

The displayed output looks like:

----     22 PARAMETER d2  

          value        ordi        ordj

a.e       1.000       1.000       1.000
a.g       2.000       1.000       3.000
a.i       4.000       1.000       5.000
c.g       3.000       3.000       3.000
e.i       5.000       5.000       5.000

Reading this data is basically the same as before:

# read d,i,j into data frames
D2 <- rgdx.param("gamsdata","d2")
I <- rgdx.set("gamsdata","i")
J <- rgdx.set("gamsdata","j")

The D2 data frame looks like:

D2
##    i j     v d2
## 1  a e value  1
## 2  a e  ordi  1
## 3  a e  ordj  1
## 4  a g value  2
## 5  a g  ordi  1
## 6  a g  ordj  3
## 7  a i value  4
## 8  a i  ordi  1
## 9  a i  ordj  5
## 10 c g value  3
## 11 c g  ordi  3
## 12 c g  ordj  3
## 13 e i value  5
## 14 e i  ordi  5
## 15 e i  ordj  5

The row and columns indices and the values can be extracted by standard subsetting:

# subset D2 to extract ordi, ordj and value data
ordi <- D2[D2$v=="ordi","d2"]
ordj <- D2[D2$v=="ordj","d2"]
v <- D2[D2$v=="value","d2"]

The sparse matrix is now easy to create:

# create sparse matrix
m <- sparseMatrix(
      i = ordi,        # row indices
      j = ordj,        # column indices
      x = v,           # nonzero values
      dims = c(nrow(I),nrow(J)),    # size of matrix
      dimnames = list(I$i,J$i)      # names of rows and columns  
  )

Output is as expected:

m
## 5 x 6 sparse Matrix of class "dgCMatrix"
##   e f g h i j
## a 1 . 2 . 4 .
## b . . . . . .
## c . . 3 . . .
## d . . . . . .
## e . . . . 5 .

Method 3: Using gdxrrw smarter

As pointed out in the comments, the gdxrrw tool can actually returns all the information we need:

library(Matrix)
library(gdxrrw)
d <- rgdx('gamsdata',list(name='d')) 
M2 <- sparseMatrix(
  i = d$val[,1], # row indices
  j = d$val[,2], # column indices 
  x = d$val[,3], # nonzero values
  dims = c(length(d$uels[[1]]),length(d$uels[[2]])), # size of matrix
  dimnames = d$uels # names of rows and columns
)

Method 4: SQLite and R

Here we use SQLite as an intermediate data store. This will allow us to do some SQL joins while importing the data. First we convert the GAMS data into SQLite database. This is very easy:

system("gdx2sqlite -i gamsdata.gdx -o gamsdata.db")

The reading of the data is straightforward:

library(RSQLite)
sqlite<-dbDriver("SQLite") 
db <- dbConnect(sqlite,"gamsdata.db")
D <- dbReadTable(db,"d")
I <- dbReadTable(db,"i")
J <- dbReadTable(db,"j")

The data arrives as:

D
##   i j value
## 1 a e     1
## 2 a g     2
## 3 a i     4
## 4 c g     3
## 5 e i     5
I
##   i
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
J
##   j
## 1 e
## 2 f
## 3 g
## 4 h
## 5 i
## 6 j

The row numbers in an SQLite table can be retrieved by the rowid. For example:

dbGetQuery(db,"select rowid,* from i")
##   rowid i
## 1     1 a
## 2     2 b
## 3     3 c
## 4     4 d
## 5     5 e

Note: the rowid is automatically incremented by SQLite when adding rows. However, after deleting rows the rowid has no longer a direct relation to the row number. Our data base is newly created, so in our case we can assume the rowid is the same as the row number.

We use this to map from index names for the sets i and j to their ordinal numbers. We do this by using a join on tables d and i (or d and j).

The row and column indices can be retrieved as:

ordi <- dbGetQuery(db,"select i.rowid from d inner join i on i.i=d.i")[[1]]
ordj <- dbGetQuery(db,"select j.rowid from d inner join j on j.j=d.j")[[1]]

These vectors look like:

ordi
## [1] 1 1 1 3 5
ordj
## [1] 1 3 5 3 5

By now creating the sparse matrix is simple:

m <- sparseMatrix(
      i = ordi,     # row indices
      j = ordj,     # column indices
      x = D$value,  # nonzero values
      dims = c(nrow(I),nrow(J)),    # size of matrix
      dimnames = list(I$i,J$j)      # names of rows and columns 
  )

The output is no surprise:

m
## 5 x 6 sparse Matrix of class "dgCMatrix"
##   e f g h i j
## a 1 . 2 . 4 .
## b . . . . . .
## c . . 3 . . .
## d . . . . . .
## e . . . . 5 .

Method 4: Pivoting data frames

It is my assumption the user just wanted to pivot from "long" format to "wide" format [1]. This is efficiently done with the function spread [2]:

D <- rgdx.param("gamsdata","d")
library(tidyr)
P <- spread(D,j,d,fill=0)

This yields:

D
##   i j d
## 1 a e 1
## 2 a g 2
## 3 a i 4
## 4 c g 3
## 5 e i 5
P
##   i e g i
## 1 a 1 2 4
## 2 c 0 3 0
## 3 e 0 0 5

If you don't specify the fill parameter, spread will impute NA values. Note this operation is dense: all zeros are explicitly stored.

The duplication in column names can be easily fixed by something like:

colnames(P)][1]=""
P
##     e g i
## 1 a 1 2 4
## 2 c 0 3 0
## 3 e 0 0 5

If we want to keep the empty rows and columns from the original matrix, it is easiest to fix this at the GAMS level:

parameter d3(i,j) 'dense version';
d3(i,j) = d(i,j) + eps;
display d3;

The addition of the special value eps to a number is somewhat special in GAMS. We have:

\[x+\text{eps} = \begin{cases}x&\text{if }x\ne 0\\ \text{eps}&\text{if }x=0\end{cases} \]

This trick can be used to preserve all zero (and non-zero) values. Indeed, the values of d3 are:

----     26 PARAMETER d3  dense version

            e           f           g           h           i           j

a       1.000         EPS       2.000         EPS       4.000         EPS
b         EPS         EPS         EPS         EPS         EPS         EPS
c         EPS         EPS       3.000         EPS         EPS         EPS
d         EPS         EPS         EPS         EPS         EPS         EPS
e         EPS         EPS         EPS         EPS       5.000         EPS

We can import this into R as follows:

D3 <- rgdx.param("gamsdata","d3",squeeze=F)
P <- spread(D3,j,d)

The squeeze option helps to deal with the eps values: we have to make sure that we map eps back to a physical zero. The results look like:

P
##   i e f g h i j
## 1 a 1 0 2 0 4 0
## 2 b 0 0 0 0 0 0
## 3 c 0 0 3 0 0 0
## 4 d 0 0 0 0 0 0
## 5 e 0 0 0 0 5 0

Conclusion

Tons of ways to do this. Using rgdxrrw::rgdx the smart way is the simplest.

References

Pivoting a table: GAMS, SQLite, R and Python, http://yetanothermathprogrammingconsultant.blogspot.com/2016/01/pivoting-table-gams-sqlite-r-and-python.html
Brad Boehmke, Data Processing with dplyr & tyidr, https://rpubs.com/bradleyboehmke/data_wrangling
Martin Maechler and Douglas Bates, 2nd Introduction to the Matrix package, https://cran.r-project.org/web/packages/Matrix/vignettes/Intro2Matrix.pdf