Sunday, December 17, 2017

A production scheduling problem


Problem


A post in [1] describes an interesting problem:

I have read about cutting stock problem, but this is a bit different.
We have orders in different variants, and a maximum sized machine to produce it.
Variants would be X,S,XL,L and ordered quantities are 100,40,40,80. 
Say, the machine width is 6. This means, we can put 6 different variants
together and produce it. 
We can put 2 X, 1 S, 1 XL, 2 L; this means if we produce it 50 times, the
output is:
X = 100 (0 waste)
S = 50 (10 waste)
XL = 50 (10 waste)
L = 100 (20 waste)

Total of 40 waste in 300 produced. 
Another approach to reduce waste would be creating 2 different variations. We
can put 4 X, 2 S and produce it 25 times, with 10 waste, and make another
setup with 2 XL, 4 L and produce it 20 times with no waste. With a total of
10 waste we handled this production in 2 setups. 
Since a setup has a price, we would prefer the first setup, or, depending on
quantities, we may choose the second one. 
I have read about cutting stock and it looks similar to this one, but with the
ability to divide quantities between different setups, this has more
potential for optimization and is therefore more complex.

High level model

 

We can formulate this as a mixed-integer quadratic model. The data consists of the demand per variant, the machine width, and the cost coefficients (summarized under Results below). The optimization model is as follows:


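(A sketch reconstructed from the notes below; the actual GAMS formulation may differ slightly. Here \(cap=6\) is the machine width and \(maxrunlen\) an upper bound on the length of a run.)

\[\begin{align}\min\>&cost=setupcost+wastecost\\ &setupcost = costsetup\cdot\sum_r run_r\\ &wastecost = \sum_v costwaste_v\cdot waste_v\\ &\sum_r pattern_{v,r}\cdot runlen_r = demand_v + waste_v &&(edemand)\\ &\sum_v pattern_{v,r}\le cap\cdot run_r\\ &runlen_r\le maxrunlen\cdot run_r\\ &run_r\le run_{r-1}&&(order)\\ &run_r\in\{0,1\},\;pattern_{v,r}\in\{0,1,\dots,cap\},\;runlen_r\in\{0,1,2,\dots\}\end{align}\]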
 Notes:

  1. The variables \(run_r\) indicate if we execute a production run or not. Within a production run we use the same pattern.
  2. We have \(run_r=0 \Rightarrow pattern_{v,r}=0, runlen_r=0\). We could drop one of these constraints but not both.
  3. Equation \(edemand\) has a nasty quadratic term. It causes the model to be non-linear and non-convex. Of course, we can solve this with a global solver. 
  4. The \(order\) constraint makes sure all active runs (i.e., \(run_r=1\)) come before the inactive ones (\(run_r=0\)). This makes the solution look nicer and reduces some symmetry.
  5. The funny \(\forall r-1\) means we generate the \(order\) constraint for all \(r\) except the first one. 
  6. The model type is actually MIQCP (Mixed Integer Quadratically Constrained Program). There are commercial solvers for convex problems of this type, but for this non-convex problem I used a global MINLP solver (Baron, Couenne, Antigone).

Results


The input data can be summarized as:


----     62 PARAMETER costwaste  cost of waste

X  1.000,    S  2.000,    XL 3.000,    L  4.000


----     62 PARAMETER costsetup            =      100.000  fixed cost of a run

----     62 PARAMETER demand  

X  100,    S   40,    XL  40,    L   80

The solution looks like:


----     62 VARIABLE runlen.L  length of a run

run1 20,    run2 40


----     62 VARIABLE pattern.L  production pattern of a run

          run1        run2

X            1           2
S                        1
XL           2
L            2           1


----     62 VARIABLE waste.L  waste per variant

                      ( ALL          0. )


----     62 VARIABLE wastecost.L           =        0.000  cost of waste
            VARIABLE setupcost.L           =      200.000  cost related to setups
            VARIABLE cost.L                =      200.000  total cost

We see that we use two production runs, and that we are under-utilizing the machine (patterns of 5 and 4 items). Note that this solution is not unique: a different solver will likely give a different solution.

Shorter run lengths


I believe we are missing some objective in this model. Suppose we have a demand of X=60 only (zero demand for the other variants). We can organize this in a single run in at least the following ways:

  • Pattern X=1. This gives a run length of 60. Total cost = 100 (cost of one setup).
  • Pattern X=6. This gives a run length of 10. Total cost = 100 (cost of one setup). 
I would guess the second configuration is better. Probably we should add some cost for the total sum of the run lengths. 

If we add a unit cost of 1 for operating the machine, the total operating cost can be expressed as \(\sum_r runlen_r\). If we add this to the total cost we get the following solution:


----     68 VARIABLE runlen.L  length of a run

run1  4,    run2 40


----     68 VARIABLE pattern.L  production pattern of a run

          run1        run2

X            5           2
S                        1
XL                       1
L                        2


----     68 VARIABLE waste.L  waste per variant

                      ( ALL          0. )


----     68 VARIABLE wastecost.L           =        0.000  cost of waste
            VARIABLE setupcost.L           =      200.000  cost related to setups
            VARIABLE opcost.L              =       44.000  operating cost
            VARIABLE cost.L                =      244.000  total cost

A visualization of these two production schedules can look like:

Production Schedules

Linearization


If we want to use a high-performance MIP solver, we need to linearize equation \(edemand\). I don't know how to directly linearize the product of two general integer variables. However, if we can replace one of the integer variables by a series of binary variables, then we are in more favorable territory.

The machine can handle up to 6 items at a time. This means \(pattern_{v,r}\le 6\), which makes \(pattern\) a good candidate for a binary expansion:

\[\begin{align}&pattern_{v,r} = \sum_k k \cdot \delta_{v,r,k}\\&\sum_k  \delta_{v,r,k} \le 1\\& \delta_{v,r,k} \in \{0,1\}\end{align}\]

where \(k\) runs from 1 to \(cap=6\). Now we can write the \(edemand\) equation as:

\[\sum_{r,k}  k \cdot ( runlen_r \cdot \delta_{v,r,k}) = demand_v + waste_v\]

The multiplication of a binary variable and a continuous variable can be written as linear inequalities [2]. This results in:

\[\begin{align} & \sum_{r,k}  k \cdot w_{v,r,k} = demand_v + waste_v\\ & w_{v,r,k} \le \delta_{v,r,k} \cdot maxrunlen\\ & w_{v,r,k} \le runlen_r\\ & w_{v,r,k} \ge runlen_r - maxrunlen \cdot (1-\delta_{v,r,k}) \\ &w_{v,r,k}\ge 0 \end{align} \]
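To make this concrete, here is a minimal PuLP sketch of this linearized model (my own reconstruction, not the original GAMS implementation; the number of runs and the value of maxrunlen are assumptions):

import pulp

variants = ["X", "S", "XL", "L"]
demand = {"X": 100, "S": 40, "XL": 40, "L": 80}
costwaste = {"X": 1, "S": 2, "XL": 3, "L": 4}
costsetup = 100
cap = 6            # machine width
runs = range(4)    # assumed maximum number of runs
maxrunlen = 100    # assumed upper bound on the length of a run
ks = range(1, cap + 1)

m = pulp.LpProblem("scheduling", pulp.LpMinimize)
run = pulp.LpVariable.dicts("run", runs, cat="Binary")
runlen = pulp.LpVariable.dicts("runlen", runs, 0, maxrunlen, cat="Integer")
waste = pulp.LpVariable.dicts("waste", variants, 0)
# delta[v][r][k] = 1 <=> pattern[v,r] = k (binary expansion)
delta = pulp.LpVariable.dicts("delta", (variants, runs, ks), cat="Binary")
# w[v][r][k] linearizes the product runlen[r]*delta[v][r][k]
w = pulp.LpVariable.dicts("w", (variants, runs, ks), 0, maxrunlen)

# objective: setup cost plus waste cost
m += (costsetup * pulp.lpSum(run[r] for r in runs)
      + pulp.lpSum(costwaste[v] * waste[v] for v in variants))

for v in variants:
    # linearized demand equation: sum over (r,k) of k*runlen[r]*delta[v][r][k]
    m += pulp.lpSum(k * w[v][r][k] for r in runs for k in ks) == demand[v] + waste[v]
    for r in runs:
        m += pulp.lpSum(delta[v][r][k] for k in ks) <= 1  # at most one k active
        for k in ks:
            m += w[v][r][k] <= maxrunlen * delta[v][r][k]
            m += w[v][r][k] <= runlen[r]
            m += w[v][r][k] >= runlen[r] - maxrunlen * (1 - delta[v][r][k])

for r in runs:
    # machine width; also forces an empty pattern for inactive runs
    m += pulp.lpSum(k * delta[v][r][k] for v in variants for k in ks) <= cap * run[r]
    m += runlen[r] <= maxrunlen * run[r]
for r in list(runs)[1:]:
    m += run[r] <= run[r - 1]  # order the runs to reduce symmetry

m.solve()
print(pulp.LpStatus[m.status], pulp.value(m.objective))

With the data of this post, the optimal objective should reproduce the total cost of 200 found earlier.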

We can also go the other route: expand \(runlen_r\), where \(\ell\) runs from 1 to \(maxrunlen\). This would lead to:

\[\begin{align}
&\sum_{r,\ell} \ell \cdot v_{v,r,\ell} = demand_v + waste_v\\
&\sum_{\ell} \ell \cdot \rho_{r,\ell} = runlen_r\\
&\sum_{\ell} \rho_{r,\ell} \le 1\\
& v_{v,r,\ell} \le  cap \cdot \rho_{r,\ell}\\
& v_{v,r,\ell} \le pattern_{v,r}\\
& v_{v,r,\ell} \ge pattern_{v,r} - cap \cdot (1-\rho_{r,\ell})\\
& \rho_{r,\ell} \in \{0,1\}\\
&  v_{v,r,\ell} \ge 0
\end{align}\]

With these lengthy reformulations we end up with a linear mixed-integer program. This means we can use standard MIP solvers instead of a global nonlinear solver. In addition, MIP formulations often perform better than their non-linear counterparts.

The linearization is somewhat convoluted, so having a smaller nonlinear model available can help in constructing and debugging these models. As with database design (logical before physical design) and programming (avoiding premature optimization), starting with a simpler model that focuses on the concepts is often helpful when developing optimization models.

References


  1. Similar to cutting stock problem but not quite, https://math.stackexchange.com/questions/2557776/similar-to-cutting-stock-problem-but-not-quite
  2. Multiplication of a continuous and a binary variable, http://yetanothermathprogrammingconsultant.blogspot.com/2008/05/multiplication-of-continuous-and-binary.html

Thursday, December 14, 2017

Permuted Matrix Balancing

In [1] the following problem is posed:

Problem

I am trying to generate a matrix of numbers with 7 rows and 4 columns. Each row must sum to 100 and each column must have an even spread (if permitted) between a min and max range (specified below).
Goal:
       C1      C2    C3    C4   sum   range 
 1     low                      100    ^
 2     ..                              |  
 3     ..                              |
 4     ..                              | 
 5     ..                              |
 6     ..                              |
 7     high                            _

c1_high = 98
c1_low = 75
c2_high = 15
c2_low = 6
c3_high = 8
c3_low = 2
c4_low = 0.05
c4_high = 0.5
In addition to this, I need the spread of each row to be as linear as possible, though a line fitted to the data with a second order polynomial would suffice (with an r^2 value of >0.98).
Note: this description seems to hint the values in the columns must be ordered from \(L_j\) to \(U_j\). That is actually not the case: we don't assume any ordering.

Solution Outline

This problem somewhat resembles a matrix balancing problem. Let's see how we can model this.

One approach to attack this problem is:
Step 1: Create data with perfect spread
It is easy to generate data with perfect spread while ignoring the requirement that the row sums are 100. We just start at \(L_j\) and increase by a fixed step size of \((U_j-L_j)/6\):


----     40 PARAMETER L  low

c1 75.000,    c2  6.000,    c3  2.000,    c4  0.050


----     40 PARAMETER U  high

c1 98.000,    c2 15.000,    c3  8.000,    c4  0.500


----     40 PARAMETER init  initial matrix

             c1          c2          c3          c4      rowsum

r1       75.000       6.000       2.000       0.050      83.050
r2       78.833       7.500       3.000       0.125      89.458
r3       82.667       9.000       4.000       0.200      95.867
r4       86.500      10.500       5.000       0.275     102.275
r5       90.333      12.000       6.000       0.350     108.683
r6       94.167      13.500       7.000       0.425     115.092
r7       98.000      15.000       8.000       0.500     121.500

This step is just data manipulation.
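For reference, this step is easily reproduced in a few lines of Python/NumPy (a sketch, not part of the original post):

import numpy as np

L = np.array([75, 6, 2, 0.05])   # column lower bounds
U = np.array([98, 15, 8, 0.5])   # column upper bounds
m = 7                            # number of rows

# perfect spread: row i equals L + i/(m-1)*(U-L), i = 0..m-1
init = L + np.outer(np.arange(m) / (m - 1), U - L)
print(np.round(init, 3))              # reproduces the table above
print(np.round(init.sum(axis=1), 3))  # row sums 83.05 ... 121.5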
Step 2: Permute values in each column
The second step is to permute the values within each column to achieve a solution that closely matches the required row sums. This gives:


----     75 VARIABLE y.L  after permutation

            c1          c2          c3          c4      rowsum

r1      86.500      10.500       5.000       0.275     102.275
r2      94.167       7.500       3.000       0.125     104.792
r3      98.000       6.000       2.000       0.050     106.050
r4      90.333       9.000       4.000       0.200     103.533
r5      82.667      12.000       6.000       0.350     101.017
r6      75.000      15.000       8.000       0.500      98.500
r7      78.833      13.500       7.000       0.425      99.758

As expected, we are now much closer to the required row sums.

One way to model a permutation of a column vector \(a\) inside an optimization model is to find a square permutation matrix \(P\) [2] with

\[\begin{align}&y_i = \sum_k P_{i,k}a_k\\ &\sum_k P_{i,k}=1&\forall i \\&\sum_i P_{i,k} = 1&\forall k\\& P_{i,k}\in\{0,1\}\end{align}\]

A permutation matrix has exactly one entry equal to 1 in each row and in each column; all other elements are zero. One could say a permutation matrix is a (row- or column-) permuted identity matrix. The constraints can also be interpreted as assignment constraints (related to the assignment problem).
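A tiny numeric illustration (hypothetical example data):

import numpy as np

# P sends (a1,a2,a3) to (a2,a3,a1): exactly one 1 per row and per column
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
a = np.array([10.0, 20.0, 30.0])
print(P @ a)  # [20. 30. 10.]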

Step 3: Add minimal relative deviations
The final step is to find small deviations from \(y\) such that the row sums are exactly 100.



----     75 VARIABLE x.L  final values

            c1          c2          c3          c4      rowsum

r1      84.265      10.467       4.993       0.275     100.000
r2      89.460       7.431       2.984       0.125     100.000
r3      92.057       5.912       1.980       0.050     100.000
r4      86.863       8.949       3.988       0.200     100.000
r5      81.668      11.985       5.997       0.350     100.000
r6      76.473      15.022       8.005       0.500     100.000
r7      79.071      13.503       7.001       0.425     100.000


----     75 VARIABLE d.L  deviations

            c1          c2          c3          c4

r1      -2.235      -0.033      -0.007 -2.25855E-5
r2      -4.707      -0.069      -0.016 -4.75702E-5
r3      -5.943      -0.088      -0.020 -6.00626E-5
r4      -3.471      -0.051      -0.012 -3.50779E-5
r5      -0.999      -0.015      -0.003 -1.00932E-5
r6       1.473       0.022       0.005 1.489155E-5
r7       0.237       0.003 7.931220E-4 2.399194E-6

For this I used a relative measure: minimize the sum of squared relative deviations:

\[\min \sum_{i,j} \left( \frac{d_{i,j}}{\mu_j} \right)^2 \]

where \(\mu_j = (L_j +U_j)/2\). Scaling by the average magnitude of a column distributes the deviations accordingly: column 1 absorbs larger deviations than column 4.

A complete model


I believe we need to do steps 2 and 3 simultaneously to find a globally optimal solution. Here is an attempt:



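In math form, a sketch of such a combined model (a reconstruction; \(a_{k,j}\) is the initial matrix from step 1, each column \(j\) gets its own permutation matrix, and possibly bounds \(L_j\le x_{i,j}\le U_j\) are added) is:

\[\begin{align}\min\>&\sum_{i,j}\left(\frac{d_{i,j}}{\mu_j}\right)^2\\ &y_{i,j}=\sum_k P_{i,k,j}\,a_{k,j}\\ &\sum_k P_{i,k,j}=1&&\forall i,j\\ &\sum_i P_{i,k,j}=1&&\forall k,j\\ &x_{i,j}=y_{i,j}+d_{i,j}\\ &\sum_j x_{i,j}=100&&\forall i\\ &P_{i,k,j}\in\{0,1\}\end{align}\]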
This is a convex MIQP model, so we can solve this with high-performance solvers like Cplex and Gurobi.

Notes

  1. We could split the model into two phases (step 2 and step 3 in the above description). I am not sure how this would impact the optimal solution. Also, it would not really simplify things.
  2. This model can be linearized by using a different objective: minimize the sum of the absolute values of the (relative) deviations:
    \[\min\>\sum_{i,j}\left|  \frac{d_{i,j}}{\mu_j} \right|\]
    You would end up with a linear MIP model. Possible formulations are variable splitting
    \[\begin{align}\min \> & \sum_{i,j} v_{i,j}^{\oplus} + v_{i,j}^{\ominus}\\&v_{i,j}^{\oplus} - v_{i,j}^{\ominus} = \frac{d_{i,j}}{\mu_j}\\&v_{i,j}^{\oplus}, v_{i,j}^{\ominus}\ge 0 \end{align}\]
    or bounding:
    \[\begin{align}\min\>&\sum_{i,j} v_{i,j}\\&-v_{i,j} \le \frac{d_{i,j}}{\mu_j}\le v_{i,j}\end{align}\] 
  3. Instead of dividing by the average column value \(\mu_j\) we can divide by the "real" value \(y_{i,j}\). Unfortunately, that would make this a general MINLP model. Note that we should "protect" the division, e.g. by the bound \(y_{i,j}\ge L_j\).

References


  1. Algorithm for generation of number matrix with specified boundaries, https://stackoverflow.com/questions/47754210/algorithm-for-generation-of-number-matrix-with-specified-boundaries
  2. Permutation Matrix, https://en.wikipedia.org/wiki/Permutation_matrix

Tuesday, December 12, 2017

Quantum Development Kit: Does it come with an LP solver?



Microsoft has released a preview of their Quantum Development Kit [1]. Maybe we can solve some interesting optimization problems with this [2,3]. One of the authors of [3], Svore, is now heading Microsoft's Quantum Architectures and Computation group.

The video above is PR. Here is a video with some more meaty content:



References


  1. Allison Linn, The future is quantum: Microsoft releases free preview of Quantum Development Kit, https://blogs.microsoft.com/ai/2017/12/11/future-quantum-microsoft-releases-free-preview-quantum-development-kit
  2. Whitney Clavin, Developing quantum algorithms for optimization problems, https://phys.org/news/2017-07-quantum-algorithms-optimization-problems.html
  3. Fernando G.S.L. Brandao, Krysta Svore, Quantum Speed-ups for Semidefinite Programming, https://arxiv.org/abs/1609.05537


Sunday, December 10, 2017

SciPy 1.0 linear programming

The simplex solver in SciPy has been somewhat of a problem. It is a textbook LP solver (tableau-based), so it is only suited for very small problems. Even on those small problems it is just not very reliable.

Here is a small example [1]:


Min -x2 -x3
x0 + 0.33*x2 + 0.67*x3 = 0.5
x1 + 0.67*x2 + 0.33*x3 = 0.5
x0 + x1 + x2 + x3 = 1.0
x0, x1, x2, x3 >= 0

We can encode this problem as follows:


from scipy.optimize import linprog
a_eq = [[1.0, 0.0, 0.33, 0.67],
        [0.0, 1.0, 0.67, 0.33], 
        [1, 1, 1, 1]]

b_eq = [0.5, 0.5, 1.0]

c = [0, 0, -1.0, -1.0]

res1 = linprog(c=c, A_eq=a_eq, b_eq=b_eq, method='simplex')
print(res1)

The simplex solver has trouble with this. It returns a suboptimal solution with an objective function value of 0:

     fun: -0.0
 message: 'Optimization terminated successfully.'
     nit: 4
   slack: array([], dtype=float64)
  status: 0
 success: True
       x: array([ 0.5,  0.5,  0. ,  0. ])
     con: array([ -9.83213511e-13,  -9.83102488e-13,  -1.96620498e-12])
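A quick sanity check (not in the original post) confirms that the returned point is feasible but suboptimal:

import numpy as np

a_eq = np.array([[1.0, 0.0, 0.33, 0.67],
                 [0.0, 1.0, 0.67, 0.33],
                 [1.0, 1.0, 1.00, 1.00]])
c = np.array([0.0, 0.0, -1.0, -1.0])

x_scipy = np.array([0.5, 0.5, 0.0, 0.0])  # point returned by method='simplex'
x_opt = np.array([0.0, 0.0, 0.5, 0.5])    # the true optimum
for x in (x_scipy, x_opt):
    print(a_eq @ x, c @ x)  # both satisfy Ax=b; objectives are 0 and -1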

SciPy 1.0 has a new interior-point solver [3], which seems a bit more robust:


res2 = linprog(c=c, A_eq=a_eq, b_eq=b_eq, method='interior-point')
print(res2)

     fun: -1.000000000000884
 message: 'Optimization terminated successfully.'
     nit: 4
   slack: array([], dtype=float64)
  status: 0
 success: True
       x: array([  5.41163802e-13,   5.41163802e-13,   5.00000000e-01,
         5.00000000e-01])

We see an optimal objective of -1. It also gives a warning:

D:\Anaconda3\lib\site-packages\scipy\optimize\_linprog_ip.py:623: OptimizeWarning: A_eq does not appear to be of full row rank. To improve performance, check the problem formulation for redundant equality constraints.
  warn(redundancy_warning, OptimizeWarning)

There is no good reason a simplex solver should give incorrect results when the A matrix is not of full row rank. Solvers usually add a slack (often only logically) to each constraint, so A is never rank deficient, and the basis matrix can always be chosen such that it is both optimal and non-singular. Here is what an optimal basis (with objective -1) looks like for this problem, as provided by a different simplex solver:


                    LOWER          LEVEL          UPPER         MARGINAL

---- EQU e1          0.5000         0.5000         0.5000         EPS       'Non-basic'     
---- EQU e2          0.5000         0.5000         0.5000          .        'Basic'  
---- EQU e3          1.0000         1.0000         1.0000        -1.0000    'Non-basic'  

                    LOWER          LEVEL          UPPER         MARGINAL

---- VAR x0           .              .            +INF            1.0000    'Non-basic'     
---- VAR x1           .              .            +INF            1.0000    'Non-basic'  
---- VAR x2           .             0.5000        +INF             .        'Basic'  
---- VAR x3           .             0.5000        +INF             .        'Basic'  

We see equation 2 (or rather its slack) is made basic, although the value of the slack is zero. This solution is actually both primal and dual degenerate [4]. A different solver gives:


                   LOWER          LEVEL          UPPER         MARGINAL

---- EQU e1         0.5000         0.5000         0.5000        -1.0000    'Non-basic'   
---- EQU e2         0.5000         0.5000         0.5000        -1.0000    'Non-basic'   
---- EQU e3         1.0000         1.0000         1.0000          .        'Basic'   

                   LOWER          LEVEL          UPPER         MARGINAL

---- VAR x0          .              .            +INF            1.0000    'Non-basic'    
---- VAR x1          .              .            +INF            1.0000    'Non-basic'   
---- VAR x2          .             0.5000        +INF             .        'Basic'   
---- VAR x3          .             0.5000        +INF             .        'Basic'   

Same primal solution, but a different basis. No hint of dual degeneracy, however. Bonus question: which of these two simplex solutions is more correct?

Actually both are correct. From linear programming 101 we know  \(y = c_B^T B^{-1}\) where \(y\) are the duals.  We can calculate this quickly with some R. Suppose we have:

A
##    x0 x1        x2        x3 s1 s2 s3
## e1  1  0 0.3333333 0.6666667  1  0  0
## e2  0  1 0.6666667 0.3333333  0  1  0
## e3  1  1 1.0000000 1.0000000  0  0  1
c
## x0 x1 x2 x3 s1 s2 s3 
##  0  0 -1 -1  0  0  0 

Then we can do:

bs <- c('x2','x3','s2')
t(c[bs]) %*% solve(A[,bs])
##      e1 e2 e3
## [1,]  0  0 -1
bs <- c('x2','x3','s3')
t(c[bs]) %*% solve(A[,bs])
##      e1 e2 e3
## [1,] -1 -1  0

There is even a third optimal basis. It has the following duals:

bs <- c('x2','x3','s1')
t(c[bs]) %*% solve(A[,bs])
##      e1 e2 e3
## [1,]  0  0 -1

In a sense we have (at least) three optimal bases, and SciPy's simplex solver is not able to find a single one of them.

Conclusion


In general, the simplex solver from scipy.optimize.linprog should be avoided: it fails on quite a few small models. The new interior-point solver available in SciPy 1.0 seems to behave much better.

References


Friday, December 8, 2017

Sparse Matrix export from GAMS to R


The problem

A colleague posed this question to me: how can we export a GAMS parameter to an R sparse matrix? A sparse matrix is a matrix where most of the elements are zero. In large-scale optimization, sparse matrices play an important role: software (solvers, modeling systems) exploits sparsity to reduce memory requirements and to speed up calculations. In fact, very large problems can only be solved by using sparse data structures. I actually don't think the user wants a sparse matrix, but let's assume we really do want one. The R Matrix package [3] has facilities to create and manipulate sparse matrices.

We assume we have the following GAMS data:
set
   i /a,b,c,d,e/
   j /e,f,g,h,i,j/
   ;

table d(i,j)
       e   f   g   h   i   j
   a   1       2       4
   b
   c           3
   d
   e                   5
   ;
execute_unload "gamsdata";

Notice that we export three items: the sets i and j and the parameter d.

The big issue here is that GAMS uses a sparse storage scheme throughout. This means only the nonzero elements of matrix d are stored. Unfortunately, this also means that empty columns and empty rows are discarded: they are simply not visible. We can see this when we display the matrix:

----     17 PARAMETER d  

            e           g           i

a       1.000       2.000       4.000
c                   3.000
e                               5.000

The trick is to also export the domain sets i and j. With the aid of these sets we can reconstruct the sparse matrix with named rows and columns and with all empty rows and columns preserved.

Although this example is small, we assume the matrix is very large and sparse. This means we don't want to use a pivot operation on d in R to create a dense 2D matrix. In all operations in this write-up we store and operate on just the nonzero elements.

Below I show a number of different approaches to populate a sparse matrix in R with GAMS data. There are some interesting issues in each approach.

Method 1: GDXRRW and R


We start with reading in the data from the GAMS GDX file:


library(gdxrrw)
# read d,i,j into data frames
D <- rgdx.param("gamsdata","d")
I <- rgdx.set("gamsdata","i")
J <- rgdx.set("gamsdata","j")

When we print things we see they arrived as follows:
D
##   i j d
## 1 a e 1
## 2 a g 2
## 3 a i 4
## 4 c g 3
## 5 e i 5
I
##   i
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
J
##   i
## 1 e
## 2 f
## 3 g
## 4 h
## 5 i
## 6 j
Note that all three identifiers are data frames. Warning: the column of both tables I and J is always called i. This is a quirk of gdxrrw.

We need to create mappings from name to index position for both the rows and columns. So "a" → 1, "b" → 2, etc. R has an interesting feature: named vectors allow us to index a vector by strings. We create these named vectors as follows:


# create named vectors for index positions
# this allows us to map from name -> index position
ordi <- seq(nrow(I))
names(ordi) <- I$i
ordj <- seq(nrow(J))
names(ordj) <- J$i

When we print these mapping vectors we see:
ordi
## a b c d e 
## 1 2 3 4 5
ordj
## e f g h i j 
## 1 2 3 4 5 6

Using these vectors we can easily convert columns i and j in the D data frame from strings to numbers.

ordi[ D$i ] # row indices
## a a a c e 
## 1 1 1 3 5
ordj[ D$j ] # column indices
## e g i g i 
## 1 3 5 3 5

With this we are ready to construct the sparse matrix in R:


library(Matrix)
# create sparse matrix
M <- sparseMatrix(
      i = ordi[ D$i ],      # row indices
      j = ordj[ D$j ],      # column indices 
      x = D$d,              # nonzero values
      dims = c(nrow(I),nrow(J)),  # size of matrix
      dimnames = list(I$i,J$i)    # names of rows and columns
  )
When we print this we see:
M
## 5 x 6 sparse Matrix of class "dgCMatrix"
##   e f g h i j
## a 1 . 2 . 4 .
## b . . . . . .
## c . . 3 . . .
## d . . . . . .
## e . . . . 5 .

In this method we used named vectors to be able to index vectors by strings. This trick helped to translate the strings forming the sets i and j into their ordinal values or index positions.

Method 2: Add some GAMS code


We can simplify the R code a bit by letting GAMS do part of the work. We can augment the matrix d with extra information about the index positions for non-zero values.

set v /value,ordi,ordj/;
parameter d2(i,j,v);
d2(i,j,'value') = d(i,j);
d2(i,j,'ordi')$d(i,j) = ord(i);
d2(i,j,'ordj')$d(i,j) = ord(j);
display d2;

The displayed output looks like:

----     22 PARAMETER d2  

          value        ordi        ordj

a.e       1.000       1.000       1.000
a.g       2.000       1.000       3.000
a.i       4.000       1.000       5.000
c.g       3.000       3.000       3.000
e.i       5.000       5.000       5.000

Reading this data is basically the same as before:


# read d,i,j into data frames
D2 <- rgdx.param("gamsdata","d2")
I <- rgdx.set("gamsdata","i")
J <- rgdx.set("gamsdata","j")

The D2 data frame looks like:

D2
##    i j     v d2
## 1  a e value  1
## 2  a e  ordi  1
## 3  a e  ordj  1
## 4  a g value  2
## 5  a g  ordi  1
## 6  a g  ordj  3
## 7  a i value  4
## 8  a i  ordi  1
## 9  a i  ordj  5
## 10 c g value  3
## 11 c g  ordi  3
## 12 c g  ordj  3
## 13 e i value  5
## 14 e i  ordi  5
## 15 e i  ordj  5

The row and columns indices and the values can be extracted by standard subsetting:


# subset D2 to extract ordi, ordj and value data
ordi <- D2[D2$v=="ordi","d2"]
ordj <- D2[D2$v=="ordj","d2"]
v <- D2[D2$v=="value","d2"]

The sparse matrix is now easy to create:

# create sparse matrix
m <- sparseMatrix(
      i = ordi,        # row indices
      j = ordj,        # column indices
      x = v,           # nonzero values
      dims = c(nrow(I),nrow(J)),    # size of matrix
      dimnames = list(I$i,J$i)      # names of rows and columns  
  )

Output is as expected:
m
## 5 x 6 sparse Matrix of class "dgCMatrix"
##   e f g h i j
## a 1 . 2 . 4 .
## b . . . . . .
## c . . 3 . . .
## d . . . . . .
## e . . . . 5 .

Method 3: Using gdxrrw smarter


As pointed out in the comments, the gdxrrw tool can actually return all the information we need:


library(Matrix)
library(gdxrrw)
d <- rgdx('gamsdata',list(name='d')) 
M2 <- sparseMatrix(
  i = d$val[,1], # row indices
  j = d$val[,2], # column indices 
  x = d$val[,3], # nonzero values
  dims = c(length(d$uels[[1]]),length(d$uels[[2]])), # size of matrix
  dimnames = d$uels # names of rows and columns
)

Method 4: SQLite and R


Here we use SQLite as an intermediate data store. This will allow us to do some SQL joins while importing the data. First we convert the GAMS data into an SQLite database. This is very easy:


system("gdx2sqlite -i gamsdata.gdx -o gamsdata.db")

The reading of the data is straightforward:

library(RSQLite)
sqlite<-dbDriver("SQLite") 
db <- dbConnect(sqlite,"gamsdata.db")
D <- dbReadTable(db,"d")
I <- dbReadTable(db,"i")
J <- dbReadTable(db,"j")

The data arrives as:

D
##   i j value
## 1 a e     1
## 2 a g     2
## 3 a i     4
## 4 c g     3
## 5 e i     5
I
##   i
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
J
##   j
## 1 e
## 2 f
## 3 g
## 4 h
## 5 i
## 6 j

The row numbers in an SQLite table can be retrieved by the rowid. For example:
dbGetQuery(db,"select rowid,* from i")
##   rowid i
## 1     1 a
## 2     2 b
## 3     3 c
## 4     4 d
## 5     5 e

Note: the rowid is automatically incremented by SQLite when adding rows. However, after deleting rows the rowid no longer has a direct relation to the row number. Our database is newly created, so in our case we can assume the rowid is the same as the row number.

We use this to map from index names for the sets i and j to their ordinal numbers. We do this by using a join on tables d and i (or d and j).

The row and column indices can be retrieved as:

ordi <- dbGetQuery(db,"select i.rowid from d inner join i on i.i=d.i")[[1]]
ordj <- dbGetQuery(db,"select j.rowid from d inner join j on j.j=d.j")[[1]]

These vectors look like:

ordi
## [1] 1 1 1 3 5
ordj
## [1] 1 3 5 3 5

Now creating the sparse matrix is simple:

m <- sparseMatrix(
      i = ordi,     # row indices
      j = ordj,     # column indices
      x = D$value,  # nonzero values
      dims = c(nrow(I),nrow(J)),    # size of matrix
      dimnames = list(I$i,J$j)      # names of rows and columns 
  )

The output is no surprise:
m
## 5 x 6 sparse Matrix of class "dgCMatrix"
##   e f g h i j
## a 1 . 2 . 4 .
## b . . . . . .
## c . . 3 . . .
## d . . . . . .
## e . . . . 5 .

Method 5: Pivoting data frames


It is my assumption the user just wanted to pivot from "long" format to "wide" format [1]. This is efficiently done with the function spread [2]:

D <- rgdx.param("gamsdata","d")
library(tidyr)
P <- spread(D,j,d,fill=0)

This yields:
D
##   i j d
## 1 a e 1
## 2 a g 2
## 3 a i 4
## 4 c g 3
## 5 e i 5
P
##   i e g i
## 1 a 1 2 4
## 2 c 0 3 0
## 3 e 0 0 5

If you don't specify the fill parameter, spread will impute NA values. Note this operation is dense: all zeros are explicitly stored.

The duplication in column names can be easily fixed by something like:
colnames(P)[1] <- ""
P
##     e g i
## 1 a 1 2 4
## 2 c 0 3 0
## 3 e 0 0 5

If we want to keep the empty rows and columns from the original matrix, it is easiest to fix this at the GAMS level:

parameter d3(i,j) 'dense version';
d3(i,j) = d(i,j) + eps;
display d3;

Adding the special value eps to a number behaves in a particular way in GAMS. We have:

\[x+\text{eps} = \begin{cases}x&\text{if }x\ne 0\\ \text{eps}&\text{if }x=0\end{cases} \]

This trick can be used to preserve all zero (and non-zero) values. Indeed, the values of d3 are:

----     26 PARAMETER d3  dense version

            e           f           g           h           i           j

a       1.000         EPS       2.000         EPS       4.000         EPS
b         EPS         EPS         EPS         EPS         EPS         EPS
c         EPS         EPS       3.000         EPS         EPS         EPS
d         EPS         EPS         EPS         EPS         EPS         EPS
e         EPS         EPS         EPS         EPS       5.000         EPS


We can import this into R as follows:

D3 <- rgdx.param("gamsdata","d3",squeeze=F)
P <- spread(D3,j,d)

The squeeze option helps to deal with the eps values: we have to make sure that we map eps back to a physical zero. The results look like:

P
##   i e f g h i j
## 1 a 1 0 2 0 4 0
## 2 b 0 0 0 0 0 0
## 3 c 0 0 3 0 0 0
## 4 d 0 0 0 0 0 0
## 5 e 0 0 0 0 5 0

Conclusion 


Tons of ways to do this. Using gdxrrw::rgdx the smart way (method 3) is the simplest.

References


  1. Pivoting a table: GAMS, SQLite, R and Python, http://yetanothermathprogrammingconsultant.blogspot.com/2016/01/pivoting-table-gams-sqlite-r-and-python.html
  2. Brad Boehmke, Data Processing with dplyr & tidyr, https://rpubs.com/bradleyboehmke/data_wrangling
  3. Martin Maechler and Douglas Bates, 2nd Introduction to the Matrix package, https://cran.r-project.org/web/packages/Matrix/vignettes/Intro2Matrix.pdf

Monday, December 4, 2017

Using Nonlinear Mixed-integer Programming

In [1] we explored some linear mixed-integer programming formulations for the Least Median of Squares regression problem. Now let’s look at some nonlinear formulations.
Quadratic model

A mixed-integer quadratically constrained formulation can look like [1]:

\[\begin{align}\min\>&z\\&z\ge r^2_i-M\delta_i\\ &\sum_i \delta_i = n-h\\&r_i = y_i-\sum_j X_{i,j} \beta_j\\&\delta_i \in \{0,1\}\end{align}\]
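For illustration, here is a minimal Pyomo sketch of this formulation (my own reconstruction; the synthetic data, the values of h and M, and the solver choice are assumptions, not the original GAMS model):

import numpy as np
import pyomo.environ as pyo

# synthetic regression data with a few gross outliers (assumption)
rng = np.random.default_rng(1)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=n)
y[:5] += 10
h = (n + p + 1) // 2   # order statistic used by LMS
M = 1e4                # assumed big-M bound on the squared residuals

m = pyo.ConcreteModel()
m.beta = pyo.Var(range(p))
m.r = pyo.Var(range(n))
m.delta = pyo.Var(range(n), domain=pyo.Binary)
m.z = pyo.Var(domain=pyo.NonNegativeReals)
m.resid = pyo.Constraint(range(n), rule=lambda m, i:
    m.r[i] == y[i] - sum(X[i, j] * m.beta[j] for j in range(p)))
m.cut = pyo.Constraint(range(n), rule=lambda m, i:
    m.z >= m.r[i] ** 2 - M * m.delta[i])
m.card = pyo.Constraint(expr=sum(m.delta[i] for i in range(n)) == n - h)
m.obj = pyo.Objective(expr=m.z, sense=pyo.minimize)
# pyo.SolverFactory('gurobi').solve(m)   # any convex MIQCP solver will do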

This MIQCP model is convex and can be solved with top-of-the-line commercial solvers like Cplex and Gurobi. Some timings I see are:

----    145 PARAMETER report 

                  obj        time       nodes

big-M MIP       0.262      30.594  160058.000
MIQCP           0.069     712.156 1783661.000

The objectives look different, but both are correct: in the MIP formulation the objective reflects \(|r|_{(h)}\) while the quadratic formulation uses \(r^2_{(h)}\); indeed \(0.262^2\approx 0.069\).

This performance differential is not unusual. If there are good MIP formulations available, in most cases quadratic formulations are not competitive.

MINLP model

The MINLP model from [1]:

\[\begin{align}\min\>&z\\&z\ge \delta_i \cdot r^2_i\\ &\sum_i \delta_i = h\\&r_i = y_i-\sum_j X_{i,j} \beta_j\\&\delta_i \in \{0,1\}\end{align}\]

is more problematic. This is no longer convex, so we need a global solver. When we try to solve this model as is we are in a heap of problems:

----    193 PARAMETER report 

                      obj        time       nodes   modelstat

MINLP/Couenne       0.452       5.562     261.000       1.000 (optimal)
MINLP/Baron         0.225    1000.000   71811.000       8.000 (integer solution, exceeded time limit)

Obviously, the Couenne solution is not correct, and Baron is not doing so great either. Global solvers have trouble when variables do not have proper bounds. It would be good if all kinds of alarm bells started ringing, but they don't.

Let’s add the following bounds:

\[\begin{align}&-100\le \beta_j \le 100\\&-100\le r_i \le 100\end{align}\]

Of course in practice you may not have good bounds available. These bounds yield much better results:

----    193 PARAMETER report 

                      obj        time       nodes   modelstat

MINLP/Couenne       0.069     108.875   10827.000       1.000
MINLP/Baron         0.069     170.890      77.000       1.000

This is actually quite good compared to the MIQCP model.

Initial solutions

Recently I was asked about the effect of an initial point on the behavior of Couenne. Let’s try the experiment where we do two solves in sequence. The result is:

----    181 PARAMETER report 

                  obj        time       nodes   modelstat

Couenne.1       0.069     108.469   10827.000       1.000
Couenne.2       0.069     113.359   11782.000       1.000

This is surprising: the second solve is actually more expensive! I have no good explanation for this. Maybe the initial point only affected the first sub-problem, and in a bad way. Now try Baron:

----    181 PARAMETER report 

                obj        time       nodes   modelstat

Baron.1       0.069     185.600      77.000       1.000
Baron.2       0.069      94.980      29.000       1.000

This looks much better. In addition the Baron log gives some clear messages about the initial point being used as an incumbent:

Starting solution is feasible with a value of    0.686748259896E-001
  Doing local search
  Solving bounding LP
  Starting multi-start local search
  Done with local search
===========================================================================
   Iteration    Open nodes         Time (s)    Lower bound      Upper bound
           1             1             8.25      0.00000         0.686748E-01
           3+            3            36.98      0.00000         0.686748E-01
           7+            3            66.42      0.00000         0.686748E-01
          29             0            94.98     0.686747E-01     0.686747E-01

Cleaning up

We can see the effect of the initial point: the upper bound is immediately equal to the objective value of the initial point.

Setting a bound

What if, instead of providing an initial point, we just set a bound on \(z\):

\[z\le 0.069\]

This looks like a good strategy to help the solver.

----    193 PARAMETER report 

                obj        time       nodes   modelstat

Couenne       0.069     130.125   10104.000       1.000
Baron            NA     115.390      15.000      19.000 (infeasible, no solution)

Bummer: it does not help at all.

Solving global MINLP models is not yet as “automatic” as solving an LP. You need to be aware that solvers for these types of models are a little more fragile. A somewhat sobering conclusion.

References
  1. Integer Programming and Least Median of Squares Regression, http://yetanothermathprogrammingconsultant.blogspot.com/2017/11/integer-programming-and-least-median-of.html