## Monday, April 17, 2017

### MIP model in R using the OMPR package

A small model with data in R is a good opportunity to try out the OMPR modeling tool (2). Also a first time for me to see the solver Symphony (3) in action.

## Problem Description

From (1) (see references below):

I have 4 stores (1,2,3,4) and I can apply 3 treatments (A,B,C) to each of the 4 stores. Each treatment has its own cost and profit.

The matrix is as follows:

Store   Treatment   Cost    Profit
1   A   50  100
1   B   100 200
1   C   75  50
2   A   25  25
2   B   150 0
2   C   50  25
3   A   100 300
3   B   125 250
3   C   75  275
4   A   25  25
4   B   50  75
4   C   75  125

How can I maximize the profit having a constraint on maximum cost in R? In reality, I have a large number of stores and hence cannot be done manually. Each store can get only 1 treatment.

## Optimization Model

The optimization model for this problem can look like:

Loading the data into a data frame is easy with read.table:

df<-read.table(text="
Store   Treatment   Cost    Profit
1   A   50  100
1   B   100 200
1   C   75  50
2   A   25  25
2   B   150 0
2   C   50  25
3   A   100 300
3   B   125 250
3   C   75  275
4   A   25  25
4   B   50  75
4   C   75  125
df
##    Store Treatment Cost Profit
## 1      1         A   50    100
## 2      1         B  100    200
## 3      1         C   75     50
## 4      2         A   25     25
## 5      2         B  150      0
## 6      2         C   50     25
## 7      3         A  100    300
## 8      3         B  125    250
## 9      3         C   75    275
## 10     4         A   25     25
## 11     4         B   50     75
## 12     4         C   75    125

## OMPR Model

I am going to try to model this problem with OMPR (2) and solve it with the Symphony MIP solver (3).

First we need a bunch of packages:

library(dplyr)
library(tidyr)
library(ROI)
library(ROI.plugin.symphony)
library(ompr)
library(ompr.roi)

## Extract data

OMPR is using numbers as indices so it makes sense to extract the cost and profit data as matrices. We do this as follows:

# extract some basic data
stores<-unique(df$Store) stores ## [1] 1 2 3 4 treatments <- levels(df$Treatment)
treatments
## [1] "A" "B" "C"
num_treatments <- length(treatments)
# cost matrix
cost
##     A   B  C
## 1  50 100 75
## 2  25 150 50
## 3 100 125 75
## 4  25  50 75
# profit matrix
profit
##     A   B   C
## 1 100 200  50
## 2  25   0  25
## 3 300 250 275
## 4  25  75 125
# maximum cost allowed
max_cost <- 300

## MIP Model

The OMPR package allows us to specify linear equations algebraically. Most LP/MIP solvers in R use a matrix notation, which often is more difficult to use. Here we go:

m <- MIPModel() %>%
add_constraint(sum_expr(cost[i,j]*x[i,j], i=stores, j=1:num_treatments) <= max_cost) %>%
set_objective(sum_expr(profit[i,j]*x[i,j], i=stores, j=1:num_treatments),"max") %>%
solve_model(with_ROI(solver = "symphony",verbosity=1))
##
## Starting Preprocessing...
## Preprocessing finished...
##       with no modifications...
## Problem has
##   5 constraints
##   12 variables
##   24 nonzero coefficients
##
## Total Presolve Time: 0.000000...
##
## Solving...
##
## granularity set at 25.000000
## solving root lp relaxation
## The LP value is: -650.000 [0,5]
##
##
## ****** Found Better Feasible Solution !
## ****** Cost: -650.000000
##
##
## ****************************************************
## * Optimal Solution Found                           *
## * Now displaying stats and best solution found...  *
## ****************************************************
##
## ====================== Misc Timing =========================
##   Problem IO        0.000
## ======================= CP Timing ===========================
##   Cut Pool                  0.000
## ====================== LP/CG Timing =========================
##   LP Solution Time          0.000
##   LP Setup Time             0.000
##   Variable Fixing           0.000
##   Pricing                   0.000
##   Strong Branching          0.000
##   Separation                0.000
##   Primal Heuristics         0.000
##   Communication             0.000
##   Total User Time              0.000
##   Total Wallclock Time         0.000
##
## ====================== Statistics =========================
## Number of created nodes :       1
## Number of analyzed nodes:       1
## Depth of tree:                  0
## Size of the tree:               1
## Number of solutions found:      1
## Number of solutions in pool:    1
## Number of Chains:               1
## Number of Diving Halts:         0
## Number of cuts in cut pool:     0
## Lower Bound in Root:            -650.000
##
## ======================= LP Solver =========================
## Number of times LP solver called:                 1
## Number of calls from feasibility pump:            0
## Number of calls from strong branching:            0
## Number of solutions found by LP solve:            1
## Number of bounds changed by strong branching:     0
## Number of nodes pruned by strong branching:       0
## Number of bounds changed by branching presolver:  0
## Number of nodes pruned by branching presolver:    0
##
## ==================== Primal Heuristics ====================
##                              Time      #Called   #Solutions
## Rounding I                   0.00
## Rounding II                  0.00
## Diving                       0.00
## Feasibility Pump             0.00
## Local Search                 0.00            1            0
## Restricted Search            0.00
## Rins Search                  0.00
## Local Branching              0.00
##
## =========================== Cuts ==========================
## Accepted:                         0
## Deleted from LPs:                 0
## Removed because of bad coeffs:    0
## Removed because of duplicacy:     0
## Insufficiently violated:          0
## In root:                          0
##
## Time in cut generation:              0.00
## Time in checking quality and adding: 0.00
##
##                    Time     #Called     In Root       Total
## Gomory             0.00
## Knapsack           0.00
## Clique             0.00
## Probing            0.00
## Flowcover          0.00
## Twomir             0.00
## Oddhole            0.00
## Mir                0.00
## Rounding           0.00
## LandP-I            0.00
## LandP-II           0.00
## Redsplit           0.00
##
## ===========================================================
## Solution Found: Node 0, Level 0
## Solution Cost: -650.0000000000
## +++++++++++++++++++++++++++++++++++++++++++++++++++
## User indices and values of nonzeros in the solution
## +++++++++++++++++++++++++++++++++++++++++++++++++++
##       1 1.0000000000
##       2 1.0000000000
##       4 1.0000000000
##      11 1.0000000000
## 

## Solution

A “raw" solution can be generated using:

get_solution(m,x[i, j]) %>%
filter(value > 0)
##   variable i j value
## 1        x 2 1     1
## 2        x 3 1     1
## 3        x 1 2     1
## 4        x 4 3     1

We want to clean this up a bit:

cat("Status:",solver_status(m),"\n")
cat("Objective:",objective_value(m),"\n")
get_solution(m,x[i, j]) %>%
filter(value > 0) %>%
mutate(Treatment = treatments[j],Store = i) %>%
select(Store,Treatment)
## Status: optimal
## Objective: 650
##   Store Treatment
## 1     2         A
## 2     3         A
## 3     1         B
## 4     4         C
##### References
1. The original problem is from: Mixed linear integer optimization - Optimize Profit based on different treatments to all the stores in R, http://stackoverflow.com/questions/43446194/optimize-profit-based-on-different-treatments-to-all-the-stores-in-r
2. Dirk Schumacher, OMPR: R package to model Mixed Integer Linear Programs, https://github.com/dirkschumacher/ompr
3. Symphony MIP Solver: https://projects.coin-or.org/SYMPHONY
4. Paul Rubin, MIP Models in R with OMPR, http://orinanobworld.blogspot.com/2016/11/mip-models-in-r-with-ompr.html

## Thursday, April 13, 2017

### QP as LP: cutting planes

In (1) I described a simple linearization scheme for a QP model we we can solve it as a straight LP. For a simple (convex) quadratic function $$f(x)$$, instead of solving:

 $\min\> f(x)=x^2$

we solve

 \begin{align} \min\>&z\\ &z \ge a_i x + b_i\end{align}

In this post I do things slightly different: instead of adding the linear inequalities ahead of time, we add them one at the time based on the previously found optimal point. This approach is called a cutting plane technique (2).

##### Example: portfolio model

We consider again the simple portfolio model:

 \boxed{\begin{align} \min \>& \sum_t w_t^2\\ & w_t = \sum_i r’_{i,t} x_i\\ & \sum_i \mu_i x_i \ge R\\ & \sum_i x_i = 1\\ &x_i\ge 0\end{align}}

The model is initially linearized as an LP model:

 \boxed{\begin{align}\min \>& \sum_t z_t\\ & w_t = \sum_i r’_{i,t} x_i\\ & \sum_i \mu_i x_i \ge R\\ & \sum_i x_i = 1\\ &x_i, z_t\ge 0\end{align}}

Later on, during the CP algorithm, we will add linear cuts of the form

 $z_t \ge a_t w_t + b_t$

The algorithm would look like:

1. $$k:=1$$
2. Solve model LP, let $$w^*_t$$ be the optimal values for $$w_t$$.
3. if $$k=$$MAXIT: STOP
4. Add the constraint $$z_t \ge a_t w_t + b_t$$ where $$a_t=2 w^*_t$$ and $$b_t=-(w^*_t)^2$$. Note that we add one cut for each $$w_t$$ here (our dataset has $$T=717$$ time periods).
5. $$k:=k+1$$
6. go to step 2

Here we stop simply when a certain number of iterations MAXIT has been exceeded. That can be refined by stopping when the objective does not seem to change much anymore. Another optimization could be to only add cuts that are different from the ones added before (for some $$t$$ we may converge quicker than for others).

The algorithm converges very fast:

 ----    118 PARAMETER report2             obj(LP)         w'w iter1               1.02907546iter2   0.21577087  0.46879016iter3   0.33210537  0.48432099iter4   0.37143835  0.41397603iter5   0.40990988  0.41362293iter6   0.41109694  0.41222051iter7   0.41183602  0.41212331iter8   0.41192720  0.41204267iter9   0.41196991  0.41204112iter10  0.41197620  0.41203782

Note that the optimal QP solution has an objective: 0.41202816.

This is pretty good performance especially because the dataset is not very small (it has 717 time periods and 83 stocks).

Here is a picture of the cuts introduced for the first element $$z_1=w_1^2$$:

##### A combined approach

We can even combine the two methods:

1. Start with a coarse-grained (i.e. cheap) initial set of cuts. In (1) we used $$n=10$$ inequalities per $$w_t$$. For this experiment I reduced this to $$n=5$$.
2. Then apply our cutting plane algorithm. Instead of MAXIT=10 iterations we now do 5 iterations.

This yields even faster convergence:

 ----    142 PARAMETER report3            obj(LP)         w'w iter1  0.32921509  0.41401219iter2  0.40812966  0.41471624iter3  0.41017228  0.41211351iter4  0.41188869  0.41208158iter5  0.41195624  0.41203588
##### References
1. QP as LP: piecewise linear functions, http://yetanothermathprogrammingconsultant.blogspot.com/2017/04/qp-as-lp-piecewise-linear-functions.html
2. J. E. Kelley, Jr. "The Cutting-Plane Method for Solving Convex Programs." J. Soc. Indust. Appl. Math. Vol. 8, No. 4, pp. 703-712, December, 1960.
3. Cardinality Constrained Portfolio Optimization: MIQP without MIQP Solver, http://yetanothermathprogrammingconsultant.blogspot.com/2016/02/cardinality-constrained-portfolio.html. Here this cutting plane method is applied to a MIQP model (not strictly “allowed” as this is no longer convex, but useful as a heuristic).

### QP as LP: piecewise linear functions

I am facing a (convex) QP problem that has some numerical issues for some data sets. Hence as backup I’d like to be able to solve the problem as an LP. One way to do this is using a simple piecewise linear approximation.

Consider the objective function $$\min\> f(x)=x^2$$. Instead of solving this directly we can invent a few linear functions $$y = a_i x + b_i$$ and approximate the problem by:

 \begin{align} \min\>&z\\ &z \ge a_i x + b_i\end{align}

Here is an example how these linear functions can look like:

To derive a single linear function we observe the following:

1. Pick a point $$t_i$$, e.g. $$t_i=2$$ in the picture below.
2. We want the function value to match: $$f(t_i) = a_i t_i + b_i$$. I.e. $$a_i t_i + b_i=t_i^2$$.
3. We also want the slope to match: $$\nabla f(t_i) = a_i$$. I.e. $$a_i = 2 t_i$$.

This means we can choose:

 \begin{align} &a_i = 2 t_i\\ &b_i = –t_i^2\end{align}

The net effect of this strategy is that we formed a piecewise linear objective that approximates our original quadratic function:

Note that there are other linear approximation schemes. This is just a particularly simple one.

##### Example: portfolio optimization

To try this out on a more realistic problem, we consider a standard portfolio optimization problem:

 \boxed{\begin{align} \min \>& \sum_t w_t^2\\ & w_t = \sum_i r’_{i,t} x_i\\ & \sum_i \mu_i x_i \ge R\\ & \sum_i x_i = 1\\ &x_i\ge 0\end{align}}

Here $$r’=r-\mu$$ are the mean adjusted returns.$$R$$ is the minimum expected portfolio return.

In this case our linear model can look like:

 \boxed{\begin{align} \min \>& \sum_t z_t\\ & z_t \ge a_{t,j} w_t + b_{t,j}\>\forall t,j\\ & w_t = \sum_i r’_{i,t} x_i\\ & \sum_i \mu_i x_i \ge R\\ & \sum_i x_i = 1\\ &x_i\ge 0\end{align}}

The $$w_t$$’s vary from $$L_t=\displaystyle\min_i r’_{i,t}$$ to $$U_t=\displaystyle\max_i r’_{i,t}$$. For each $$w_t$$ we choose $$n=10$$ equidistant points between $$[L_t,U_t]$$. We use the FTSE100 dataset from (1), which has:

• number of time periods $$t$$: 717
• number of stocks $$i$$: 83

We see the following results:

• The optimal QP objective is 0.41202816
• The optimal LP objective is 0.37728185

This looks bad, but actually things are better than they appear. When we evaluate $$\displaystyle\sum_i w_t^2$$ for the optimal LP values of $$w_t$$ we have:

• QP objective of LP solution: 0.41246149

Here are the differences in the selected portfolio:

 ----     87 PARAMETER report  QP objective 0.41202816,    LP objective 0.37728185,    w'w from LP  0.41246149 ----     87 PARAMETER xval                   QP          LP stock2   0.11723384  0.10298780stock11  0.13629495  0.14992714stock22  0.06253534  0.06202882stock40  0.09082870  0.10523583stock64  0.08521849  0.08348895stock66  0.17247021  0.17267818stock69  0.00180266stock78  0.09462530  0.09761814stock79  0.05184550  0.04383311stock83  0.18714501  0.18220203
##### Performance

Although the LP model is much larger, it solves in about the same time as the original QP. I see:

Note that barrier iterations should really not be compared to simplex iterations. A finer or more coarse grid in the piecewise linear approximation has only affect on the the number of equations in the model (the number of variables will stay the same).

##### References
1. Renato Bruni, Francesco Cesarone, Andrea Scozzari, Fabio Tardella, Real-world datasets for portfolio selection and solutions of some stochastic dominance portfolio models, Data in Brief, Volume 8, September 2016, Pages 858-862
2. QP as LP: cutting planes, http://yetanothermathprogrammingconsultant.blogspot.com/2017/04/qp-as-lp-cutting-planes.html

## Wednesday, April 5, 2017

### MIP Modeling: preventing a bit-pattern

From (1):

Let us say we have $$n$$ binary variables $$x_i$$ for all $$i=1,2,…,n$$ i.e., $$x_i \in \{0,1\}$$ for all $$i=1,2,…,n$$.

I need to write the following constraint:

• If $$x_i=1$$ and $$x_{i+2}=1$$, then $$x_{i+1}=1$$. In other words, the variables $$x_i$$ must be consecutives.

Basically we want to prevent the sequence $$[1,0,1]$$ but allow all other patterns. This can be accomplished with the constraint:

 $\boxed{x_i-x_{i+1}+x_{i+2} \le 1}$

This is actually a special case of (2). Of course for this small example we can inspect all possible combinations (there are $$2^3=8$$ of them) to verify we do the right thing:

##### Counting Start-ups

What about if you want to have all your $$x_i=1$$ consecutive. I.e. we only allow something like $$[0,0,1,1,1,1,0,0,0]$$ where there are no holes in the series of ones. This can be best modeled by counting the number of times we go from $$0$$ to $$1$$. A compact formulation is:

 \boxed{\begin{align}&s_i \ge x_i – x_{i-1}\\ &\sum_i s_i \le 1\\ &s_i \in \{0,1\}\end{align}}

Here we force $$s_i=1$$ if we go from $$0$$ to $$1$$. Note that we can relax $$s_i$$ to be continuous (i.e. $$s_i \in [0,1]$$).

This example would show:

 $\begin{array}{cccccccccc}x_i=&[0&0&1&1&1&1&0&0&0]\\ s_i=&[0&0&1&0&0&0&0&0&0]\end{array}$
##### References
1. How to model a consecutive binary constraint? http://math.stackexchange.com/questions/2218389/how-to-model-a-consecutive-binary-constraint
2. Integer Cuts, http://yetanothermathprogrammingconsultant.blogspot.com/2011/10/integer-cuts.html
3. Modeling number of machine start-ups, http://yetanothermathprogrammingconsultant.blogspot.com/2014/12/modeling-number-of-machine-start-ups.html

## Tuesday, April 4, 2017

### Satalia SolveEngine

From the website (1), I am not sure what this is, although the soundbite is intriguing:

The SolveEngine is a cloud-based system, similar to NEOS except with more powerful solvers and much shorter queue times.

Note that NEOS has all the major commercial LP/MIP solvers: Cplex, Gurobi, Xpress and Mosek. I am curious what these more powerful solvers are.

##### References
1. Satalia SolverEngine, https://solve.satalia.com

## Thursday, March 30, 2017

### R: plotly and R Markdown

In (1) I showed how we can produce a dynamic Gantt chart in R using the plotly package. What about if we put this in an R markdown document? If we knit the document to HTML there is no problem. We see something like:

Now the question arises what happens when I produce a PDF (through LaTeX)? Interestingly that works: we get a static picture:

Obviously we cannot run JavaScript based interaction in a PDF file. Same thing with Word output:

##### References
1. Gantt chart with R/plotly, http://yetanothermathprogrammingconsultant.blogspot.com/2017/03/gantt-chart-with-rplotly.html

## Friday, March 24, 2017

### Cutting Stock Problem: Generating all patterns by MIP Solution Pool

In (1) a small cutting stock problem is described. We need to cut up raw material rolls of width 5600 mm into the following pieces:

In this post I want to explore how we can use the solution pool technology in Cplex and Gurobi to do some of the heavy lifting for us.

##### Generating all possible patterns

From (1):

There are 308 possible patterns for this small instance.

Enumerating those patterns can be done by writing your own algorithm in your favorite programming language. Alternatively we can develop a small MIP model and let the solution pool do the job for us. See (2) for another example of this.

A minimalistic model is:

 \boxed{\begin{align}\min\>&0\\ &L \le \sum_j w_j y_j \le R\\ &y_j \in \{0,1,2,\dots,q_j\}\end{align}}

were $$w_j,q_j$$ are the width and rolls from the demand table above, $$R=5600$$, and $$L=\displaystyle\min_j w_j$$.

There is a dummy objective: we are just looking for feasible solutions. The lower limit of $$L$$ is there to exclude a pattern with all $$y_j=0$$.

The solution pool will be able to find all feasible integer solutions for this model very fast.

When we solve this with GAMS/Cplex we see:

 Total (root+branch&cut) =    0.13 sec. (4.46 ticks)Dumping 308 solutions from the solution pool...

This model is probably shorter than anything we can program in a standard programming language.

Now we have a collection of all patterns $$a_{i,j}$$ formed by collecting all $$y_j$$ we found. Note that we enumerated literally all possible patterns including things like:

E.g. pattern 1 has only one item: a single roll with a width of 1380 mm.

##### Solving the cutting stock problem: minimize waste

The subsequent cutting stock problem can be formulated as (1):

 \boxed{\begin{align}\min\>&\sum_i c_i x_i\\ &\sum_i a_{i,j} x_i \ge q_j\>\>\forall j\\ &x_i \in \{0,1,2,\dots\}\end{align}}

The cost coefficients $$c_i$$ correspond to cost or waste of each pattern $$i$$, i.e. $$c_i = R-\displaystyle\sum_j w_j a_{i,j}$$.

Actually this model is not completely correct. Some of the patterns have a zero waste, E.g.:

These two patterns have a total width of $$1520+1880+2200=1520+1930+2150=5600$$. As a result the model can use any amount of these patterns without cost. Indeed I saw a strange solution:

 ----    109 VARIABLE x.L  p18     14,    p38     10,    p40      4,   p68  99999,    p70     16,    p81  99999,    p142     4,    p159     8p187     2

Indeed these two "zero waste" patterns have bad values.

We can correct this easily by changing the model to:

 \boxed{\begin{align}\min\>&\mathit{waste} = \sum_i c_i x_i\\ &\sum_i a_{i,j} x_i = q_j\>\>\forall j\\ &x_i \in \{0,1,2,\dots\}\end{align}}

i.e. use an equality constraint. Now we get a proper solution:

 ----    109 VARIABLE x.L  p18   6,    p30   1,    p38   8,    p40   7,    p68   5,    p70  15,    p75   1,    p81   4,    p121  4,    p142  7p156  5,    p228  6,    p277  1,    p282  3 ----    114 PARAMETER n                    =           73  total amount of stock used ----    118 PARAMETER waste_percent        =        0.401  waste as percentage of material

Here we calculated:

 \begin{align}&n=\sum_i x^*_i\\ &\mathit{waste}\% = 100 \frac{\mathit{waste}^*}{n\cdot R}\end{align}

The numbers for total used raw material and percentage waste are the same as mentioned in (1).

##### Multiple solutions

The optimal combination of patterns to cut is not unique. One interesting subset of solutions is the one with a minimum number of different patterns. One way to find this solution is by solving a subsequent model, with the optimal amount of raw as constraint:

 \boxed{\begin{align}\min\>&\mathit{numPatterns} = \sum_i \delta_i\\ &\sum_i a_{i,j} x_i = q_j\\ &\sum_i x_i = n\\ &x_i \le n \delta_i\\ &\delta_i \in \{0,1\}\\ &x_i \in \{0,1,2,\dots\}\end{align}}

This gives indeed what we want: 10 different patterns:

 ----    139 VARIABLE numPatterns.L         =           10  ----    139 VARIABLE x.L  p40   7,    p43   3,    p68   8,    p70  11,    p81   6,    p115 10,    p123  2,    p142 12,    p228 12,    p280  2

What about generating all possible optimal solutions for this? In (1) it is stated there are 19 solutions with 10 different patterns. Again we can use the solution pool for this. We see:

 Total (root+branch&cut) =    3.22 sec. (794.99 ticks)Dumping 19 solutions from the solution pool...
##### Minimize number of raw material rolls

Instead of minimizing waste we can also just minimize the number of raw material rolls. This corresponds to $$c_i=1$$. Solving the cutting stock model with this objective gives the same result.

In this case we can actually make the model much smaller by only considering “full” patterns. That is, only consider patterns that have no extra space left for another item from our demand data set.

From our original example:

we only want to keep rows that are non-dominated (the pattern numbers are reordered, so they don’t correspond):

The model to generate these can be written as:

 \boxed{\begin{align}\min\>&0\\ &r = R-\sum_j w_j y_j\\ &y_j \ge (1-\delta_j) q_j&\forall j\\ &r \le w_j \delta_j + R(1- \delta_j)&\forall j\\ &r \ge 0\\ &\delta_j \in \{0,1\}\\ &y_j \in \{0,1,2,\dots,q_j\}\end{align}}

The binary variable $$\delta_j\in\{0,1\}$$ is forced to assume the value of one when we still have a demand left for another item $$j$$. We require that the remaining space available is smaller than the space needed to place any remaining demand.

Note that the second constraint $$y_j \ge (1-\delta_j) q_j$$ implements the implication:

 $y_j while the third constraint $$r \le w_j \delta_j + R(1- \delta_j)$$ yields:  \[0 \le r \le \min_{j|\delta_j=1}w_j$

and so forcing the remaining space $$r$$ to be smaller than any demanded item we still can assign. When we solve this model with the solution pool we find just 213 different patterns (compared to 308 before). The subsequent cutting stock model can look like:

 \boxed{\begin{align}\min\>&n=\sum_i x_i\\ &\sum_i a_{i,j} x_i \ge q_j\>\>\forall j\\ &x_i \in \{0,1,2,\dots\}\end{align}}

We find the same $$n=73$$ as before:

 ----    199 PARAMETER n                    =           73  total amount of stock used ----    199 VARIABLE x.L  p21  16,    p26   6,    p48   3,    p94   8,    p102  3,    p106 12,    p117  2,    p124  7,    p131  1,    p135  1p188 12,    p208  2,    p228  1,    p280  2
##### Conclusion

The solution pool technique can be a useful tool for smaller instances of cutting stock problems where we want to enumerate patterns or optimal solutions. Of course for large problems these enumeration steps are not practical.

##### References
1. Cutting stock problem, https://en.wikipedia.org/wiki/Cutting_stock_problem
2. Employee Scheduling III: generating all shifts using the Cplex solution pool, http://yetanothermathprogrammingconsultant.blogspot.com/2017/01/employee-scheduling-iii-generating-all.html

## Tuesday, March 21, 2017

### Joins in GAMS, SQL or R

To produce the charts in (1) I needed to do a multiple join on some data. From the GAMS model in (2) we extract the the following data:

 sets  j 'tasks'  /job1*job10/  t 'stages' /t1*t10/;parameters   proctime(j,t) 'processing times for each stage'   machine(j,t) 'machine numbers for each stage (zero based)';variable x(j,t) 'start time of subtask';

What we want to arrive in R is the following:

i.e. a single data frame with slightly renamed columns (and a new column called end indicating when a sub-job finishes).

I used “approach 1” (pure SQL, described next) in (1), but there are other ways to attack the problem. I’ll describe a few.

##### Approach 1: pure SQL

Writing this into a SQLite database is very simple:

 execute_unload "ft10";execute "gdx2sqlite -i ft10.gdx -o ft10.db";

The database will contain a copy of all the data in the model, including the symbols we are interested in.

1. Join proctime, machine and x on (j,t)
2. GAMS only stores non-zero elements, which we need to “repair”

The second issue can be illustrated by looking at the table machine:

We miss the records that have a zero value. We can reintroduce them as follows. First we can generate the Carthesian product of J × T using a simple SQL subquery:

Note that table j has a single column j and table t has a single column t. As we have 10 jobs and 10 stages, this Carthesian product yields a table of 100 rows. Adding the machine number to this table is easy with a left join where we join on (j,t):

The left join added the machine column but the missing values are represented by NA in R (or NULL in SQL). This is easily fixed:

We now add the other columns and also calculate the end column:

##### Approach 2: GAMS + SQL

It is easy in GAMS to do the join. In addition we can use a trick to keep the zeroes:

 alias(*,job,stage,attribute);parameter report(job,stage,attribute);report(j,t,'machine') = EPS + machine(j,t);report(j,t,'start') = EPS + x.l(j,t);report(j,t,'duration') = EPS + proctime(j,t);report(j,t,'end') = EPS + x.l(j,t) + proctime(j,t);execute_unload "ft10";execute "gdx2sqlite -i ft10.gdx -o ft10.db";

The EPS is a placeholder for the zero values The GAMS parameter report looks like:

The GDX2SQLITE tool will map the EPS values into zero before storing them in the database, so on the R side we see:

We need to pivot on the attribute, value columns. Unfortunately this is not completely trivial in SQL. A standard way to do this is:

##### Approach 3: GAMS + R

We keep the GAMS part the same:

 alias(*,job,stage,attribute);parameter report(job,stage,attribute);report(j,t,'machine') = EPS + machine(j,t);report(j,t,'start') = EPS + x.l(j,t);report(j,t,'duration') = EPS + proctime(j,t);report(j,t,'end') = EPS + x.l(j,t) + proctime(j,t);execute_unload "ft10";execute "gdx2sqlite -i ft10.gdx -o ft10.db";

On the R side we can read the parameter report as is and pivot using spread from the tidyr package:

Arguably this is the cleanest solution.

##### References
1. Gantt Chart with R/Plotly, http://yetanothermathprogrammingconsultant.blogspot.com/2017/03/gantt-chart-with-rplotly.html
2. Playing with job shop problem ft10 (1), http://yetanothermathprogrammingconsultant.blogspot.com/2014/04/playing-with-ft10-job-shop-1.html
3. Pivoting a table: GAMS, SQLite, R and Python, http://yetanothermathprogrammingconsultant.blogspot.com/2016/01/pivoting-table-gams-sqlite-r-and-python.html