## Sunday, March 18, 2018

### Piecewise linear regression

In [1] a data set is presented, where a piecewise or segmented linear regression approach can help. A linear fit looks like:

There are two issues we can see from the plot. First, we can expect a better fit if we use a piecewise linear function with 2 segments. Second, the standard regression allows negative predictions for $$\hat{Y}$$ when we have small $$X$$ values.
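To illustrate the second issue: with a negative intercept and a positive slope, the fitted line dips below zero for small $$x$$. A quick Python check using the OLS coefficients reported in the R output later in this post (the data itself is not reproduced here):

```python
# OLS coefficients as reported by R's lm() for this data set
a = -0.013224  # intercept
b = 0.021558   # slope

# predictions for small x values are negative
for x in [0.1, 0.3, 0.6]:
    print(x, a + b * x)

# the fitted line crosses zero at x = -a/b
print("zero crossing at x =", -a / b)
```

Any observation with $$x$$ below roughly 0.61 gets a negative prediction under the purely linear fit.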

The breakpoint is estimated to be $$\bar{x}=1.5$$. The plots seem convincing.

These results were produced using SAS.

Let's see what happens if we use some more advanced mathematical optimization models. The linear model is easily coded as a QP problem:

\begin{align} \min \>& \text{SSQ} = \sum_i e_i^2 \\ & y_i = a + b x_i + e_i \end{align} where $$x,y$$ are data.

This purely linear fit is of course identical:

Now let's see if we can tackle a piecewise linear function with two segments. For this we introduce a binary variable $$\delta_i \in \{0,1\}$$ to indicate if an observation belongs to the first or the second segment. A high-level model can look like:

\begin{align} \min & \>\text{SSQ} = \sum_i e_i^2 && (1)\\ & e_i = (1-\delta_i) r_{1,i} + \delta_i r_{2,i} &&(2)\\ &r_{k,i} = y_i - a_k - b_k x_i && (3)\\ & x_i \le \bar{x} \Rightarrow \delta_i = 0 &&(4)\\& x_i \gt \bar{x} \Rightarrow \delta_i = 1 && (5)\\ & \bar{y} = a_k + b_k \bar{x}&& (6)\end{align}

Here $$(\bar{x}, \bar{y})$$ is the break point (to be determined by the model).

If we delete the last constraint, we can formulate the problem as a convex MIQP. This requires linearization of the constraints (2), (4) and (5).

Adding the last constraint $$\bar{y} = a_k + b_k \bar{x}$$ gives us a non-convex MINLP. After solving it we see:

Interestingly these results are very different from [1]. A more formal optimization model can make a lot of difference compared to a more ad-hoc method. The pictures from [1] indicate we are not optimal with respect to the total Sum of Squares. The continuous version of the segmented model with a breakpoint at 1.68 as suggested in [1] has: SSQ=0.00235. This is larger than our SSQ=0.00206459 with a breakpoint at 1.8135.

The two rightmost observations are probably outliers that should be removed from the model.

#### Update: Alternative formulations

If the $$x_i$$ are sorted: $$x_{i+1} \ge x_i$$, we can replace the implications
\begin{align} & x_i \le \bar{x} \Rightarrow \delta_i = 0 &&(4)\\& x_i \gt \bar{x} \Rightarrow \delta_i = 1 && (5) \end{align}  by $\delta_{i+1} \ge \delta_{i}$ I.e. we switch once from 0 to 1, Of course the sorting is no problem: we can sort the data before doing the optimization.

If furthermore we restrict the breakpoint $$\bar{x}$$ to one of the data points, additional reformulations are possible. See the comments below. This can make the model convex. I think for the problem at hand we can add this restriction without much impact on the sum of squares. Also this may be a very good model to solve first: it will be an excellent starting point for the non-convex model. (I used the discontinuous version to generate a good starting point for the non-convex MINLP, but surely this version must be better).

In [2] the problem is stated as the regression: $y = a + b x + c\, (x-\bar{x})_{+}$ where $$x_{+} = \max(x,0)$$. $$c$$ is now the difference in slope for the second segment. This leads to a simpler optimization problem. The term $$x-\bar{x}$$ can be linearized using a big-M formulation. The model can look like: \begin{align} \min \>& \sum_i e_i^2 \\ & y_i = a + b x_i + c z_i + e_i\\ & z_i = \max(x_i-\bar{x},0) \end{align} As we still have a multiplication of two variables $$c z_i$$, we have not made much progress.
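Once $$\bar{x}$$ is fixed, the hinge regression above is a plain linear least-squares problem. So if we restrict the breakpoint to the observed data points, we can simply enumerate the candidates. A Python sketch on synthetic data (the bedload data from [1] is not reproduced here, so the numbers are illustrative only):

```python
import numpy as np

# synthetic continuous piecewise-linear data with a known breakpoint at x = 1.5
x = np.linspace(0, 3, 101)
y = 0.5 + 0.2 * x + 1.3 * np.maximum(x - 1.5, 0.0)

def fit_hinge(x, y, xbar):
    """Least-squares fit of y = a + b*x + c*max(x - xbar, 0) for fixed xbar."""
    A = np.column_stack([np.ones_like(x), x, np.maximum(x - xbar, 0.0)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, float(np.sum((A @ coef - y) ** 2))

# enumerate candidate breakpoints: every interior data point
candidates = x[1:-1]
ssqs = [fit_hinge(x, y, xb)[1] for xb in candidates]
best = candidates[int(np.argmin(ssqs))]
print("best breakpoint:", best)
```

On noise-free data the enumeration recovers the true breakpoint exactly; on real data it gives the best SSQ among data-point breakpoints, which can then start the continuous MINLP.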

#### R package segmented

The R package segmented [2] actually does slightly better. Like the previous Baron solution it puts the last two observations into the second segment, but then moves $$\bar{x}$$ somewhat:

> lm.out <- lm(y~x)
> summary(lm.out)

Call:
lm(formula = y ~ x)

Residuals:
Min        1Q    Median        3Q       Max
-0.018152 -0.002764  0.000568  0.002367  0.035213

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013224   0.002454   -5.39 8.06e-07 ***
x            0.021558   0.002047   10.53 2.26e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.007519 on 74 degrees of freedom
Multiple R-squared:  0.5999, Adjusted R-squared:  0.5945
F-statistic:   111 on 1 and 74 DF,  p-value: 2.257e-16

> SSQ <- sum(lm.out$residuals^2)
> SSQ
[1] 0.004183108
> seg.out <- segmented(lm.out, seg.Z = ~x)
> summary(seg.out)

	***Regression Model with Segmented Relationship(s)***

Call: segmented.lm(obj = lm.out, seg.Z = ~x)

Estimated Break-Point(s):
   Est. St.Err
  1.821  0.097

Meaningful coefficients of the linear terms:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.008816   0.001826  -4.828 7.52e-06 ***
x            0.016833   0.001564  10.763  < 2e-16 ***
U1.x         0.152178   0.063636   2.391       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.005359 on 72 degrees of freedom
Multiple R-Squared: 0.8022,  Adjusted R-squared: 0.794

Convergence attained in 4 iterations with relative change 2.835799e-16
> SSQ2 <- sum(seg.out$residuals^2)
> SSQ2
[1] 0.00206802


This is very close to what Baron produced (Baron is just slightly better).

#### References

1. Sandra E. Ryan; Laurie S. Porth, A tutorial on the piecewise regression approach applied to bedload transport data, Gen. Tech. Rep. RMRS-GTR-189. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, https://www.fs.fed.us/rm/pubs/rmrs_gtr189.pdf
2. Muggeo, V.M.R. (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20–25.

## Wednesday, March 14, 2018

### production scheduling: Minimum up-time

In [1] a minimum up-time question is posted:

• Assume we have $$T=12$$ periods. If we produce in period $$t$$ we have a profit of $$c_t$$. Some $$c_t$$'s are negative.
• If we turn production on, it should remain on for at least $$n=5$$ periods.
• We want to maximize total profit.
• How can we write a constraint that captures this minimum up-time condition?

We can recognize production is turned on in period $$t$$ by observing $$x_{t-1}=0$$ and $$x_t=1$$ where $$x_t$$ is a binary variable indicating whether our production process is turned on or off. So our constraint can look like:
$x_{t-1}=0,x_{t}=1 \Longrightarrow \sum_{k=t}^{t+n-1} x_k = n\>\>\>\forall t$
In a MIP model we can translate this into:
$\sum_{k=t}^{t+n-1} x_k \ge n(x_t-x_{t-1}) \>\>\>\forall t$
This formulation is not completely obvious. We note that $$x_t-x_{t-1}$$ is zero or negative for all cases except $$x_{t-1}=0,x_{t}=1$$.

Some notes:
• Behavior near the boundaries is always important. E.g. we could assume $$x_0=0$$.
• Let's also assume $$x_{T+1}=0$$ so we need to finish our production run in the time horizon.
• The MIP constraint can also be written as: $\sum_{k=t+1}^{t+n-1} x_k \ge (n-1)(x_t-x_{t-1}) \>\>\>\forall t$ as we know $$x_t=1$$ when this thing kicks in.
• So this gives us 12 constraints for our $$T=12, n=5$$ problem.
• We can disaggregate the summation into individual constraints $x_k \ge x_t-x_{t-1}\text{ for } k=t+1,\dots,t+n-1$ This is tighter and may solve faster. Advanced solvers may use an "implied bound" cut to do this automatically (and smarter) [2].
A complete model can look like:
\begin{align} \max \>&\sum_t c_t x_t\\ &\sum_{k=t}^{t+n-1} x_k \ge n(x_t-x_{t-1}) \\ & x_t \in \{0,1\}\end{align}
I implemented this in GAMS. GAMS assumes that variables indexed outside their domain are zero. This means I don't have to do anything special for $$t=1$$ and for $$t\gt T-n$$.

A solution with random $$c_t$$ is:

 Optimal Solution
For this small case we could have enumerated all possible feasible production schedules and pick the best one. We have 42 possible solutions:

 All Feasible Solutions ordered by Total Profit

For $$T=24,n=5$$ we have 24 constraints, while enumeration of all feasible solutions would lead to 4316 solutions. NB: I used the Cplex solution pool to enumerate the feasible solutions.
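The solution counts can be verified independently of a MIP solver. A small Python sketch: brute-force enumeration for $$T=12$$, plus a dynamic program for larger $$T$$ (we count binary strings in which every run of ones has length at least $$n$$, consistent with $$x_0=x_{T+1}=0$$):

```python
from itertools import groupby, product

def feasible(x, n):
    """Every run of consecutive ones must have length >= n."""
    return all(len(list(g)) >= n for v, g in groupby(x) if v == 1)

def count_brute(T, n):
    """Enumerate all 2^T schedules (fine for small T)."""
    return sum(feasible(x, n) for x in product([0, 1], repeat=T))

def count_dp(T, n):
    """f[t] = number of feasible schedules of length t.
    A schedule either starts with an off period (f[t-1]) or with a run
    of k >= n ones, followed by an off period and any feasible rest."""
    f = [0] * (T + 1)
    f[0] = 1
    for t in range(1, T + 1):
        f[t] = f[t - 1]
        for k in range(n, t + 1):
            f[t] += 1 if k == t else f[t - k - 1]
    return f[T]

print(count_brute(12, 5))  # 42, matching the solution-pool count
print(count_dp(24, 5))     # 4316
```

Both counts agree with the numbers found via the Cplex solution pool.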

 Optimal solution for T=24

There are a number of extensions one can encounter:

• There may be some initial state. We need more than just the state at $$t=0$$. Basically we need to know for how many periods we are in the 'on'-state.
• We may relax the assumption that $$x_{T+1}=0$$.
• A maximum up-time is also not too difficult using a similar construct. Say we limit up-time to $$m$$ periods. Then we can require:$\sum_{k=t}^{t+m} x_k \le m$ I.e. we don't want to see a series of $$m+1$$ ones. See the picture below for an example with $$T=24, n=3, m=4$$.
• A minimum down-time is sometimes encountered in power generation models. This is close to minimum up-time: make $$y_t=1-x_t$$ and apply our tools we described above on $$y_t$$.
• Limiting the number of start-ups or shut-downs is quite a standard requirement. E.g. to limit the number of times we start a production run: \begin{align} &s_t \ge x_t-x_{t-1}\\& \sum_t s_t \le K \\ & s_t \in \{0,1\} \end{align}
 Run-lengths between 3 and 5.

#### Nitty-Gritty Implementation Details

The basic constraints discussed above are: \begin{align}&\sum_{k=t}^{t+n-1} x_k \ge n(x_t-x_{t-1}) &\forall t &\text{ (minimum length)}\\ & \sum_{k=t}^{t+m} x_k \le m & \forall t & \text{ (maximum length)} \end{align}  In GAMS, the minimum and maximum run-length constraints can be stated straightforwardly as:

alias(t,k);

minlength(t)..
  sum(k$(ord(k)>=ord(t) and ord(k)<=ord(t)+n-1), x(k)) =g= n*(x(t)-x(t-1));

maxlength(t)..
  sum(k$(ord(k)>=ord(t) and ord(k)<=ord(t)+m), x(k)) =l= m;

I often prefer to keep model equations as simple as possible. We can do this by precalculating a set tk(t,k):

set tk(t,k) 'for use in minlength';
tk(t,k) =
ord(k)>=ord(t) and ord(k)<=ord(t)+n-1;
display tk;

minlength(t)..
  sum(tk(t,k), x(k)) =g= n*(x(t)-x(t-1));

 Structure of the set tk

This also allows us to debug things before the rest of the model is complete. We can simply use display statements or use the gdx viewer. Equations can only be meaningfully debugged by trying to solve the model they are part of.

A slightly different approach is to rely on leads:\begin{align}&\sum_{k=1}^{n} x_{t+k-1} \ge n(x_t-x_{t-1}) &\forall t &\text{ (minimum length)}\\ & \sum_{k=1}^{m+1} x_{t+k-1} \le m & \forall t & \text{ (maximum length)} \end{align}This is also easily transcribed into GAMS syntax:

minlength(t)..
  sum(k$(ord(k)<=n), x[t+(ord(k)-1)]) =g= n*(x(t)-x(t-1));

maxlength(t)..
  sum(k$(ord(k)<=m+1), x[t+(ord(k)-1)]) =l= m;

#### More details: disaggregated version

The disaggregated version $x_k \ge x_t-x_{t-1}\text{ for } k=t+1,\dots,t+n-1$ needs a little bit more attention. If we just translate this to:

minlength2(tk(t,k))..
x(k) =g= x(t)-x(t-1);

then we allow partial runs near the end of the planning horizon.

 Partial runs are not allowed

To prevent this we need to forbid $$x_t \gt x_{t-1}$$ near the end.

minlength2a(tk(t,k))..
  x(k) =g= x(t)-x(t-1);

minlength2b(t)$(ord(t)+n-1 > card(t))..
  x(t) =l= x(t-1);

Interestingly (at least to me), the "leads" formulation is easier. We can just use:

minlength3(t,k)$(ord(k)<=n-1)..
  x(t+ord(k)) =g= x(t)-x(t-1);

This will automatically generate $$0 \le x_t-x_{t-1}$$ where needed.

To get a feeling for the possible performance improvement when disaggregating the min-length constraint, we need a slightly larger problem. So I used $$T=1024$$ and $$N=10$$ machines. Our decision variable is now $$x_{i,t}$$.

| | Aggregated | Disaggregated |
|---|---|---|
| Size | 10,241 equations, 10,241 variables, 10,240 discrete | 40,961 equations, 10,241 variables, 10,240 discrete |
| Objective | 17304.883540 | 17304.883540 |
| Iterations, Nodes | 10547 iterations, 13 nodes | 6234 iterations, 0 nodes |
| Time | Cplex: 7.39 sec (det. 1900.93 ticks) | Cplex: 5.13 sec (det. 1728.88 ticks) |

The disaggregated version does better: it can solve the problem during preprocessing.

#### The quest for zero nodes

After doing these experiments with disaggregation I was asking myself:

Is there an aggregated formulation that solves this problem in zero nodes?

It is standard wisdom in integer programming that one should disaggregate variable upper bound constraints on sums of variables. These are constraints of the form:$x_1+\dots+x_n \le (u_1+\dots+u_n) y, \> y \in \{0,1\}$where $$u_j$$ is a valid upper bound on $$x_j \ge 0, (j = 1,\dots,n)$$. This single constraint is equivalent, given the integrality of $$y$$, to the following collection of "disaggregated" constraints: $x_j \le u_j y_j \>(j = 1,\dots,n)$The reason the second, disaggregated formulation is preferred is that, while equivalent given integrality, its linear-programming relaxation is stronger. However, given the ability to automatically disaggregate the first constraint, these "implied bound" constraints can be stored in a pool and added to the LP only as needed. Where $$n$$ is large this latter approach will typically produce a much smaller, but equally effective LP.
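A tiny numeric illustration of why the disaggregated form has the stronger LP relaxation (hypothetical numbers: $$n=4$$, all $$u_j=1$$, only $$x_1$$ at its bound):

```python
# x_1 + ... + x_4 <= (u_1 + ... + u_4) * y   versus   x_j <= u_j * y
u = [1.0, 1.0, 1.0, 1.0]
x = [1.0, 0.0, 0.0, 0.0]

# smallest fractional y allowed by the aggregated constraint
y_agg = sum(x) / sum(u)

# smallest fractional y allowed by the disaggregated constraints
y_disagg = max(xj / uj for xj, uj in zip(x, u))

print(y_agg, y_disagg)  # 0.25 vs 1.0
```

The aggregated relaxation lets $$y$$ drop to 0.25, while the disaggregated constraints already force $$y=1$$, i.e. the integer value, without branching.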
Our constraint $\sum_{k=t}^{t+n-1} x_k \ge n(x_t-x_{t-1}) \> \forall t$ is not exactly this form. We can coerce this by:\begin{align} & \sum_{k=t}^{t+n-1} x_k \ge n y_t\\ & y_t \ge x_t-x_{t-1}\\ &y_t \in \{0,1\}\end{align} Here are the results:

| | Aggregated | Disaggregated | Aggregated v2 |
|---|---|---|---|
| Size | 10,241 equations, 10,241 variables, 10,240 discrete | 40,961 equations, 10,241 variables, 10,240 discrete | 20,481 equations, 20,481 variables, 20,480 discrete |
| Objective | 17304.883540 | 17304.883540 | 17304.883540 |
| Iterations, Nodes | 10547 iterations, 13 nodes | 6234 iterations, 0 nodes | 12477 iterations, 0 nodes |
| Time | Cplex: 7.39 sec (det. 1900.93 ticks) | Cplex: 5.13 sec (det. 1728.88 ticks) | Cplex: 20.61 sec (det. 6098.72 ticks) |

We achieved our zero nodes, but at a price that is too high. In any case the Cplex performance on this model (with any of these formulations) is fantastic.

#### References

1. Linear programming - optimizing profit with tricky constraints, https://stackoverflow.com/questions/49261669/linear-programming-optimizing-profit-with-tricky-constraints
2. Bixby E.R., Fenelon M., Gu Z., Rothberg E., Wunderling R. (2000) MIP: Theory and Practice — Closing the Gap. In: Powell M.J.D., Scholtes S. (eds) System Modelling and Optimization. CSMO 1999. IFIP — The International Federation for Information Processing, vol 46. Springer, Boston, MA

## Sunday, March 11, 2018

### Regular Constraints

Here is an interesting problem [1] that can be stated in terms of regular expressions.

#### Problem

Puzzle description: The player is given an empty game board (varying size) that they must fill with n-colors, using clue patterns for the rows and columns. Each clue pattern is the sequence of colors in that row/col but with consecutive duplicates removed.

We can also write the hints using regular expressions. E.g. the first row would have r+g+b+.
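Computing a clue from a filled row is just collapsing consecutive duplicates. A small Python helper (illustrative only, using `r`, `g`, `b` characters; not part of the puzzle statement):

```python
from itertools import groupby

def clue(row):
    """Collapse consecutive duplicate colors: 'rrgb' -> 'rgb'."""
    return "".join(c for c, _ in groupby(row))

print(clue("rrgb"))  # rgb
print(clue("bbbb"))  # b
```

Solving the puzzle means inverting this map: find a grid whose rows and columns all collapse to their given clues.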

#### Constraint programming

Constraint solvers like Minizinc/Gecode have a regular constraint [2]. Not for the fainthearted:

 Regular Constraint in Minizinc
This is what a regular expression algorithm will do in the background: execute a state machine.

Let's see how we can model the first row with this. Obviously the pattern rgb allows for the following solutions: "rrgb", "rggb", "rgbb". We encode the colors with integers. The decision variables $$x$$ are now just an array of integers with $$1 \le x_i \le nc$$, where $$nc$$ is the number of colors. This small problem can be encoded in Minizinc as:

include "globals.mzn";

int: n = 4;    % length of row
int: nc = 3;   % number of colors

array[1..n] of var 1..nc: x;

int: q = 4; % number of states
int: q0 = 1;  % initial state
set of int: q1 = {4}; % final state
array[1..q,1..nc] of int: p =
[|
%  r  g  b
2, 0, 0 |   % 1: accept r
2, 3, 0 |   % 2: accept r or g
0, 3, 4 |   % 3: accept g or b
0, 0, 4 |   % 4: accept b
|];

constraint regular(x, q, nc, p, q0,  q1);

solve satisfy;

output [show(x)];


The state transition table p is often depicted as a graph. In this case it is very simple:

The solution with Minizinc/Gecode looks like:

 Possible solutions for first row

where $$1=r, 2=g, 3=b$$.
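We can check the transition table outside Minizinc by simulating the automaton directly. A Python sketch that enumerates all length-4 strings accepted by the table `p` above (state 0 is the dead/reject state):

```python
from itertools import product

# transition table from the Minizinc model: p[state][color], colors 1=r, 2=g, 3=b
p = {1: (2, 0, 0),
     2: (2, 3, 0),
     3: (0, 3, 4),
     4: (0, 0, 4)}
q0, final = 1, {4}

def accepts(word):
    """Run the state machine; entering state 0 means reject."""
    state = q0
    for c in word:
        state = p[state][c - 1]
        if state == 0:
            return False
    return state in final

names = {1: "r", 2: "g", 3: "b"}
sols = ["".join(names[c] for c in w)
        for w in product([1, 2, 3], repeat=4) if accepts(w)]
print(sols)  # ['rrgb', 'rggb', 'rgbb']
```

This reproduces the three solutions Minizinc/Gecode found for the first row.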

This can be repeated for each row and column with non-trivial patterns. There must be a better way to do this, but here is my attempt:

include "globals.mzn";

int: m = 4;    % number of rows
int: n = 4;    % length of columns
int: nc = 3;   % number of colors

% r = 1
% g = 2
% b = 3

array[1..m,1..n] of var 1..nc: x;

int: q = 4; % number of states
int: q0 = 1;  % initial state
set of int: q1 = {4}; % final state
array[1..q,1..nc] of int: p_rgb =
[|
%  r  g  b
2, 0, 0 |   % 1: accept r
2, 3, 0 |   % 2: accept r or g
0, 3, 4 |   % 3: accept g or b
0, 0, 4 |   % 4: accept b
|];

array[1..q,1..nc] of int: p_brg =
[|
%  r  g  b
0, 0, 2 |   % 1: accept b
3, 0, 2 |   % 2: accept b or r
3, 4, 0 |   % 3: accept r or g
0, 4, 0 |   % 4: accept g
|];

array[1..q,1..nc] of int: p_rbg =
[|
%  r  g  b
2, 0, 0 |   % 1: accept r
2, 0, 3 |   % 2: accept r or b
0, 4, 3 |   % 3: accept b or g
0, 4, 0 |   % 4: accept g
|];

array[1..q,1..nc] of int: p_rbr =
[|
%  r  g  b
2, 0, 0 |   % 1: accept r
2, 0, 3 |   % 2: accept r or b
4, 0, 3 |   % 3: accept b or r
4, 0, 0 |   % 4: accept r
|];

array[1..q,1..nc] of int: p_grb =
[|
%  r  g  b
0, 2, 0 |   % 1: accept g
3, 2, 0 |   % 2: accept g or r
3, 0, 4 |   % 3: accept r or b
0, 0, 4 |   % 4: accept b
|];

%---- row 1
constraint regular(row(x, 1), q, nc, p_rgb, q0, q1);

%---- row 2
constraint regular(row(x, 2), q, nc, p_brg, q0, q1);

%---- row 3
constraint row(x, 3) = [3,3,3,3];

%---- row 4
constraint row(x, 4) = [2,1,3,2];

%---- col 1
constraint regular(col(x, 1), q, nc, p_rbg, q0, q1);

%---- col 2
constraint regular(col(x, 2), q, nc, p_rbr, q0, q1);

%---- col 3
constraint regular(col(x, 3), q, nc, p_grb, q0, q1);

%---- col 4
constraint col(x, 4) = [3,2,3,2];

solve satisfy;

output [show(x)];


Notice that all the p matrices are essentially column permuted versions of each other. The results are:

 Solution with Minizinc/Gecode
Too bad Minizinc is not printing the matrix in a nicer way, but with some effort we can see this is correct.

#### MIP Formulation

Inside a MIP we only have linear equations at our disposal. One way to attack the above problem is to generate all possible "strings". I.e. we would have:

• Row 1 is "rrgb", "rggb" or "rgbb"
• Row 2 is "bbrg", "brrg" or "brgg"
• Row 3 is 'bbbb'
• Row 4 is 'grbg'
• Etc.
The model equations can look like:

Here rows(i,k,j,c)=true if the k-th version of row i has color c in column position j. The sets ik(i,k) and jk(j,k) contain the valid elements k for each row or column (rows and columns have different numbers of allowed patterns, e.g. row 3 has only one allowed pattern: 'bbbb').
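The candidate strings per row or column can be generated mechanically: a string of length $$n$$ is allowed if collapsing its consecutive duplicates yields the clue. A brute-force Python sketch (fine at these small sizes; the helper name is mine, not from the model):

```python
from itertools import groupby, product

def expansions(clue, n, colors="rgb"):
    """All strings of length n whose consecutive-duplicate collapse equals clue."""
    dedupe = lambda s: "".join(c for c, _ in groupby(s))
    return ["".join(w) for w in product(colors, repeat=n) if dedupe(w) == clue]

print(expansions("rgb", 4))  # ['rrgb', 'rggb', 'rgbb']
print(expansions("b", 4))    # ['bbbb']
```

This directly yields the allowed-pattern sets such as ik(i,k): e.g. row 1 gets three candidates while row 3 gets only 'bbbb'.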

After adding a cut to forbid previously found solutions and looping until the model becomes infeasible we get the two solutions:

----    104 PARAMETER sols  solutions

INDEX 1 = iter1

R           G           B

i1.j1       1.000
i1.j2       1.000
i1.j3                   1.000
i1.j4                               1.000
i2.j1                               1.000
i2.j2       1.000
i2.j3       1.000
i2.j4                   1.000
i3.j1                               1.000
i3.j2                               1.000
i3.j3                               1.000
i3.j4                               1.000
i4.j1                   1.000
i4.j2       1.000
i4.j3                               1.000
i4.j4                   1.000

INDEX 1 = iter2

R           G           B

i1.j1       1.000
i1.j2       1.000
i1.j3                   1.000
i1.j4                               1.000
i2.j1                               1.000
i2.j2                               1.000
i2.j3       1.000
i2.j4                   1.000
i3.j1                               1.000
i3.j2                               1.000
i3.j3                               1.000
i3.j4                               1.000
i4.j1                   1.000
i4.j2       1.000
i4.j3                               1.000
i4.j4                   1.000


Reference [3] gives a direct approach to represent the regular constraint in a MIP. I think this will be somewhat messy because we have multiple regular expressions in our model.

#### References

1. Constraint programming; Fill grid with colors following pattern rules, https://stackoverflow.com/questions/49101990/constraint-programming-fill-grid-with-colors-following-pattern-rules
2. Minizinc Documentation - Standard Library. Extensional constraints (table, regular etc.), http://www.minizinc.org/doc-lib/doc-globals-extensional.html
3. Côté MC., Gendron B., Rousseau LM. (2007) Modeling the Regular Constraint with Integer Programming. In: Van Hentenryck P., Wolsey L. (eds) Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. CPAIOR 2007. Lecture Notes in Computer Science, vol 4510. Springer, Berlin, Heidelberg

## Wednesday, March 7, 2018

### Square system solver: Barzilai-Borwein spectral method in R

I was playing with some large sparse square systems of non-linear equations (actually resulting from some spreadsheets). R has a package called BB [1], which is based on [2,3]. Technically, this is the Barzilai-Borwein spectral method. It is both derivative-free and matrix-free (it only needs a few vectors of length $$n$$).

For some problems it works rather nicely:

n <- 100
x0 <- -runif(n)

fbroyden <- function(x) {
n <- length(x)
F <- rep(NA,n)
F[1] <- x[1]*(3-0.5*x[1]) - 2*x[2] + 1
i <- 2:(n-1)
F[i] <- x[i]*(3-0.5*x[i]) - x[i-1] - 2*x[i+1] + 1
F[n] <- x[n]*(3-0.5*x[n]) - x[n-1] + 1
F
}

library(BB)
dfsane(par=x0,fn=fbroyden,control=list(triter=1))


This solves very quickly, especially given that BB is written in pure R (no C or Fortran):

Iteration:  0  ||F(x0)||:  1.441474
iteration:  1  ||F(xn)|| =   11.23496
iteration:  2  ||F(xn)|| =   7.900779
iteration:  3  ||F(xn)|| =   6.542131
iteration:  4  ||F(xn)|| =   3.139281
iteration:  5  ||F(xn)|| =   4.031394
iteration:  6  ||F(xn)|| =   5.183635
iteration:  7  ||F(xn)|| =   0.5650939
iteration:  8  ||F(xn)|| =   0.1870383
iteration:  9  ||F(xn)|| =   0.09536832
iteration:  10  ||F(xn)|| =   0.04728994
iteration:  11  ||F(xn)|| =   0.0231393
iteration:  12  ||F(xn)|| =   0.01204457
iteration:  13  ||F(xn)|| =   0.01253904
iteration:  14  ||F(xn)|| =   0.004658623
iteration:  15  ||F(xn)|| =   0.00671189
iteration:  16  ||F(xn)|| =   0.002562044
iteration:  17  ||F(xn)|| =   0.0007190788
iteration:  18  ||F(xn)|| =   0.000526917
iteration:  19  ||F(xn)|| =   0.0004205647
iteration:  20  ||F(xn)|| =   0.0003359958
iteration:  21  ||F(xn)|| =   0.0002087812
iteration:  22  ||F(xn)|| =   2.471228e-05
iteration:  23  ||F(xn)|| =   7.605672e-06
iteration:  24  ||F(xn)|| =   1.304184e-05
iteration:  25  ||F(xn)|| =   3.089383e-05
iteration:  26  ||F(xn)|| =   6.874306e-05
iteration:  27  ||F(xn)|| =   3.30784e-05
iteration:  28  ||F(xn)|| =   1.16808e-05
iteration:  29  ||F(xn)|| =   3.541111e-06
iteration:  30  ||F(xn)|| =   8.387173e-07
$par
  [1] -1.0323920 -1.3150464 -1.3887103 -1.4076713 -1.4125364 -1.4137837 -1.4141034 -1.4141853 -1.4142063 -1.4142117 -1.4142131
 [12] -1.4142135 -1.4142135 -1.4142135 -1.4142135 -1.4142136 -1.4142135 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136
 [23] -1.4142136 -1.4142136 -1.4142135 -1.4142136 -1.4142135 -1.4142136 -1.4142135 -1.4142136 -1.4142135 -1.4142136 -1.4142135
 [34] -1.4142136 -1.4142136 -1.4142136 -1.4142135 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142135 -1.4142136
 [45] -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142135 -1.4142136 -1.4142135 -1.4142136
 [56] -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142136 -1.4142135 -1.4142136 -1.4142136 -1.4142136
 [67] -1.4142136 -1.4142136 -1.4142135 -1.4142136 -1.4142136 -1.4142135 -1.4142135 -1.4142135 -1.4142135 -1.4142135 -1.4142134
 [78] -1.4142132 -1.4142128 -1.4142121 -1.4142107 -1.4142080 -1.4142027 -1.4141924 -1.4141722 -1.4141328 -1.4140561 -1.4139064
 [89] -1.4136144 -1.4130448 -1.4119339 -1.4097678 -1.4055460 -1.3973251 -1.3813439 -1.3503811 -1.2907820 -1.1775120 -0.9675106
[100] -0.5965290

$residual
[1] 8.387173e-08

$fn.reduction
[1] 14.41473

$feval
[1] 31

$iter
[1] 30

$convergence
[1] 0

$message
[1] "Successful convergence"


There is always something to say. First note how cleverly the test function is implemented: by setting up the index vector i we avoid any explicit loops. But also note that the test function is not evaluated correctly for $$n=2$$: in R the sequence $$2:1$$ counts backwards, generating the numbers $$[2\> 1]$$.
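The same vectorized trick carries over to other languages. A Python/numpy transcription of the Broyden tridiagonal residual (a sketch mirroring the R function above; here an empty interior index range is harmless, unlike R's backwards `2:1`). Note that the interior equations vanish exactly at $$x_i=-\sqrt{2}$$, which is why the solver converges to values near $$-1.4142136$$ away from the boundaries:

```python
import numpy as np

def fbroyden(x):
    """Broyden tridiagonal residual, vectorized (no loops)."""
    n = len(x)
    F = np.empty(n)
    F[0] = x[0] * (3 - 0.5 * x[0]) - 2 * x[1] + 1
    i = np.arange(1, n - 1)  # interior indices; simply empty when n == 2
    F[i] = x[i] * (3 - 0.5 * x[i]) - x[i - 1] - 2 * x[i + 1] + 1
    F[n - 1] = x[n - 1] * (3 - 0.5 * x[n - 1]) - x[n - 2] + 1
    return F

# interior equations: x*(3-0.5x) - x - 2x + 1 = -0.5x^2 + 1 = 0 at x = -sqrt(2)
x = -np.sqrt(2.0) * np.ones(100)
F = fbroyden(x)
print(np.abs(F[1:-1]).max())  # ~1e-16, zero up to rounding
```

Only the two boundary equations have nonzero residuals at this point, matching the boundary deviations visible in the $par vector.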

The second thing: the first number $$||F(x^0)||$$ looks a bit funny.  Indeed we have:

> f <- fbroyden(x0)
> norm(f,type="2")
[1] 16.11595
>


Looking at the source I see:

F0 <- normF <- sqrt(sum(F * F))

if (trace)
    cat("Iteration: ", 0, " ||F(x0)||: ", F0/sqrt(n), "\n")


I do not understand the reason for the division by $$\sqrt{n}$$. It is only doing this for the initial value. Subsequent log lines are correctly printing $$||F(x)||$$.

One possible reason is that the termination criterion is:
$||F(x)|| \le \mathit{tol} \cdot \sqrt{n}$
so the author wanted to compare $$\displaystyle\frac{||F(x^0)||}{\sqrt{n}}$$ with $$\mathit{tol}$$.  Obviously this first log line is still mislabeled.

#### References

1. Varadhan, Ravi, and Paul D. Gilbert. 2009. “BB: An R Package for Solving a Large System of Nonlinear Equations and for Optimizing a High-Dimensional Nonlinear Objective Function.” Journal of Statistical Software 32 (4): 1–26. http://www.jstatsoft.org/v32/i04/.
2. Cruz, William La, and Marcos Raydan. 2003. “Nonmonotone Spectral Methods for Large-Scale Nonlinear Systems.” Optimization Methods and Software 18: 583–99.
3. Cruz, William La, Jose Mario Martınez, and Marcos Raydan. 2006. “Spectral Residual Method Without Gradient Information for Solving Large-Scale Nonlinear Systems of Equations.” Mathematics of Computation 75: 1449–66.

### Brain Storm Optimization

I have seen the name of this heuristic before. There is also a Hypo-Variance Brain Storm Optimization. Reading the abstract in [1] made me laugh again.

Of course a human is smarter than an ant or a bee, so this algorithm must work much better than Ant Colony or some Bee Based heuristic! Or as stated in the abstract:

Abstract. Human being is the most intelligent animal in this world. Intuitively, optimization algorithm inspired by human being creative problem solving process should be superior to the optimization algorithms inspired by collective behavior of insects like ants, bee, etc. In this paper, we introduce a novel brain storm optimization algorithm, which was inspired by the human brainstorming process. Two benchmark functions were tested to validate the effectiveness and usefulness of the proposed algorithm.
Of course two test functions is more than enough.

In [2] a list of these type of heuristics is presented:

The "intelligent water drop" may be another top contender.

#### References

1. Y. Shi, “Brain storm optimization algorithm,” in Advances in Swarm Intelligence, ser. Lecture Notes in Computer Science, Y. Tan, Y. Shi, Y. Chai, and G. Wang, Eds. Springer Berlin/Heidelberg, 2011, vol. 6728, pp. 303–309. https://link.springer.com/chapter/10.1007/978-3-642-21515-5_36
2. Iztok Fister Jr., Xin-She Yang, Iztok Fister, Janez Brest, Dusan Fister, A Brief Review of Nature-Inspired Algorithms for Optimization, http://arxiv.org/abs/1307.4186

## Sunday, March 4, 2018

### Matlab: Problem Based Optimization

MATLAB has a new way to express optimization problems. Until now they used a matrix-based API (this is now called "Solver Based Optimization"). In 2017b they added "Problem Based Optimization", which is a kind of algebraic modeling language (like AMPL or GAMS) inside Matlab. Let's revisit a simple problem related to warehousing [3].

#### Mathematical Model

We use the following indices:
• $$p$$: products
• $$f$$: factories
• $$w$$: warehouses
• $$s$$: stores
We introduce the following variables:
• $$x_{p,f,w} \ge 0$$: shipments of product $$p$$ from factory $$f$$ to warehouse $$w$$,
• $$y_{s,w} \in \{0,1\}$$: links each store $$s$$ to a single warehouse $$w$$.
The data associated with the model is:
• $$pcost_{f,p}$$: unit production cost
• $$tcost_{p}$$: unit transportation cost
• $$dist_{f,w}, dist_{s,w}$$: distances
• $$pcap_{f,p}$$: factory production capacities
• $$wcap_{w}$$: warehouse capacity
• $$d_{s,p}$$: demand for product $$p$$ at store $$s$$
• $$turn_{p}$$: product turnover rate
The optimization model looks like:

MIP Model
\large{\begin{align} \min&\sum_{p,f,w} (pcost_{f,p}+tcost_p\cdot dist_{f,w})\cdot x_{p,f,w}+\sum_{s,w,p} d_{s,p}\cdot tcost_p \cdot dist_{s,w} \cdot y_{s,w}\\ &\sum_w x_{p,f,w} \le pcap_{f,p} \>\>\forall f,p \>\> \text{(production capacity)}\\ &\sum_f x_{p,f,w} = \sum_s d_{s,p}\cdot y_{s,w} \>\>\forall p,w \>\>\text{(demand)}\\ &\sum_{p,s} \frac{d_{s,p}}{turn_{p}} y_{s,w} \le wcap_{w}\>\>\forall w \>\>\text{(warehouse capacity)}\\ &\sum_w y_{s,w} = 1\>\>\forall s \>\>\text{(one warehouse for a store)}\\ &x_{p,f,w} \ge 0 \\ &y_{s,w} \in \{0,1\} \end{align}}

#### Matlab: Solver Based Implementation

This implementation uses the old matrix-based API. The code is somewhat removed from the original problem, as we need quite a bit of non-trivial code to deal with the details of populating the solver matrices. This is still a very simple model: for more complex models this amount of code and its complexity will only increase. See [1] for more details.

%-----------------------------------------------------
% Generate a random problem: Facility Locations
%-----------------------------------------------------
rng(1) % for reproducibility
N = 20; % N from 10 to 30 seems to work. Choose large values with caution.
N2 = N*N;
f = 0.05; % density of factories
w = 0.05; % density of warehouses
s = 0.1;  % density of sales outlets
F = floor(f*N2); % number of factories
W = floor(w*N2); % number of warehouses
S = floor(s*N2); % number of sales outlets
xyloc = randperm(N2,F+W+S); % unique locations of facilities
[xloc,yloc] = ind2sub([N N],xyloc);
%-----------------------------------------------------
% Generate Random Capacities, Costs, and Demands
%-----------------------------------------------------
P = 20; % 20 products
% Production costs between 20 and 100
pcost = 80*rand(F,P) + 20;
% Production capacity between 500 and 1500 for each product/factory
pcap = 1000*rand(F,P) + 500;
% Warehouse capacity between P*400 and P*800 for each product/warehouse
wcap = P*400*rand(W,1) + P*400;
% Product turnover rate between 1 and 3 for each product
turn = 2*rand(1,P) + 1;
% Product transport cost per distance between 5 and 10 for each product
tcost = 5*rand(1,P) + 5;
% Product demand by sales outlet between 200 and 500 for each
% product/outlet
d = 300*rand(S,P) + 200;
%---------------------------------------------------------
% Generate Objective and Constraint Matrices and Vectors
%---------------------------------------------------------
obj1 = zeros(P,F,W); % Allocate arrays
obj2 = zeros(S,W);
distfw = zeros(F,W); % Allocate matrix for factory-warehouse distances
for ii = 1:F
    for jj = 1:W
        distfw(ii,jj) = abs(xloc(ii) - xloc(F + jj)) + abs(yloc(ii) ...
            - yloc(F + jj));
    end
end
distsw = zeros(S,W); % Allocate matrix for sales outlet-warehouse distances
for ii = 1:S
    for jj = 1:W
        distsw(ii,jj) = abs(xloc(F + W + ii) - xloc(F + jj)) ...
            + abs(yloc(F + W + ii) - yloc(F + jj));
    end
end
for ii = 1:P
    for jj = 1:F
        for kk = 1:W
            obj1(ii,jj,kk) = pcost(jj,ii) + tcost(ii)*distfw(jj,kk);
        end
    end
end
for ii = 1:S
    for jj = 1:W
        obj2(ii,jj) = distsw(ii,jj)*sum(d(ii,:).*tcost);
    end
end
obj = [obj1(:);obj2(:)]; % obj is the objective function vector
matwid = length(obj);
Aineq = spalloc(P*F + W,matwid,P*F*W + S*W); % Allocate sparse Aeq
bineq = zeros(P*F + W,1); % Allocate bineq as full
% Zero matrices of convenient sizes:
clearer1 = zeros(size(obj1));
clearer12 = clearer1(:);
clearer2 = zeros(size(obj2));
clearer22 = clearer2(:);
% First the production capacity constraints
counter = 1;
for ii = 1:F
    for jj = 1:P
        xtemp = clearer1;
        xtemp(jj,ii,:) = 1; % Sum over warehouses for each product and factory
        xtemp = sparse([xtemp(:);clearer22]); % Convert to sparse
        Aineq(counter,:) = xtemp'; % Fill in the row
        bineq(counter) = pcap(ii,jj);
        counter = counter + 1;
    end
end
% Now the warehouse capacity constraints
vj = zeros(S,1); % The multipliers
for jj = 1:S
    vj(jj) = sum(d(jj,:)./turn); % A sum of P elements
end
for ii = 1:W
    xtemp = clearer2;
    xtemp(:,ii) = vj;
    xtemp = sparse([clearer12;xtemp(:)]); % Convert to sparse
    Aineq(counter,:) = xtemp'; % Fill in the row
    bineq(counter) = wcap(ii);
    counter = counter + 1;
end
Aeq = spalloc(P*W + S,matwid,P*W*(F+S) + S*W); % Allocate as sparse
beq = zeros(P*W + S,1); % Allocate vectors as full
counter = 1;
% Demand is satisfied:
for ii = 1:P
    for jj = 1:W
        xtemp = clearer1;
        xtemp(ii,:,jj) = 1;
        xtemp2 = clearer2;
        xtemp2(:,jj) = -d(:,ii);
        xtemp = sparse([xtemp(:);xtemp2(:)]'); % Change to sparse row
        Aeq(counter,:) = xtemp; % Fill in row
        counter = counter + 1;
    end
end
% Only one warehouse for each sales outlet:
for ii = 1:S
    xtemp = clearer2;
    xtemp(ii,:) = 1;
    xtemp = sparse([clearer12;xtemp(:)]'); % Change to sparse row
    Aeq(counter,:) = xtemp; % Fill in row
    beq(counter) = 1;
    counter = counter + 1;
end
%-----------------------------------------------------
% Bound Constraints and Integer Variables
%-----------------------------------------------------
intcon = P*F*W+1:length(obj);
lb = zeros(length(obj),1);
ub = Inf(length(obj),1);
ub(P*F*W+1:end) = 1;
opts = optimoptions('intlinprog','Display','off','PlotFcn',@optimplotmilp);
%-----------------------------------------------------
% Solve the Problem
%-----------------------------------------------------
[solution,fval,exitflag,output] = intlinprog(obj,intcon,...
    Aineq,bineq,Aeq,beq,lb,ub,opts);

#### Matlab: Problem Based Implementation

This implementation uses the new Matlab problem-based optimization modeling features [2]. In my opinion this is a big improvement: when reading this code we can at least recognize the original problem, and the code is noticeably more compact.

```matlab
%-----------------------------------------------------
% Generate a random problem: Facility Locations
%-----------------------------------------------------
rng(1) % for reproducibility
N = 20; % N from 10 to 30 seems to work. Choose large values with caution.
N2 = N*N;
f = 0.05; % density of factories
w = 0.05; % density of warehouses
s = 0.1;  % density of sales outlets
F = floor(f*N2); % number of factories
W = floor(w*N2); % number of warehouses
S = floor(s*N2); % number of sales outlets
xyloc = randperm(N2,F+W+S); % unique locations of facilities
[xloc,yloc] = ind2sub([N N],xyloc);
%-----------------------------------------------------
% Generate Random Capacities, Costs, and Demands
%-----------------------------------------------------
P = 20; % 20 products
% Production costs between 20 and 100
pcost = 80*rand(F,P) + 20;
% Production capacity between 500 and 1500 for each product/factory
pcap = 1000*rand(F,P) + 500;
% Warehouse capacity between P*400 and P*800 for each product/warehouse
wcap = P*400*rand(W,1) + P*400;
% Product turnover rate between 1 and 3 for each product
turn = 2*rand(1,P) + 1;
% Product transport cost per distance between 5 and 10 for each product
tcost = 5*rand(1,P) + 5;
% Product demand by sales outlet between 200 and 500 for each
% product/outlet
d = 300*rand(S,P) + 200;
%-----------------------------------------------------
% Generate Variables and Constraints
%-----------------------------------------------------
distfw = zeros(F,W); % Allocate matrix for factory-warehouse distances
for ii = 1:F
    for jj = 1:W
        distfw(ii,jj) = abs(xloc(ii) - xloc(F + jj)) + abs(yloc(ii) ...
            - yloc(F + jj));
    end
end
distsw = zeros(S,W); % Allocate matrix for sales outlet-warehouse distances
for ii = 1:S
    for jj = 1:W
        distsw(ii,jj) = abs(xloc(F + W + ii) - xloc(F + jj)) ...
            + abs(yloc(F + W + ii) - yloc(F + jj));
    end
end
x = optimvar('x',P,F,W,'LowerBound',0);
y = optimvar('y',S,W,'Type','integer','LowerBound',0,'UpperBound',1);
capconstr = sum(x,3) <= pcap';
demconstr = squeeze(sum(x,2)) == d'*y;
warecap = sum(diag(1./turn)*(d'*y),1) <= wcap';
salesware = sum(y,2) == ones(S,1);
%-----------------------------------------------------
% Create Problem and Objective
%-----------------------------------------------------
factoryprob = optimproblem;
objfun1 = sum(sum(sum(x,3).*(pcost'),2),1);
objfun2 = 0;
for p = 1:P
    objfun2 = objfun2 + tcost(p)*sum(sum(squeeze(x(p,:,:)).*distfw));
end
r = sum(distsw.*y,2); % r is a length S vector
v = d*(tcost(:));
objfun3 = sum(v.*r);
factoryprob.Objective = objfun1 + objfun2 + objfun3;
factoryprob.Constraints.capconstr = capconstr;
factoryprob.Constraints.demconstr = demconstr;
factoryprob.Constraints.warecap = warecap;
factoryprob.Constraints.salesware = salesware;
%-----------------------------------------------------
% Solve the Problem
%-----------------------------------------------------
opts = optimoptions('intlinprog','Display','off','PlotFcn',@optimplotmilp);
[sol,fval,exitflag,output] = solve(factoryprob,opts);
```
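The densest line in this model is probably the warehouse capacity constraint `warecap`. As a sanity check (a numpy sketch with made-up data, not part of the original Matlab code), the vectorized expression `sum(diag(1./turn)*(d'*y),1)` can be compared against the explicit double sum $\sum_{p,s} (d_{s,p}/turn_p)\, y_{s,w}$:

```python
import numpy as np

rng = np.random.default_rng(1)
P, S, W = 3, 5, 4  # tiny hypothetical sizes
d = rng.uniform(200, 500, size=(S, P))  # demand d(s,p)
turn = rng.uniform(1, 3, size=P)        # turnover rate turn(p)
y = rng.integers(0, 2, size=(S, W))     # some 0/1 assignment y(s,w)

# Vectorized form, mirroring sum(diag(1./turn)*(d'*y),1) in Matlab
vectorized = np.sum(np.diag(1.0 / turn) @ (d.T @ y), axis=0)

# Explicit double sum over products and stores, one value per warehouse
explicit = np.zeros(W)
for w in range(W):
    for p in range(P):
        for s in range(S):
            explicit[w] += d[s, p] / turn[p] * y[s, w]

assert np.allclose(vectorized, explicit)
print("warecap expressions agree")
```

Both forms yield one number per warehouse: the turnover-weighted demand routed through it, which must stay below `wcap`.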

#### GAMS Implementation

For completeness, here is the GAMS representation [3]:

```gams
sets
   k    'locations'  /factory1*factory20, warehouse1*warehouse20, store1*store40/
   f(k) 'factories'  /factory1*factory20/
   w(k) 'warehouses' /warehouse1*warehouse20/
   s(k) 'stores'     /store1*store40/
   p    'products'   /product1*product20/
;

parameters
   xloc(k)     'locations: x-coordinates'
   yloc(k)     'locations: y-coordinates'
   pcost(f,p)  'unit production cost'
   pcap(f,p)   'production capacity'
   wcap(w)     'warehouse capacity'
   turn(p)     'product turnover rate'
   tcost(p)    'unit transportation cost'
   d(s,p)      'demand'
   distfw(f,w) 'factory-warehouse distance'
   distsw(s,w) 'store-warehouse distance'
;

xloc(k) = uniformint(1,20);
yloc(k) = uniformint(1,20);
pcost(f,p) = uniform(20,100);
pcap(f,p) = uniform(500,1500);
wcap(w) = uniform(400*card(p),800*card(p));
turn(p) = uniform(1,3);
tcost(p) = uniform(5,10);
d(s,p) = uniform(200,500);

* Manhattan distances (as in the Matlab version)
distfw(f,w) = abs(xloc(f)-xloc(w)) + abs(yloc(f)-yloc(w));
distsw(s,w) = abs(xloc(s)-xloc(w)) + abs(yloc(s)-yloc(w));

positive variable x(p,f,w) 'flow factory to warehouse';
binary variable   y(s,w)   'assign warehouse to store';
variable          z        'objective variable';

equations
   obj          'objective'
   prodcap(f,p) 'production capacity'
   whcap(w)     'warehouse capacity'
   demand(p,w)  'demand balance'
   assign(s)    'assign wh to store'
;

obj..          z =e= sum((p,f,w), (pcost(f,p)+tcost(p)*distfw(f,w))*x(p,f,w))
                   + sum((s,w,p), d(s,p)*tcost(p)*distsw(s,w)*y(s,w));
prodcap(f,p).. sum(w, x(p,f,w)) =l= pcap(f,p);
whcap(w)..     sum((p,s), (d(s,p)/turn(p))*y(s,w)) =l= wcap(w);
demand(p,w)..  sum(f, x(p,f,w)) =e= sum(s, d(s,p)*y(s,w));
assign(s)..    sum(w, y(s,w)) =e= 1;

model m /all/;
option optcr=0;
solve m minimizing z using mip;
```
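For readers with neither Matlab nor GAMS, the same model structure can be sketched with `scipy.optimize.milp` (SciPy ≥ 1.9). This is not the original code: it builds a tiny hypothetical instance with random stand-in distances and deliberately generous capacities so the toy problem is guaranteed feasible.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(1)
P, F, W, S = 2, 2, 2, 3  # tiny hypothetical instance (the real one is much larger)
pcost = rng.uniform(20, 100, (F, P))
pcap = rng.uniform(1000, 1500, (F, P))   # generous, so the toy instance is feasible
wcap = np.full(W, 1e5)                   # likewise generous warehouse capacity
turn = rng.uniform(1, 3, P)
tcost = rng.uniform(5, 10, P)
d = rng.uniform(200, 500, (S, P))
distfw = rng.uniform(1, 20, (F, W))      # stand-in distances
distsw = rng.uniform(1, 20, (S, W))

nX, nY = P * F * W, S * W                # x(p,f,w) continuous, y(s,w) binary
ix = lambda p, f, w: (p * F + f) * W + w
iy = lambda s, w: nX + s * W + w

# Objective: production + transport cost for x, delivery cost for y
c = np.empty(nX + nY)
for p in range(P):
    for f in range(F):
        for w in range(W):
            c[ix(p, f, w)] = pcost[f, p] + tcost[p] * distfw[f, w]
for s in range(S):
    for w in range(W):
        c[iy(s, w)] = sum(d[s, p] * tcost[p] for p in range(P)) * distsw[s, w]

rows = []  # (coefficients, lower, upper), one per constraint row
for f in range(F):            # prodcap(f,p): sum(w, x(p,f,w)) <= pcap(f,p)
    for p in range(P):
        a = np.zeros(nX + nY)
        for w in range(W):
            a[ix(p, f, w)] = 1
        rows.append((a, -np.inf, pcap[f, p]))
for w in range(W):            # whcap(w): sum((p,s), d/turn * y) <= wcap(w)
    a = np.zeros(nX + nY)
    for s in range(S):
        a[iy(s, w)] = sum(d[s, p] / turn[p] for p in range(P))
    rows.append((a, -np.inf, wcap[w]))
for p in range(P):            # demand(p,w): sum(f, x) = sum(s, d*y)
    for w in range(W):
        a = np.zeros(nX + nY)
        for f in range(F):
            a[ix(p, f, w)] = 1
        for s in range(S):
            a[iy(s, w)] = -d[s, p]
        rows.append((a, 0, 0))
for s in range(S):            # assign(s): sum(w, y(s,w)) = 1
    a = np.zeros(nX + nY)
    for w in range(W):
        a[iy(s, w)] = 1
    rows.append((a, 1, 1))

A = np.array([r[0] for r in rows])
lo = np.array([r[1] for r in rows])
hi = np.array([r[2] for r in rows])
integrality = np.concatenate([np.zeros(nX), np.ones(nY)])  # only y is integer
bounds = Bounds(0, np.concatenate([np.full(nX, np.inf), np.ones(nY)]))

res = milp(c, constraints=LinearConstraint(A, lo, hi),
           integrality=integrality, bounds=bounds)
assert res.status == 0  # optimal
print(f"objective = {res.fun:.1f}")
```

The row-by-row constraint assembly here is closer in spirit to the solver-based Matlab code than to GAMS; a real Python port would use a modeling layer (e.g. Pyomo or PuLP) to get GAMS-like readability.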

It is instructive to see the differences between the approaches. Here are some highlights:

| Expression | Matlab (Problem Based) | GAMS |
|---|---|---|
| $\sum_{p,f,w} pcost_{f,p} \cdot x_{p,f,w}$ | `sum(sum(sum(x,3).*(pcost'),2),1)` | `sum((p,f,w), pcost(f,p)*x(p,f,w))` |
| $\sum_{p,f,w} tcost_p \cdot distfw_{f,w} \cdot x_{p,f,w}$ | `objfun2 = 0; for p = 1:P, objfun2 = objfun2 + tcost(p)*sum(sum(squeeze(x(p,:,:)).*distfw)); end` | `sum((p,f,w), tcost(p)*distfw(f,w)*x(p,f,w))` |
| $\sum_w x_{p,f,w} \le pcap_{f,p}$ | `sum(x,3) <= pcap'` | `sum(w,x(p,f,w)) =l= pcap(f,p)` |
| $\sum_f x_{p,f,w} = \sum_s d_{s,p}\cdot y_{s,w}$ | `squeeze(sum(x,2)) == d'*y` | `sum(f,x(p,f,w)) =e= sum(s, d(s,p)*y(s,w))` |
| $\sum_{p,s} \frac{d_{s,p}}{turn_{p}} y_{s,w} \le wcap_{w}$ | `sum(diag(1./turn)*(d'*y),1) <= wcap'` | `sum((p,s),(d(s,p)/turn(p))*y(s,w)) =l= wcap(w)` |
| $\sum_w y_{s,w} = 1$ | `sum(y,2) == ones(S,1)` | `sum(w,y(s,w)) =e= 1` |
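These equivalences are easy to verify numerically. The sketch below (numpy standing in for Matlab; not part of the original post) checks two of the rows on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
P, F, W, S = 3, 4, 5, 6  # tiny hypothetical sizes
x = rng.random((P, F, W))            # flows x(p,f,w)
pcost = rng.random((F, P))           # production cost pcost(f,p)
d = rng.random((S, P))               # demand d(s,p)
y = rng.integers(0, 2, size=(S, W))  # assignment y(s,w)

# Production-cost row of the table:
# Matlab: sum(sum(sum(x,3).*(pcost'),2),1)
vec = np.sum(np.sum(x, axis=2) * pcost.T)
# GAMS:   sum((p,f,w), pcost(f,p)*x(p,f,w))
idx = sum(pcost[f, p] * x[p, f, w]
          for p in range(P) for f in range(F) for w in range(W))
assert np.isclose(vec, idx)

# Right-hand side of the demand row:
# Matlab: d'*y          GAMS: sum(s, d(s,p)*y(s,w))
vec_rhs = d.T @ y
idx_rhs = np.array([[sum(d[s, p] * y[s, w] for s in range(S))
                     for w in range(W)] for p in range(P)])
assert np.allclose(vec_rhs, idx_rhs)
print("table rows check out")
```

The pattern is the same throughout: the Matlab column collapses indices via array axes, while the GAMS column spells out the indices explicitly.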

There is a notable difference in the number of language constructs we need. In Matlab we use:

1. Several different versions of `sum`
2. A loop
3. `diag`, transpose (`'`), and element-wise division (`./`)
4. `squeeze`

In GAMS we basically only use `sum`.

#### References

1. Factory, Warehouse, Sales Allocation Model: Solver-Based, http://www.mathworks.com/help/optim/ug/factory-warehouse-sales-allocation-model.html
2. Factory, Warehouse, Sales Allocation Model: Model-Based, http://www.mathworks.com/help/optim/examples/factory-warehouse-sales-allocation-model.html
3. Matlab vs GAMS: Integer Programming, http://yetanothermathprogrammingconsultant.blogspot.com/2016/10/matlab-vs-gams-integer-programming.html