Thursday, September 29, 2016

Matlab vs GAMS: Linear Programming

Regularly I am asked “What is the difference between Matlab and GAMS when modeling optimization problems?”. I’ll try to answer this question with some examples.

Matlab

In Matlab LP and MIP models are modeled using a matrix data structure. Basically a linear optimization model is viewed as:

\[\boxed{\begin{align}\min\>&c^Tx\\&A_1 x\le b_1\\&A_2 x=b_2\\&\ell\le x \le u\end{align}}\]

We need to populate the vectors \(c\), \(b_1\), \(b_2\), \(\ell\) and \(u\) and the matrices \(A_1\), \(A_2\) and then call linprog or intlinprog.

Some notes:

  • The matrices \(A_1\), \(A_2\) can become very, very large and sparse for larger models. In that case it may help to use a sparse matrix storage scheme, preventing all the zeros being stored. We’ll see an example of this later on in a subsequent post.
  • There are alternative modeling frameworks available for use with Matlab, including Tomlab and CVX. These systems take a step toward a somewhat more equation based modeling paradigm. In some sense one could think about them as positioned between pure Matlab and a modeling system like AMPL and GAMS. 
  • Instead of the Matlab optimization toolbox solvers linprog or intlinprog, you can also use high-performance solvers such as Cplex and Gurobi. They have a similar matrix oriented modeling interface as the Matlab solvers.  
  • Matlab’s quadratic solver quadprog also has a similar matrix oriented interface.
  • General NLP (nonlinear programming) problems have a totally different interface in Matlab. They require the objective and constraints to be modeled as functions. Introducing nonlinearities to an LP model will force you to rewrite the the model code.
  • In practice Matlab’s matrix approach works best for highly structured models, and is not as convenient for more complex and messy models.
  • Many LP/MIP models in R are using an matrix-based modeling paradigm. Much of what I am saying here also applies to R.
  • Indexing in Matlab is done by integers, and array indexing (vectors, matrices) is one-based.
  • Matlab default variable bounds are \(-\infty \le x \le \infty\). Most LP and MIP solvers use a different default: if a variable has no bounds it is considered positive.

GAMS

GAMS is an equation based modeling system. This means an optimization model is collection of algebraic equations. For instance a capacity constraint can look like:

Mathematics:
\[\sum_j x_{i,j} \le cap_{i} \>\> \forall i\]
GAMS:

Capacity(i)..
   sum(j, x(i,j)) =l= cap(i);

Of course we can write a whole LP as just one or two equations, but that is almost never a good modeling approach:

objective..
   z =e= sum(j, c(j)*x(j));
e(i)..
   sum(j, a(i,j)*x(j)) =e= b(i);

Notes:

  • GAMS is similar to other modeling systems such as AMPL. These notes largely apply to those modeling systems as well.
  • Equation based modeling allows an easy transition to non-linear programming. We don’t have to change the modeling paradigm when the model becomes non-linear.
  • A properly designed model will is both compact and allows us to think and reason about the model. Often we don’t need to maintain a separate document that states the mathematical model.
  • A modeling system helps to be efficient in writing and maintaining complex models.
  • GAMS is somewhat unusual in not having an objective function but rather an objective variable. The objective function is formulated as a normal constraint.
  • By default GAMS variables are free variables. As said before most other modeling software assume by default that variables are non-negative.
  • Indexing in GAMS is done using strings.

A simple LP transportation problem

Let’s start with an extremely simple transportation problem from the GAMS model library (1) and originally from (2) (though with slightly different data, making the model degenerate).

image

The mathematical model looks like:

\[\boxed{\begin{align}\min\>&\sum_{i,j} c_{i,j} x_{i,j}\\&\sum_j x_{i,j} \le a_i\>\>\forall i\\&\sum_i x_{i,j} \ge b_j\>\>\forall j\\&x_{i,j}\ge 0\end{align}}\]

Below is a side-by-side comparison of the Matlab code and the GAMS model. As you will see the Matlab matrix approach is very different from an equation based language.

Matlab code GAMS code

NumPlants = 2;

NumMarkets = 3;

MatLab uses indexing by integers.
GAMS is set based with set elements
represented by strings
sets
  i
canning plants /seattle, san-diego /
  j
markets        /new-york, chicago, topeka /
;

a = [350 600]';     % capacity

b = [325 300 275]'; % demand

d = [2.5 1.7 1.8;   % distance

     2.5 1.8 1.4];

f = 90;             % unit freight cost

c = f * d / 1000;   % freight

Ordering is important. We need to remember index=1 means Seattle or New-York depending on the situation.
Ordering is not important. parameters
   a(i) 
capacity of plant i in cases
      
/  seattle     350
         
san-diego   600  /

   b(j) 
demand at market j in cases
      
/  new-york    325
         
chicago     300
         
topeka      275  / ;

table d(i,j)  distance in thousands of miles

             
new-york   chicago   topeka
 
seattle        2.5       1.7       1.8
 
san-diego      2.5       1.8       1.4  ;

scalar f  freight in dollars per case per 1000 miles /90/
;

parameter c(i,j)  transport cost in $1000 per case
;
c(i,j) = f * d(i,j) / 1000 ;

% allocate matrices

Asupply = zeros(NumPlants,NumPlants*NumMarkets);

Ademand = zeros(NumMarkets,NumPlants*NumMarkets);

 

% populate submatrices

for i=1:NumPlants

    for j=1:NumMarkets
        k = (i-1)*NumMarkets + j;

        Asupply(i, k) = 1;

        Ademand(j, k) = 1;

    end

end

 

% populate matrix A and vectors B,C

A = [Asupply;-Ademand];

B = [a;-b];

C = reshape(c',NumPlants*NumMarkets,1);

 

% positive variables

lb = zeros(NumPlants*NumMarkets,1);

Lots of index calculations.

Note that we flip the ≥ demand constraint to a ≤ constraint.

The matrix A looks like:

A =
   1   1   1   0   0   0
   0   0   0   1   1   1
  -1   0   0  -1   0   0
   0  -1   0   0  -1   0
   0   0  -1   0   0  -1

The GAMS code for the equations is much closer to the mathematical model. variables
  x(i,j) 
shipment quantities in cases
  z      
total transportation costs in $1000 ;

positive variable
x ;

equations

   cost       
define objective function
   supply(i)  
observe supply limit at plant i
   demand(j)  
satisfy demand at market j ;

cost..        z =e=
sum
((i,j), c(i,j)*x(i,j)) ;

supply(i)..  
sum
(j, x(i,j)) =l= a(i) ;

demand(j)..  
sum
(i, x(i,j)) =g= b(j) ;

options = optimoptions('linprog','Algorithm','dual-simplex');

[x,z] = linprog(C,A,B,[],[],lb,[],[],options);

x=reshape(x,NumMarkets,NumPlants)';

z,x

 
 

model transport /all/ ;
solve
transport using lp minimizing z ;
display
z.l,x.l;

The output of the two models is similar. As the LP model has multiple optimal solutions, we actually find two different solutions.

Matlab output GAMS output

z =

  153.6750

x =

     0   300     0
   325     0   275

 

----     66 VARIABLE z.L        =      153.675  total transportation costs in $1000

----     66 VARIABLE x.L  shipment quantities in cases

             new-york     chicago      topeka

seattle        50.000     300.000
san-diego     275.000                 275.000

This was a very simple model model, about the simplest one can imagine. But even with this simple model we can illustrate the differences between a matrix oriented model system like Matlab and an equation based system like GAMS. Already in the Matlab code, I had to apply some careful matrix indexing tricks. The main reason is we need to map from variables \(x_{i,j}\) to matrix columns (each variable is a column in the coefficient matrices \(A_1\) and \(A_2\)), i.e. we have to map from a matrix to a vector. If we have multiple variables and/or variables that have an irregular shape, things become even more dicey in the Matlab approach.

A comparison of a slightly more complicated MIP model is shown in (4). 

 

References

  1. http://www.gams.com/modlib/libhtml/trnsport.htm
  2. G.B.Dantzig, Linear Programming and Extensions, Princeton University Press, Princeton, New Jersey, 1963.
  3. Robert Fourer, “Modeling languages versus matrix generators for linear programming”, ACM Transactions on Mathematical Software, Volume 9, Issue 2, June 1983, Pages 143-183.
  4. http://yetanothermathprogrammingconsultant.blogspot.com/2016/10/matlab-vs-gams-integer-programming.html shows a Matlab example of a slightly more complex MIP where Matlabs matrix approach becomes somewhat nightmarish.

Tuesday, September 27, 2016

Automated Tuning of the JVM with Bayesian Optimization

Cplex and Gurobi have tuning options to find some good options settings for their LP/MIP solvers so your models solver faster. Basically we need some help as the number of options we can tune is just too large to select good options by hand. These automatic tuning approaches are based on some heuristics. Here is an application of Bayesian Optimization to do the same for the Java Virtual Machine in a complex production environment at Twitter. The JVM has a lot of tuning parameters and a more formal optimization approach is used here to find a good set of settings.

The video below is a presentation at JavaOne 2016.



References

  1. Jonas Mockus, Bayesian Approach to Global Optimization: Theory and Applications, Springer, 1989.
  2. Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams and Nando de Freitas, "Taking the Human Out of the Loop: A Review of Bayesian Optimization", Proceedings of the IEEE, Volume 104, Issue 1, Jan 2016. 

Saturday, September 24, 2016

OMPR: LP/MIP models in R

From https://github.com/dirkschumacher/ompr:

OMPR (Optimization Modelling Package in R) is a DSL to model and solve Mixed Integer Linear Programs. It is inspired by the excellent Jump project in Julia.

Here are some problems you could solve with this package:

  • What is the cost minimal way to visit a set of clients and return home afterwards?
  • What is the optimal conference time table subject to certain constraints (e.g. availability of a projector)?
  • If you run a radio station :) What is the optimal way to play music such that your users do not have to listen to the same songs too often?

The Wikipedia article gives a good starting point if you would like to learn more about the topic.

This is a beta version. Currently working towards a first stable version for CRAN. At the moment not recommended for production systems / important analyses. Although most obvious bugs should be gone. Happy to get bug reports or feedback.

Current version: 0.3.0

Looks interesting.

References
  1. https://dirkschumacher.github.io/ompr/articles/problem-graph-coloring.html
  2. http://orinanobworld.blogspot.com/2016/11/mip-models-in-r-with-ompr.html

Wednesday, September 21, 2016

LP Model with 150 billion equations?

https://groups.google.com/forum/#!topic/gurobi/zdO8cqJIIv4.

My usual rule of thumb is that up to 10 million equations is doable on normal PC hardware. This is 15000 times over that amount. I am using here the US notion of a billion (i.e. \(10^9\) instead of \(10^{12}\)).

In some cases I feel models are overly detailed. Often the answer to results that don’t match expectations is: “we need to add more detail”. I am convinced this is not always the correct conclusion. Of course building and solving large models is more sexy than small models. But much insight can be learned from really small models.

A problem with \(1.5 \times 10^{11}\) equations probably needs a ton a of memory, say in the order of hundreds of terabytes. Is there a machine with with several hundred terabyte of memory? The Titan supercomputer has about 500 terabyte memory (the link expresses memory in Tebibytes – I had to look that up).

Tuesday, September 20, 2016

Gender-based Seating Arrangement

From this post:

I'm trying to solve a special assignment problem, but I cannot handle it.

The problem is to assign seats to audiences. Each audience has a ranking of the seats based on his/her own preference. I want to assign the seats to the audiences such that the overall satisfaction is maximized.

The above problem is nothing but an assignment problem, which can be solved by the Hungarian algorithm. Now, things becomes a little more complicated. The audiences are divided into male audiences and female audiences. When assigning the seats, there should be at least n empty seats between each male and female. I assume that there are enough seats to accommodate all audiences and that all seats are located in a row.

This looks like an assignment problem with some side-constraints. The assignment part is boilerplate:

\[\boxed{\begin{align}\max\>&\sum_{i,j}s_{i,j}x_{i,j}\\&\sum_j x_{i,j}=1 \>\forall i\\&\sum_i x_{i,j}\le 1\>\forall j\\&x_{i,j}\in \{0,1\}\end{align}}\]

Here \(i\) indicates spectators and \(j\) refers to seats. Modeling the constraint that females cannot sit next to males is more challenging and interesting (from a strict modeling perspective that is: these seating arrangements do not look very attractive to me).

To experiment with some formulations I generate some random data for the preferences \(s_{i,j}\):

image 

If we want to have \(n\) empty seats in between the males and females, then for \(m=m_F+m_M\) persons we need at least \(m+n\) seats.

Quadratic Model

Probably the easiest is to use a quadratic model. First we introduce a distance matrix \(d_{j,j’}\) indicating 1 + the number of seats between seats \(j\) and \(j’\). I.e. this matrix looks like:

image

Then we need to construct a set \(check_{i,i’}\) which has all unique female-male combinations we need to check. This set arranged as a matrix looks like:

image

With this data we can now write our side-constraint as:

\[\sum_{j\ne j’} d_{j,j’} x_{i,j} x_{i’,j’} \ge n+1 \>\> \forall check(i,i’) \]

i.e.: for every combination \((i,i’)\) where \(i\) is a female and \(i’\) is a male, placed in seats \((j,j’)\), make sure the distance \(d_{j,j’}\) of their seats is at least \(n+1\).

This formulation is quadratic but unfortunately not convex (the Q matrix of quadratic coefficients is not negative semidefinite). We are lucky: some solvers like Cplex and Gurobi accept it nevertheless: they apply some reformulations behind the scenes. Gurobi really, really wants to let you know and prints:

Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals

The optimal assignment for \(n=1\) looks like:

image

Linearization

The multiplication of two binary variables can be easily linearized. A standard way to linearize \(z=x\cdot y\) with \(x,y \in \{0,1\}\) is:

\[\boxed{\begin{align}&z \le x\\ &z \le y \\ & z\ge x+y-1\\& x,y,z \in \{0,1\}\end{align}}\]

In our case we can simplify a bit:

\[\boxed{\begin{align}&\sum_{j,j’} d_{j,j’} y_{i,j,i’,j’} \ge n+1\\ & y_{i,j,i’,j’} \le x_{i,j}\\ &  y_{i,j,i’,j’} \le x_{i’,j’}\\ & y_{i,j,i’,j’} \in \{0,1\} \end{align}} \]

This formulation leads to a very large model:

image

Although this linearized formulation finds the optimal solution, clearly, this is not the way to go.

Forbid patterns

A different formulation can be based on forbidding certain patterns in the seating. For \(n=1\) we do not allow F,M or M,F to appear.

The first thing we do is to introduce a variable that tells us if set \(j\) is occupied by a male or female (or empty). We can do this by using two binary variables \(m_j\) and \(f_j\), or we can use a single variable \(g_{j,fm} \in \{0,1\} \) where \(fm = \{f,m\}\) is a set indicating the gender. 

To calculate \(g_{j,fm}\) we can use a simple summation over the females or males only, depending on what \(fm\) is:

\[g_{j,fm}=\sum_{i|gender_{i,fm}} x_{i,j} \]

Variable \(g_{i,fm}\) will look like:

image

This is kind of an aggregation of the variable \(x_{i,j}\). Based on \(g_{j,fm}\) we can formulate a set of equations, again assuming \(n=1\):

\[\boxed{\begin{align}&g_{seat1,female}+g_{seat2,male}\le1\\&g_{seat1,male}+g_{seat2,female}\le1\\&g_{seat2,female}+g_{seat3,male}\le1\\&g_{seat2,male}+g_{seat3,female}\le1\\&g_{seat3,female}+g_{seat4,male}\le1\\&g_{seat3,male}+g_{seat4,female}\le1\\&\dots\end{align}}\]

Of course I did not write these individual equations. Rather I created and populated the 4 dimensional set \(pattern_{k,j,j’,fm}\) and introduced just one equation block:

\[\sum_{j’,fm|pattern_{k,j,j’,fm}} g_{j’,fm} \le 1 \>\>\forall j,k\]

This linear formulation has 83 variables and 180 equations and solves very fast. For \(n>1\) we can formulate generalizations of this scheme.

Different problem: no males next to each other

A slightly different problem is actually easier to formulate: make sure no males are sitting next to each other (because they start to fight?). Also let’s apply the same rule for the women. Again we can use the scheme where we forbid patterns. Here we forbid F,F and M,M.  This we can do by only allowing up to one F or M in two consecutive seats.

First we create a simple set that gives us the groups of two seats next to each other. Call this set \(group_{j,j’}\). It looks like:

image

We can formulate the spacing equation simply as:

\[\sum_{i|gender_{i,fm}} \sum_{j’|group_{j,j’}} x_{i,j’} \le 1\>\> \forall j,fm\]

Actually we can improve on this a little bit. Using extra variables \(g_{j,fm}\) indicating the gender of the person in seat \(j\) as defined in the previous paragraph we can write:

\[\begin{align}&g_{j,fm}=\sum_{i|gender_{i,fm}} x_{i,j}\>\> \forall j,fm\\&\sum_{j’|group_{j,j’}} g_{j’,fm} \le 1\>\>\forall j,fm \end{align}\]

Although we add extra variables and equations, this makes the model more readable, and actually sparser (i.e. fewer non-zero elements). Often larger and sparser is better than smaller and denser.

The results looks like:

image

GAMS and Trump

A new study based on a course-grained CGE model on top of a fine-grained input-output model (written in GAMS) indicates that the trade policies proposed by presidential candidate Donald Trump could have substantial negative effects.

The principle of comparative advantages in trade was called by Paul Samuelson: an economic idea that is both universally true and not obvious. The not so intuitive part of this seems prevalent in the political discussions and on the campaign trail: most arguments are not  based on economics.

References
  1. Marcus Noland, Gary Clyde Hufbauer, Sherman Robinson, and Tyler Moran, “Assessing Trade Agendas in the US Presidential Campaign”, https://piie.com/system/files/documents/piieb16-6.pdf
  2. Donald J. Boudreaux, “Comparative Advantage”, The Concise Encyclopedia of Economics, http://www.econlib.org/library/Enc/ComparativeAdvantage.html
  3. Erwin Kalvelagen, “Solving Systems of Linear Equations with GAMS”, http://www.amsterdamoptimization.com/pdf/lineq.pdf. Has an example demonstrating some techniques relating to Input-Output analysis.

Wednesday, September 14, 2016

A New Architecture for Optimization Modeling Frameworks

http://arxiv.org/pdf/1609.03488.pdf

image

AbstractWe propose a new architecture for optimization modeling frameworks in which solvers are expressed as computation graphs in a framework like TensorFlow rather than as standalone programs built on a low-level linear algebra interface. Our new architecture makes it easy for modeling frameworks to support high performance computational platforms like GPUs and distributed clusters, as well as to generate solvers specialized to individual problems. Our approach is particularly well adapted to first-order and indirect optimization algorithms. We introduce cvxflow, an open-source convex optimization modeling framework in Python based on the ideas in this paper, and show that it outperforms the state of the art.

The issues addressed in this paper are typically not the ones I am struggling with most of the time. Once more an indication that “optimization modeling” has a very different meaning for different modelers. 

Monday, September 12, 2016

Getting rid of fly ash from coal-fired power plants

Fly ash is a by-product of coal burning power plants.

Fly ash at Cockenzie Power Station. Image by Richard Webb.

In (1) a model is presented where a group of six Chinese power plants tries to optimally sell and ship fly ash to different markets. The leftover fly ash needs to be disposed of at a certain cost. Typically disposal means using landfills. A well-known use of fly ash is as a replacement for portland cement in producing concrete. Fly ash can make the concrete stronger and more durable (2). The plants are operating under single ownership so they coordinate their actions: there is no competitive behavior between the plants and we are interested to optimize the total profit of the group.

Bricks made from Fly Ash. Image by Thamizhpparithi Maari.



The model and the paper are not very interesting. The model is rather simple, the English text in the paper is often difficult to follow and I believe the model is not correctly categorized as a Goal Programming and Scheduling Model (it is neither a scheduling model nor a goal programming model). The LP model in (1) is just:

image

Here \(x_{i,j}\) is a non-negative variable indicating the optimal shipping patterns from plant \(i\) to market \(j\), and \(y_i\) is the amount of fly ash we needed to dispose of. The total output of each plant is \(q_i\). Note that market price \(p_j\) and the demand are given. This is essentially a transportation model plus a little bit of logic to handle the split between selling and disposal. The results are:

image
image

This is where the paper (1) stops. However I think we can make something more interesting out of this.

Although the disposal cost are relatively small compared to the overall revenue and profit, an economist would tell you we can do better than this. By lowering the price we can increase demand and thus be able to reduce the amount we need to dispose of. In the following we will set up a scenario to see if this economist is correct. Let’s say that the demand responds to a price decrease according to a fixed price elasticity \(\eta<0\). I.e. we have:

\[\frac{\partial \ln d_j}{\partial \ln p_j} = \eta\]

An example demand curve can look like the picture below. Although the demand curve is expressing demand as a function of price: \(d = f(p)\), economists typically make graphs with the price \(p\) on the \(y\)-axis and the quantity \(d\) on the \(x\)-axis. 

image

We assume here a constant price elasticity. This will result in a nonlinear demand curve. In the model we can now use the variables \(p_j\) to calculate the demand:

\[d_j = \beta_j \cdot p_j^{\eta} \]

The constant \(\beta_j\) can be found by plugging in our original constants for the price and the demand (this is called “calibration”). I.e.:

\[\beta_j := \frac{d^0_j}{(p^0_j)^\eta}\]

Assuming we don’t want to increase prices on existing customers we have: \(p_i \le p^0_j\). Quite the opposite: we want to lower prices in exchange for additional demand. Let’s further assume \(\eta = –2\) i.e. a 1% decrease in price leads to a 2% increase in demand. The model now looks like:

image

Note that prices (and thus the derived demand) are now endogenous instead of exogenous. As this is now an NLP model I would recommend to set some initial values for variables \(p\) and \(x\) (these are the non-linear variables). The results are:

image

image

So I would recommend to send some salespeople to markets 4, 6 and 8 and see if in exchange for a price drop the buyers are willing to purchase more. As an incentive, some of the profit improvement can be used as a bonus for these sales teams.

Of course if markets start to trade between them, this scheme may need some refinement. Another question: what about this \(\eta=-2\) elasticity? What if \(\eta=-1\) or \(\eta=-0.5\)? I ran these cases, and for \(\eta=-1\) we get rid of a some but not all disposal. The profit increases by a smaller amount than with \(\eta=-2\). With \(\eta=-0.5\) we don’t see any change compared to the original model (we have the same amount of ash to dispose of, and obtain the same profit).

References
  1. Ming Fu, Lu Wang, Yong-Lan Xu, Dong-Xiao Niu, Tian-Nan Ma,"The Study of Fly Ash Scheduling Optimization Model in Coal-Fired Power Plant". In Steven Y. Liang, Energy and Mechanical Engineering, Proceedings of 2015 International Conference on Energy and Mechanical Engineering.
  2. http://flyash.com/about-fly-ash/

Thursday, September 8, 2016

StructJuMP

From https://github.com/StructJuMP/StructJuMP.jl:

StructJuMP

The StructJuMP package provides a parallel algebraic modeling framework for block structured optimization models in Julia. StructJuMP, originally known as StochJuMP, is tailored to two-stage stochastic optimization problems and uses MPI to enable a parallel, distributed memory instantiation of the problem. StructJuMP.jl is an extension of the JuMP.jl package, which is as fast as AMPL and faster than any other modeling tools such as GAMS and Pyomo (see this).

Nonlinear solvers

Problems modeled in StructJuMP models can be solved in parallel using the PIPS-NLP parallel optimization solver. In addition, StructJuMP models can be solved (in serial only) using Ipopt. The solver interface and glue code for PIPS-NLP and Ipopt are located at StructJuMPSolverInterface.

The readme is at: https://github.com/StructJuMP/StructJuMP.jl/blob/master/FAQ.md