Saturday, July 31, 2021

A scheduling problem: a modeling approach

 Here is a simple scheduling problem [1]:

  • We have \(T\) time periods (weeks)
  • and \(N\) jobs to be scheduled.
  • Each job has a duration or processing time (again expressed in weeks), and a given resource (personnel) requirement during the execution of the job. This demand is expressed as: \(\color{darkblue}{\mathit{resource}}_{j,t}\)  where \(j\) indicates the job and \(t\) indicates the week since the start of the job.
  • Not in [1], but I added them to make things a bit more realistic: we have some release dates indicating that a job may not start before a given date. A reason can be that raw material is not available before that date.
  • Similarly, I added due dates: a job must finish before a given date.
  • The objective is to minimize the maximum amount of resources we need over the whole planning period. We can think of this as the capacity we need (e.g. number of personnel).
In this post, I want to show a standard approach to modeling this problem and, in contrast, how I would attack it.
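To make things concrete, here is a sketch of what such a standard time-indexed formulation could look like (my notation; the models in the post may differ). The binary variable \(\color{darkred}x_{j,t}\) indicates that job \(j\) starts in week \(t\), and \(\color{darkred}z\) is the capacity we minimize: \[\begin{align}\min\>&\color{darkred}z\\ & \sum_t \color{darkred}x_{j,t} = 1 && \forall j\\ & \color{darkred}z \ge \sum_j \sum_{t'\le t} \color{darkblue}{\mathit{resource}}_{j,t-t'+1}\cdot \color{darkred}x_{j,t'} && \forall t\\ & \color{darkred}x_{j,t} = 0 && \text{for weeks $t$ outside the release/due date window of job $j$}\\ & \color{darkred}x_{j,t} \in \{0,1\}\end{align}\] Here \(\color{darkblue}{\mathit{resource}}_{j,s}\) is taken to be zero for \(s\) beyond the processing time of job \(j\).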

Tuesday, July 27, 2021

A variant of an assignment problem

 In [1], the following problem is sketched:

  • We have different types of workers. In the data set below, I call them type-A and type-B workers. 
  • We need to assign workers to jobs.
  • We have more workers than jobs, so we can execute all jobs. 
  • This is a standard assignment problem: \[\begin{align} \min &\sum_{w,j} \color{darkblue}c_{w,j} \color{darkred}x_{w,j}\\ & \sum_j \color{darkred}x_{w,j} \le 1 &&\forall w \\ & \sum_w \color{darkred}x_{w,j} = 1 && \forall j \\ & \color{darkred}x_{w,j} \in [0,1] \end{align}\]
  • The complication is that we can form teams of two workers, one of type A and one of type B. Such a team can work faster and execute the job in less time.
  • How to handle this?

I don't know how to handle this within an assignment problem, but stating this as a MIP allows us to make some progress. Allowing two persons to work on a job is easy. Just replace the second constraint by \[1 \le \sum_w \color{darkred}x_{w,j} \le 2\] Unfortunately this ignores that we need one worker of each type in a team. Also, this makes it difficult to impose a cost for each specific team.
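One way to make progress (a sketch in my own notation, not necessarily what the post works out): introduce explicit team variables \(\color{darkred}y_{a,b,j}\in\{0,1\}\) for a team of type-A worker \(a\) and type-B worker \(b\) doing job \(j\), with its own cost \(\color{darkblue}d_{a,b,j}\): \[\begin{align} \min\>&\sum_{w,j} \color{darkblue}c_{w,j}\color{darkred}x_{w,j} + \sum_{a,b,j}\color{darkblue}d_{a,b,j}\color{darkred}y_{a,b,j}\\ & \sum_w \color{darkred}x_{w,j} + \sum_{a,b} \color{darkred}y_{a,b,j} = 1 && \forall j\\ & \sum_j \color{darkred}x_{a,j} + \sum_{b,j} \color{darkred}y_{a,b,j} \le 1 && \forall a\\ & \sum_j \color{darkred}x_{b,j} + \sum_{a,j} \color{darkred}y_{a,b,j} \le 1 && \forall b\\ & \color{darkred}x_{w,j}, \color{darkred}y_{a,b,j} \in \{0,1\}\end{align}\] Each job is done by a single worker or by one team, and no worker appears in more than one assignment.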

Tuesday, July 20, 2021

Spatial Equilibrium

Just a quick experiment with a small price-endogenous spatial equilibrium model.

In this problem, we consider commodities (goods) produced and consumed in different regions. We can trade between regions.

This price-endogenous spatial model will find equilibrium prices, supply and demand quantities, and trade patterns. Our model has a connection with the transportation model; the difference is that we formulate it as a system of complementarity constraints (no objective), with the duals explicitly present as variables. In other words, we solve the KKT conditions.
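To give a flavor of these complementarity conditions, the trade part of such a model can be sketched as (my notation, with \(\color{darkred}x_{i,j}\) the shipment from region \(i\) to region \(j\), \(\color{darkred}p_i\) the regional price, and \(\color{darkblue}c_{i,j}\) the transport cost): \[0 \le \color{darkred}x_{i,j} \perp \color{darkred}p_i + \color{darkblue}c_{i,j} - \color{darkred}p_j \ge 0 \quad \forall i,j\] i.e., there is no trade from \(i\) to \(j\) unless the price difference exactly covers the transport cost. Similar paired conditions link supplies, demands, and prices.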

I use a small data set from [3], and see if we can reproduce things.

Monday, July 12, 2021

GAMS Singleton Set Issue

There is a nasty problem with the singleton set in GAMS. A singleton set is a set that can contain at most one element: assigning a new element replaces the current one. I don't use singleton sets very often. Here is a reason why.

Consider the small example:


set i /i1*i20/;
parameter p(i);
p(i) = uniform(0,10);

* find last element >= 5
scalar threshold /5/;
singleton set k(i);
loop(i,
  k(i)$(p(i)>=threshold) = yes;
);

display p,k;


I would expect the output for k to be the last element of p(i) with a value of at least 5 (here: i19). However, we see:


----     12 PARAMETER p  

i1  1.717,    i2  8.433,    i3  5.504,    i4  3.011,    i5  2.922,    i6  2.241,    i7  3.498,    i8  8.563
i9  0.671,    i10 5.002,    i11 9.981,    i12 5.787,    i13 9.911,    i14 7.623,    i15 1.307,    i16 6.397
i17 1.595,    i18 2.501,    i19 6.689,    i20 4.354


----     12 SET k  

                                                      ( EMPTY )
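The culprit is the replacement semantics of singleton sets: an assignment to a singleton set first clears it, apparently even when the $-condition on the left-hand side is false. So the last loop iteration (i20, with p below the threshold) wipes k out again. A workaround that seems to behave as intended is to put the condition on the loop instead of on the assignment (a minimal sketch):


loop(i$(p(i)>=threshold),
  k(i) = yes;
);


Now the assignment is only executed for qualifying elements, and k ends up holding the last one (i19).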


Monday, July 5, 2021

Another sports schedule problem

Wimbledon 2010 Mixed Doubles (source)

 

In [1] a question is asked about designing a sports schedule as follows:

  • There are 16 players, 7 rounds, and 4 games per round.
  • A game consists of 2 teams with 2 players per team.
  • Each player has a rating. A team rating is the sum of the ratings of the two members.
  • A player plays exactly one game in each round.
  • A player can be partnered with another player only once.
  • A player can play against another player only twice. 
  • Design a schedule that has the minimum difference in ratings between opposing teams.

I'll formulate two different models: one with a small number of binary variables, and one with a large number of binary variables. As we shall see, the model with the larger number of variables will win.

Wednesday, June 30, 2021

Arranging points on a line

The problem stated in [1] is:



In the original problem, the poster talked about the distance between neighbors. But we don't know in advance what the neighboring points are. Of course, we can just generalize and talk about any two points. 

This problem looks deceptively simple. The modeling has some interesting angles.

We will attack this problem in different ways:
  • An MINLP problem. The main idea is to get rid of the abs() function as this is non-differentiable.
  • A MIP model using binary variables or SOS1 variables. Again, we reformulate the abs() function (see the sketch after this list).
  • Apply a fixing strategy to reduce the model complexity. Good solvers actually don't need this.
  • A genetic algorithm (ga) approach.
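As a preview of the MIP bullet: the usual trick to reformulate \(|\color{darkred}x_i-\color{darkred}x_j|\) is variable splitting, with a binary variable (or an SOS1 pair) to make sure only one part can be nonzero. A sketch (my notation; \(\color{darkblue}M\) is a suitable bound): \[\begin{align}&\color{darkred}x_i-\color{darkred}x_j=\color{darkred}d_{i,j}^+-\color{darkred}d_{i,j}^-\\ &|\color{darkred}x_i-\color{darkred}x_j|=\color{darkred}d_{i,j}^++\color{darkred}d_{i,j}^-\\ &\color{darkred}d_{i,j}^+\le\color{darkblue}M\cdot\color{darkred}\delta_{i,j},\ \ \color{darkred}d_{i,j}^-\le\color{darkblue}M\cdot(1-\color{darkred}\delta_{i,j})\\ &\color{darkred}d_{i,j}^+,\color{darkred}d_{i,j}^-\ge 0,\ \ \color{darkred}\delta_{i,j}\in\{0,1\}\end{align}\] The binary \(\color{darkred}\delta_{i,j}\) (or an SOS1 pair) ensures at most one of the two parts is nonzero, so their sum equals the absolute value.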

Saturday, June 12, 2021

Median, quantiles and quantile regression as linear programming problems

Quantile regression is a bit of an exotic type of regression [1,2,3]. It can be seen as a generalization of \(\ell_1\) or LAD regression [4], and just as LAD regression we can formulate and solve it as an LP.

First I want to discuss some preliminaries: how to find the median and the quantiles of a data vector \(\color{darkblue}y\). That will give us the tools to formulate the quantile regression problem as an LP. The reason for adding these preliminary steps is to develop some intuition about how Quantile Regression problems are defined. I found that most papers just "define" the underlying optimization problem, without much justification. I hope to show with these small auxiliary models how we arrive at the Quantile Regression model. Along the way, we encounter some interesting titbits. I'll discuss a few details that papers typically gloss over or even skip, but that I find fascinating.
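As a preview, the LP behind the \(\color{darkblue}\tau\)-quantile of \(\color{darkblue}y\) can be sketched as (my notation): \[\begin{align}\min\>&\sum_i \left(\color{darkblue}\tau\cdot \color{darkred}e_i^+ + (1-\color{darkblue}\tau)\cdot\color{darkred}e_i^-\right)\\ & \color{darkred}e_i^+ - \color{darkred}e_i^- = \color{darkblue}y_i - \color{darkred}q && \forall i\\ & \color{darkred}e_i^+,\color{darkred}e_i^- \ge 0\end{align}\] With \(\color{darkblue}\tau=0.5\) this gives the median; replacing the scalar \(\color{darkred}q\) by a linear predictor \(\sum_k \color{darkblue}X_{i,k}\color{darkred}\beta_k\) turns it into the quantile regression LP.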

Tuesday, June 8, 2021

Portfolio optimization with risk as constraint?

The standard mean-variance portfolio optimization models have the form:


Model M1
\[\begin{align}\min\>&\color{darkred}{\mathit{Risk}}=\color{darkred}x^T\color{darkblue}Q\color{darkred}x\\ & \color{darkblue}r^T\color{darkred}x \ge \color{darkblue}{\mathit{MinimumReturn}} \\  &\sum_i \color{darkred}x_i = 1\\ & \color{darkred}x \ge 0\end{align}\]


or maybe:
Model M2
\[\begin{align}\min\>&\color{darkred}z=\color{darkred}x^T\color{darkblue}Q\color{darkred}x - \color{darkblue}\lambda \cdot\color{darkblue}r^T\color{darkred}x\\ &\sum_i \color{darkred}x_i = 1\\ & \color{darkred}x \ge 0\end{align}\]


where
  • \(\color{darkblue}Q\) is a variance-covariance matrix (we assume here it is positive semi-definite [1]),
  • \(\color{darkblue}\lambda\) is an exogenous constant that sometimes is varied to draw an efficient frontier, and
  • \(\color{darkblue}r\) are the returns.
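The question in the title suggests a third variant, with the risk in a constraint rather than in the objective. A sketch (the M3 label and the \(\color{darkblue}{\mathit{MaxRisk}}\) bound are my own names):

Model M3
\[\begin{align}\max\>&\color{darkblue}r^T\color{darkred}x\\ & \color{darkred}x^T\color{darkblue}Q\color{darkred}x \le \color{darkblue}{\mathit{MaxRisk}} \\ &\sum_i \color{darkred}x_i = 1\\ & \color{darkred}x \ge 0\end{align}\]

As \(\color{darkblue}Q\) is positive semi-definite, this is a convex quadratically constrained problem, which most QP-capable solvers can still handle.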

Wednesday, June 2, 2021

Total Least Squares: nonconvex optimization

Total Least Squares (TLS) is an alternative to OLS (Ordinary Least Squares). It is a form of orthogonal regression and also deals with the problem of EIV (Errors-in-Variables). 


The standard OLS model is \[\color{darkblue}y = \color{darkblue}X\color{darkred}\beta + \color{darkred}\varepsilon\] where we minimize the sum-of-squares of the residuals \[\min ||\color{darkred}\varepsilon||_2^2\] We can interpret \(\color{darkred}\varepsilon\) as the error in \(\color{darkblue}y\).


In TLS, we also allow for errors in \(\color{darkblue}X\). The model becomes \[\color{darkblue}y+\color{darkred}\varepsilon=(\color{darkblue}X+\color{darkred}E)\color{darkred}\beta\] Note that we made a sign change in \(\color{darkred}\varepsilon\). This is pure aesthetics: to make the equation more symmetric looking. The objective is specified as \[\min \> ||\left(\color{darkred}\varepsilon \> \color{darkred}E\right)||_F\] i.e. the Frobenius norm of the matrix formed by \(\color{darkred}\varepsilon\) and \(\color{darkred}E\). The Frobenius norm is just \[||A||_F=\sqrt{\sum_{i,j}a_{i,j}^2}\] We can drop the square root from the objective (the solution will remain the same, but we got rid of a non-linear function with a possible problem near zero: the gradient is not defined there). The remaining problem is a non-convex quadratic problem which can be solved with global MINLP solvers such as Baron or with a global quadratic solver like Gurobi.
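Written out, the problem to be solved is then (a sketch): \[\begin{align}\min\>&\sum_i \color{darkred}\varepsilon_i^2 + \sum_{i,j} \color{darkred}E_{i,j}^2\\ & \color{darkblue}y+\color{darkred}\varepsilon=(\color{darkblue}X+\color{darkred}E)\color{darkred}\beta\end{align}\] The products \(\color{darkred}E\color{darkred}\beta\) in the constraints are what make this non-convex.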

Sunday, May 23, 2021

Solving linear complementarity problems without an LCP solver

Introduction


The linear complementarity problem (LCP) can be stated as [1]: 


LCP Problem
\[\begin{align} &\text{Find $\color{darkred}x\ge 0,\color{darkred}w\ge 0$ such that:}\\ & \color{darkred}x^T\color{darkred}w = 0 \\ &\color{darkred}w =\color{darkblue}M  \color{darkred}x + \color{darkblue}q\end{align}\]


In many cases we require that \(\color{darkblue}M\) is positive definite.

The LCP has different applications, including in finance, physics, and economics [2]. Off-the-shelf LCP solvers are not widely available. Here I will show how to solve LCPs with other solvers. This question comes up now and then, so here is a short recipe post.
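One possible recipe (a sketch; not necessarily the exact formulations worked out in the post) is to solve the quadratic program \[\begin{align}\min\>&\color{darkred}x^T\color{darkred}w\\ & \color{darkred}w = \color{darkblue}M\color{darkred}x+\color{darkblue}q\\ & \color{darkred}x,\color{darkred}w\ge 0\end{align}\] If the optimal objective value is zero, the corresponding \(\color{darkred}x\) solves the LCP.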

Monday, May 10, 2021

Clustering models

In Statistics (nowadays called Data Science or A.I. for public relations reasons), clustering is one of the most popular techniques available. Of course, nothing beats linear regression in the popularity contest. Here, I'd like to discuss two clustering models: \(k\)-means and \(k\)-medoids. For these models, there exist well-defined equivalent Mixed-Integer Programming models. In practice, they don't work very well except for small data sets. I think they are still useful to discuss for different reasons:

  • The formulations are interesting. They have some angles that may not be obvious at first.
  • They define the problem in a succinct way. Verbal descriptions are always fraught with imprecision and vagueness. A reference model can help to make things explicit. I.e. use a model as a documentation tool.
  • Not all data sets are large. For small data sets, we can prove optimality, where the usual heuristics can only deliver good solutions without much information about their quality. Obviously, ultimate optimality may not matter much here, as clustering is frequently used as an exploratory tool.
  • We can adapt the model to special cases. Adding constraints such as a minimum and maximum number of points per cluster comes to mind [3].
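For reference, a sketch of such an equivalent MIP for the \(k\)-medoids problem (in the style of the classic \(p\)-median formulation; the notation is mine). Here \(\color{darkred}y_j=1\) indicates that point \(j\) is a medoid and \(\color{darkred}x_{i,j}=1\) assigns point \(i\) to medoid \(j\): \[\begin{align}\min\>&\sum_{i,j} \color{darkblue}d_{i,j}\color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} = 1 && \forall i\\ & \color{darkred}x_{i,j} \le \color{darkred}y_j\\ & \sum_j \color{darkred}y_j = \color{darkblue}k\\ & \color{darkred}x_{i,j},\color{darkred}y_j\in\{0,1\}\end{align}\]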

Thursday, April 29, 2021

What is this BRATIO option in GAMS?

This is a note about BRATIO, a very exotic GAMS option related to providing an advanced basis to a Simplex-based LP (or NLP) solver. 
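For reference, setting the option is a one-liner (a hypothetical sketch; m and z are placeholder names for a model and its objective variable):


* a bratio of 1 tells GAMS not to pass an advanced basis to the solver
option bratio = 1;
solve m using lp minimizing z;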

Tuesday, April 20, 2021

Inverting a matrix by LP

 The questions for this post are:

  • Can we model the inversion of a matrix as an LP problem? (A: yes)
  • Should we do that? (A: no)
  • If we insist, can we make it a bit quicker? (A: yes)
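For the first question, the idea is simply to state the defining equations \(\color{darkblue}A\color{darkred}X=\color{darkblue}I\) as an LP with a dummy objective (a sketch): \[\begin{align}\min\>&0\\ & \sum_j \color{darkblue}a_{i,j}\color{darkred}x_{j,k} = \color{darkblue}I_{i,k} && \forall i,k\\ & \color{darkred}x_{j,k}\ \text{free}\end{align}\] One obvious angle for the speed-up in the last bullet is to exploit structure, e.g. by solving for one column of \(\color{darkred}X\) at a time.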

Monday, April 19, 2021

Parallel machine scheduling II, three more formulations

In [1] two formulations were discussed for a scheduling problem with multiple machines. Here we add a few more. Some of them are a bit strange. None of them really works better than the ones in [1].

So this is a collection of formulations that may sound reasonable at first, but are really not suited for this problem. If you want to read about good models, skip this post.

Monday, April 5, 2021

This is really a modeling issue

 In [1], the following scenario is described:


  1. A MIP model is solved, delivering a feasible solution.
  2. All integer variables are fixed to their levels.
  3. The resulting LP model is solved in order to produce duals. This model turns out to be infeasible.

Monday, March 29, 2021

Mathematical Notation

I often tell beginners in optimization modeling that before starting to code, they should get a piece of paper and write down the mathematical model. There are several reasons for this:

Sunday, March 28, 2021

Parallel machine scheduling I, two formulations

Scheduling problems with multiple machines are not that easy for MIP solvers. However, modern solvers are capable of solving reasonably sized problems to optimality quite efficiently. Here are some notes.

Tuesday, March 23, 2021

SOS1 sets: when to use them

Short answer: almost never.


Beginners in optimization are always very excited when they read about SOS sets in the documentation. This looks like a very efficient tool to use for constructs like pick (at most) 1 out of \(n\) (SOS1) or for interpolation (SOS2). However, experienced modelers use SOS sets very sparingly. Why? Well, solvers like binary variables much better than SOS sets. Binary variables yield great bounding and cuts (there has been enormous progress in this area), while SOS constructs are really just about branching. 
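To illustrate the comparison: "at most one of the \(\color{darkred}x_i\) can be nonzero" can be stated by declaring the \(\color{darkred}x_i\) as an SOS1 set, or with binary variables and upper bounds \(\color{darkblue}U_i\) (a sketch, assuming \(\color{darkred}x_i\ge 0\)): \[\begin{align}&\color{darkred}x_i \le \color{darkblue}U_i\cdot\color{darkred}\delta_i\\ & \sum_i \color{darkred}\delta_i \le 1\\ & \color{darkred}\delta_i\in\{0,1\}\end{align}\] The binary version needs reasonable bounds \(\color{darkblue}U_i\), but it gives the solver much more to work with.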

Wednesday, March 17, 2021

MST + few Steiner points: convex quadratic model

In [1], I discussed a model for the Euclidean Steiner Tree problem. A non-convex integer programming model from the literature was reformulated into a MISOCP (not a version of a Japanese dish, but a Mixed-Integer Second-Order Cone Program). Together with a symmetry-breaking constraint and some high-performance solvers, this makes it possible to solve larger models. 

One of the disadvantages of the model in [1] is that it only works with the full number of Steiner points \(n-2\) where \(n\) is the number of given (original) points. Here I want to discuss how we can use Minimum Spanning Tree (MST) models [2], as a building block for a model where we can add just a few Steiner points. One reason this is interesting: the first few Steiner points have the most impact on the objective.

Tuesday, March 16, 2021

Excel Never Dies



In a post by (famous economist) Paul Krugman:

I’m sometimes embarrassed at how much I use Excel. But it turns out to be a work of genius.

The online article he refers to is:

Packy McCormick, Ben Rollert, Excel Never Dies, https://www.notboring.co/p/excel-never-dies

It is a nice read.

In my work, I see Excel used intensively by (almost) all clients I work with or talk to. One sometimes wonders: what would happen if Excel just suddenly stopped working everywhere? 

Monday, March 15, 2021

Euclidean Steiner tree problems as MISOCP

In [1] I looked at MIP models for the Minimum Spanning Tree problem. A related, but much more difficult, problem is the Euclidean Steiner Tree Problem, where we are allowed to add points to make the tree shorter. Here are two obvious examples [2]:




Monday, March 8, 2021

Minimum Spanning Trees in Math Programming Models

Algorithms for the Minimum Spanning Tree (MST) problem are readily available. However, sometimes we want to solve this problem inside a Mathematical Programming model. Usually, this is for two reasons:

  • We have some side constraints
  • Or as part of a larger model
These reasons are essentially the same (a matter of gradation). Embedding an MST inside a model is not totally trivial.
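To give an idea of what such an embedded MST formulation can look like, here is a sketch of a single-commodity flow model (my notation; the post discusses several formulations). A root node \(r\) ships one unit of flow to every other node, and flow is only allowed on selected edges: \[\begin{align}\min\>&\sum_{i\lt j}\color{darkblue}d_{i,j}\color{darkred}x_{i,j}\\ & \sum_{i\lt j}\color{darkred}x_{i,j}=\color{darkblue}n-1\\ & \sum_j\color{darkred}f_{j,i}-\sum_j\color{darkred}f_{i,j}=1&&\forall i\ne r\\ & \color{darkred}f_{i,j}+\color{darkred}f_{j,i}\le(\color{darkblue}n-1)\cdot\color{darkred}x_{i,j}&&\forall i\lt j\\ & \color{darkred}x_{i,j}\in\{0,1\},\ \color{darkred}f_{i,j}\ge 0\end{align}\]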


Data


In our models, I used the data from [1]. This data set has distances for 42 US cities.


----    298 SET cities  

Manchester, N.H.    ,    Montpelier, Vt.     ,    Detroit, Mich.      ,    Cleveland, Ohio     ,    Charleston, W.Va.   
Louisville, Ky.     ,    Indianapolis, Ind.  ,    Chicago, Ill.       ,    Milwaukee, Wis.     ,    Minneapolis, Minn.  
Pierre, S.D.        ,    Bismarck, N.D.      ,    Helena, Mont.       ,    Seattle, Wash.      ,    Portland, Ore.      
Boise, Idaho        ,    Salt Lake City, Utah,    Carson City, Nevada ,    Los Angeles, Calif. ,    Phoenix, Ariz.      
Santa Fe, N.M.      ,    Denver, Colo.       ,    Cheyenne, Wyo.      ,    Omaha, Neb.         ,    Des Moines, Iowa    
Kansas City, Mo.    ,    Topeka, Kans.       ,    Oklahoma City, Okla.,    Dallas, Tex.        ,    Little Rock, Ark.   
Memphis, Tenn.      ,    Jackson, Miss.      ,    New Orleans, La.    ,    Birmingham, Ala.    ,    Atlanta, Ga.        
Jacksonville, Fla.  ,    Columbia, S.C.      ,    Raleigh, N.C.       ,    Richmond, Va.       ,    Washington, D.C.    
Boston, Mass.       ,    Portland, Me.       


The original data set is called dantzig42 from TSPLIB [8].

An optimal spanning tree can look like:


Monday, March 1, 2021

A difficult multi-line regression data set

 In [1], I discussed some models for a small 2-line regression problem. Here I want to dive into a more challenging problem.


Data generation process


I started with the following five lines: \[\begin{align}&f_1(x)=200-3x\\&f_2(x)=150-x\\&f_3(x)=10\\&f_4(x)=-50+2x\\&f_5(x)=-120\end{align}\] To generate points, I used \[\begin{align}&x_i \sim U(0,100)\\ & k_i \sim U\{1,5\} \\ & \epsilon_i \sim N(0,15)\\ & y_i = a_{k_i} + b_{k_i} \cdot x_i + \epsilon_i \end{align}\]
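A minimal GAMS sketch of this data-generation process (set and parameter names are mine; the code in the post may differ):


set i 'observations' /i1*i50/, k 'lines' /k1*k5/;
parameter a(k) 'intercepts' /k1 200, k2 150, k3 10, k4 -50, k5 -120/;
parameter b(k) 'slopes' /k1 -3, k2 -1, k3 0, k4 2, k5 0/;
parameter x(i), y(i);
scalar pick;
x(i) = uniform(0,100);
loop(i,
* draw which line this observation belongs to
  pick = uniformint(1,5);
  y(i) = sum(k$(ord(k)=pick), a(k) + b(k)*x(i)) + normal(0,15);
);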

This corresponds to the following picture:



Thursday, February 25, 2021

A strange regression problem: multiple-line fitting

Consider the following data:


----     14 PARAMETER x  data

i1  17.175,    i2  84.327,    i3  55.038,    i4  30.114,    i5  29.221,    i6  22.405,    i7  34.983,    i8  85.627
i9   6.711,    i10 50.021,    i11 99.812,    i12 57.873,    i13 99.113,    i14 76.225,    i15 13.069,    i16 63.972
i17 15.952,    i18 25.008,    i19 66.893,    i20 43.536,    i21 35.970,    i22 35.144,    i23 13.149,    i24 15.010
i25 58.911,    i26 83.089,    i27 23.082,    i28 66.573,    i29 77.586,    i30 30.366,    i31 11.049,    i32 50.238
i33 16.017,    i34 87.246,    i35 26.511,    i36 28.581,    i37 59.396,    i38 72.272,    i39 62.825,    i40 46.380
i41 41.331,    i42 11.770,    i43 31.421,    i44  4.655,    i45 33.855,    i46 18.210,    i47 64.573,    i48 56.075
i49 76.996,    i50 29.781


----     14 PARAMETER y  data

i1    32.320,    i2   238.299,    i3    87.081,    i4    17.825,    i5   -10.154,    i6   -35.813,    i7    14.506
i8   204.903,    i9  -103.014,    i10   79.521,    i11 -153.679,    i12  -47.971,    i13  277.483,    i14  160.084
i15   28.687,    i16  133.650,    i17  -63.820,    i18    4.028,    i19  145.434,    i20   48.326,    i21   49.295
i22   35.478,    i23  -58.602,    i24   34.463,    i25  -50.666,    i26 -115.522,    i27  -19.540,    i28  134.829
i29  198.074,    i30   -7.351,    i31   40.786,    i32   82.107,    i33   27.661,    i34  -82.669,    i35    5.081
i36   -1.278,    i37  118.345,    i38  172.905,    i39  -57.943,    i40   56.981,    i41  -45.248,    i42   34.517
i43   11.780,    i44  -98.506,    i45   -1.064,    i46   33.323,    i47  139.050,    i48  -35.595,    i49 -102.580
i50    3.912


When we plot this, we see:



Saturday, February 20, 2021

2d bin packing with Google OR-Tools CP-SAT, how to fix?

In [1] I looked at some MIP formulations for a 2d bin packing problem. OR-Tools has a nice model.AddNoOverlap2D constraint function. So, I was excited to try this out. (In these times I get quickly enthusiastic about these things). 

The idea is as follows. ortools has this construct Constraint.OnlyEnforceIf(boolean). This is essentially an indicator constraint: \((\color{darkred}x\ge 5)\text{.OnlyEnforceIf($\color{darkred}\delta$)}\) means \[\color{darkred}\delta=1 \Rightarrow \color{darkred}x \ge 5\] This inspires me to develop a model along the lines of: \[\begin{align} \min& \sum_k \color{darkred}u_k\\  & \color{darkred}\beta_{i,j} = 0 \Rightarrow \color{darkred}b_i\ne\color{darkred}b_j \\ & \color{darkred}\beta_{i,j} = 1 \Rightarrow \mathbf{NoOverlap2D}(i,j) \\ & \color{darkred}u_k = 0 \Rightarrow \color{darkred}b_i \lt k \end{align}\] Here \(\color{darkred}b_i\) is the bin number where item \(i\) is placed and \(\color{darkred}\beta_{i,j}\) is true if  \(\color{darkred}b_i=\color{darkred}b_j\).


OK, so I started coding:

Thursday, February 18, 2021

An easy variant of 2d bin packing

 In [1] the following problem is posted:

Consider the 2d bin packing problem [2]. Now we have the following additional restrictions: items of a different type or category cannot be below or above each other. In other words, we arrange items of the same type in columns. 

We use again our small data set: 


----     37 PARAMETER itemSize  sizes of bin and items

               w           h

cat1       7.000      12.000
cat2       9.000       3.000
cat3       5.000      14.000
cat4      13.000       9.000
cat5       6.000       8.000
cat6      20.000       5.000
bin       40.000      60.000


----     37 SET itemMap  mapping between items and categories

item1 .cat1,    item2 .cat1,    item3 .cat1,    item4 .cat1,    item5 .cat1,    item6 .cat1,    item7 .cat1
item8 .cat1,    item9 .cat1,    item10.cat1,    item11.cat2,    item12.cat2,    item13.cat2,    item14.cat2
item15.cat2,    item16.cat2,    item17.cat2,    item18.cat2,    item19.cat2,    item20.cat2,    item21.cat3
item22.cat3,    item23.cat3,    item24.cat3,    item25.cat3,    item26.cat3,    item27.cat3,    item28.cat3
item29.cat3,    item30.cat3,    item31.cat4,    item32.cat4,    item33.cat4,    item34.cat4,    item35.cat4
item36.cat4,    item37.cat4,    item38.cat4,    item39.cat4,    item40.cat4,    item41.cat5,    item42.cat5
item43.cat5,    item44.cat5,    item45.cat5,    item46.cat6,    item47.cat6,    item48.cat6,    item49.cat6
item50.cat6


I'll try two different models:

  • Assign items to bins, arrange them in columns
  • Assign columns to bins

Tuesday, February 16, 2021

2d Bin Packing

 In [1], a question was posed about a particular type of 2d bin-packing. This is a good reason to play with some models.

First I discuss three formulations with continuous no-overlap constraints and one for discrete data. In the next post, I'll discuss the variant described in [1]. This actually leads to a very different, but easy, problem.


Problem statement and example data


We consider \(n=50\) rectangular items of different sizes that have to be placed into bins. There are six different categories, each with a different size (height and width). All the bins have the same size. The goal is to use as few bins as possible.


----     37 PARAMETER itemSize  sizes of bin and items

               w           h

cat1       7.000      12.000
cat2       9.000       3.000
cat3       5.000      14.000
cat4      13.000       9.000
cat5       6.000       8.000
cat6      20.000       5.000
bin       40.000      60.000


----     37 SET itemMap  mapping between items and categories

item1 .cat1,    item2 .cat1,    item3 .cat1,    item4 .cat1,    item5 .cat1,    item6 .cat1,    item7 .cat1
item8 .cat1,    item9 .cat1,    item10.cat1,    item11.cat2,    item12.cat2,    item13.cat2,    item14.cat2
item15.cat2,    item16.cat2,    item17.cat2,    item18.cat2,    item19.cat2,    item20.cat2,    item21.cat3
item22.cat3,    item23.cat3,    item24.cat3,    item25.cat3,    item26.cat3,    item27.cat3,    item28.cat3
item29.cat3,    item30.cat3,    item31.cat4,    item32.cat4,    item33.cat4,    item34.cat4,    item35.cat4
item36.cat4,    item37.cat4,    item38.cat4,    item39.cat4,    item40.cat4,    item41.cat5,    item42.cat5
item43.cat5,    item44.cat5,    item45.cat5,    item46.cat6,    item47.cat6,    item48.cat6,    item49.cat6
item50.cat6


Furthermore, we assume items cannot be rotated.
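The continuous no-overlap constraints mentioned above typically look as follows (a sketch for two items \(i\) and \(k\) placed in the same bin, with \((\color{darkred}x_i,\color{darkred}y_i)\) the position of the lower-left corner): \[\color{darkred}x_i+\color{darkblue}w_i\le\color{darkred}x_k \ \vee\ \color{darkred}x_k+\color{darkblue}w_k\le\color{darkred}x_i \ \vee\ \color{darkred}y_i+\color{darkblue}h_i\le\color{darkred}y_k \ \vee\ \color{darkred}y_k+\color{darkblue}h_k\le\color{darkred}y_i\] At least one of these four disjuncts must hold; in a MIP such an "or" is typically implemented with binary variables and big-M (or indicator/SOS1) constructs.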

Friday, February 5, 2021

Non-differentiable NLP vs LP

 In [1] the following simple problem is posed:


Given data 

p = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]

try to track this as closely as possible by minimizing the largest deviation, while obeying the restrictions:


\[\begin{align}&\color{darkred}x_1 = \color{darkblue}p_1\\ &\color{darkred}x_i - \color{darkred}x_{i-1}\le 1.5\end{align}\]


Non-differentiable NLP model


An obvious way to formulate this is: \[\begin{align} \min&\max_i |\color{darkred}x_i-\color{darkblue}p_i| \\ &\color{darkred}x_1 = \color{darkblue}p_1\\ &\color{darkred}x_i - \color{darkred}x_{i-1}\le 1.5\end{align}\]

This is a non-differentiable formulation, and we are asking for trouble when trying to solve it with a standard NLP solver. For instance, when I solve this with CONOPT, we may see:


               S O L V E      S U M M A R Y

     MODEL   m1                  OBJECTIVE  z
     TYPE    DNLP                DIRECTION  MINIMIZE
     SOLVER  CONOPT              FROM LINE  25

**** SOLVER STATUS     4 Terminated By Solver      
**** MODEL STATUS      7 Feasible Solution         
**** OBJECTIVE VALUE                1.2500

 RESOURCE USAGE, LIMIT          0.062 10000000000.000
 ITERATION COUNT, LIMIT        31    2147483647
 EVALUATION ERRORS              0             0
CONOPT 3         33.2.0 r4f23b21 Released Dec 01, 2020 WEI x86 64bit/MS Window
 
 
    C O N O P T 3   version 3.17L
    Copyright (C)   ARKI Consulting and Development A/S
                    Bagsvaerdvej 246 A
                    DK-2880 Bagsvaerd, Denmark
 
 
    The model has 11 variables and 10 constraints
    with 29 Jacobian elements, 10 of which are nonlinear.
    The Hessian of the Lagrangian has 10 elements on the diagonal,
    45 elements below the diagonal, and 10 nonlinear variables.
 
                   Pre-triangular equations:   0
                   Post-triangular equations:  10
 
 
 ** Feasible solution. Convergence too slow. The change in objective
    has been less than 3.7500E-12 for 20 consecutive iterations


CONOPT realizes it is in trouble and declares the solution just feasible instead of optimal. Indeed, we can do better than an objective of 1.25. Other NLP solvers may not even know they are in trouble and just report a solution and declare it (locally) optimal. 

There is a big problem with this model. Most NLP solvers assume the objective and nonlinear constraints are smooth (i.e., differentiable). Here we don't have that, and this lack of smoothness is often a recipe for disaster. This model is a nice example. 

I often complain about non-differentiability in models I see on forums. Often the reaction is: "Who cares?". Indeed, often NLP solvers will deliver some solution without warning. This post is about a small problem, where we really can do much better.   

Note that the initial point also plays a role here. With a different starting point, you may find different solutions.

A global NLP solver may help for this model. Some of the ones I tried don't support the max() function, so we need to reformulate the model. When doing that, we can actually do a bit better and form a linear programming problem.
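For completeness, the LP that results from the standard minimax reformulation looks like this (a sketch): \[\begin{align}\min\>&\color{darkred}z\\ & -\color{darkred}z\le\color{darkred}x_i-\color{darkblue}p_i\le\color{darkred}z&&\forall i\\ & \color{darkred}x_1=\color{darkblue}p_1\\ & \color{darkred}x_i-\color{darkred}x_{i-1}\le 1.5\end{align}\]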

Wednesday, February 3, 2021

Dijkstra confusion

 Dijkstra's algorithm is a famous algorithm for the shortest path problem. The original paper is from 1959 [1]:


(The format of this paper is interesting: almost no mathematical notation.) Dijkstra's algorithm is not suited for all shortest path problems. Negative weights or lengths can cause problems. There are basically three different claims I see mentioned:

  1. Dijkstra just does not work correctly if there are negative weights.
  2. Dijkstra does not work if there are negative cycles in the graph. This would imply that a problem with negative weights would work as long as there are no negative cycles. 
  3. Keep things vague and mumble a bit about negative weights and negative cycles.

I notice that option 3 seems very popular. As a result, looking at questions on stackexchange, I see much confusion about this. Documentation and error/warning messages are also often vague or misleading. For instance, let's have a look at [2]:

Also, this routine does not work for graphs with negative distances. Negative distances can lead to infinite cycles that must be handled by specialized algorithms such as Bellman-Ford’s algorithm or Johnson’s algorithm.

Well, this is indeed sufficiently vague. We are dealing with option 3: we don't know whether this refers to claim 1 or 2. The same function that this documentation is about can produce the following warning:

UserWarning: Graph has negative weights: dijkstra will give inaccurate results if the graph contains negative cycles. Consider johnson or bellman_ford.
  distmat,pred = scipy.sparse.csgraph.dijkstra(spmat,indices=source,return_predecessors=True)

This seems to say we only need to worry about graphs with negative cycles. So that would be option 2. 


A small counter example


In [3] a small acyclic graph (a DAG: Directed Acyclic Graph) is presented:


When we try to find the shortest path from n0 to n2, using [2], we can see:

dijkstra
shortest path length: 1.0
johnson
shortest path length: -8.0
bellman_ford
shortest path length: -8.0
<ipython-input-25-7a1de3894d98>:13: UserWarning: Graph has negative weights: dijkstra will give inaccurate results if the graph contains negative cycles. Consider johnson or bellman_ford.
  distmat,pred = scipy.sparse.csgraph.dijkstra(spmat,indices=source,return_predecessors=True)


This example disproves that we just have to watch out for negative cycles.  That means the documentation and warning messages should state clearly that the function scipy.sparse.csgraph.dijkstra does not work correctly with negative weights, irrespective of whether there are negative cycles.

Thursday, January 28, 2021

Assignment problem with a wrinkle formulated as a network problem

In a previous post [1] I discussed a simple problem. But it turned out not so easy to solve for some of the larger data sets. Basically, it was an assignment problem with an extra condition. The problem was as follows:

Consider two arrays \(\color{darkblue}a_i\) (length \(\color{darkblue}m\)) and \(\color{darkblue}b_j\) (length \(\color{darkblue}n\)) with \(\color{darkblue}m \lt \color{darkblue}n\). Assign all values \(\color{darkblue}a_i\) to a \(\color{darkblue}b_j\) such that:

  • Each \(\color{darkblue}b_j\) can have 0 or 1 \(\color{darkblue}a_i\) assigned to it.
  • The assignments need to maintain the original order of \(\color{darkblue}a_i\). I.e. if \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\) then \(\color{darkblue}a_{i+1}\) must be assigned to a slot in \(\color{darkblue}b\) that is beyond slot \(j\). In the picture below that means that arrows cannot cross.
  • Do this while minimizing the sum of the products.


In [1], I attacked this as a mixed-integer programming problem. In this post, I want to see if we can solve this as a network problem. This was largely inspired by the comments in [1].


Shortest path algorithms and models


The shortest path problem can be solved in different ways. Dijkstra's algorithm is a popular choice due to its simplicity and speed. In [2] we can read a quote by Edsger Dijkstra:

What is the shortest way to travel from Rotterdam to Groningen, in general: from given city to given city. It is the algorithm for the shortest path, which I designed in about twenty minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. As I said, it was a twenty-minute invention. In fact, it was published in '59, three years later. The publication is still readable, it is, in fact, quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. I learned later that one of the advantages of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities. Eventually, that algorithm became to my great amazement, one of the cornerstones of my fame.
For an early implementation in a computer program, Dijkstra used a set of 64 nodes corresponding to Dutch cities. 64 was chosen so the node number could be encoded in 6 bits [2]. 

The shortest path can also be formulated as an LP problem. Standard LP solvers can handle quite large problems (each node becomes a constraint, and each arc a variable). In addition, a solver like Cplex contains a specialized network solver for such LPs.  
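That LP formulation is just a min-cost flow problem shipping one unit from the source to the sink (a sketch): \[\begin{align}\min\>&\sum_{(i,j)}\color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\ & \sum_j\color{darkred}x_{i,j}-\sum_j\color{darkred}x_{j,i}=\color{darkblue}b_i&&\forall i\\ & \color{darkred}x_{i,j}\ge 0\end{align}\] with \(\color{darkblue}b_{\mathit{source}}=1\), \(\color{darkblue}b_{\mathit{sink}}=-1\), and \(\color{darkblue}b_i=0\) otherwise.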

Graph 


We start with the nodes. We denote the nodes by \(\color{darkblue}n_{i,j}\) representing: \(\color{darkblue}a_i\) is assigned to \(\color{darkblue}b_j\). Not all assignments are possible. For instance, we cannot assign \(\color{darkblue}a_2\) to \(\color{darkblue}b_1\). That means: node  \(\color{darkblue}n_{2,1}\)  does not exist.

We also need a source node and a sink node.

The arcs indicate: after assigning \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\) we need to assign the next item \(\color{darkblue}a_{i+1}\rightarrow \color{darkblue}b_{j+k}\) for some \(k\ge 1\). In addition we need to connect the source node to all nodes with \(i=1\) and the sink node to all nodes with \(i=3\) (our last \(\color{darkblue}a_i\)).  So our network looks like:

Network representation


Note how any arc not connected to the source or sink node goes to the right and downwards.

We have costs for visiting each node: \(\color{darkblue}a_i\cdot\color{darkblue}b_j\).  As we want to formulate this as a shortest path problem, we need to allocate these costs to arcs. I used the incoming arcs for this. So any arc \(e_{i,j,i',j'}\) gets a cost \(\color{darkblue}a_{i'}\cdot\color{darkblue}b_{j'}\).  The arcs to the sink node have a zero cost. 

Note: this graph is acyclic, so negative arc lengths are ok. We will never see negative cycles.

As you can see: because the nodes have two indices, the arcs have four! That will be fun.

Wednesday, January 27, 2021

An assignment problem with a wrinkle

In [1], the following problem is posed:

Consider two arrays \(\color{darkblue}a_i\) (length \(\color{darkblue}m\)) and \(\color{darkblue}b_j\) (length \(\color{darkblue}n\)) with \(\color{darkblue}m \lt \color{darkblue}n\). Assign all values \(\color{darkblue}a_i\) to a \(\color{darkblue}b_j\) such that:

  • Each \(\color{darkblue}b_j\) can have 0 or 1 \(\color{darkblue}a_i\) assigned to it.
  • The assignments need to maintain the original order of \(\color{darkblue}a_i\). I.e. if \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\) then \(\color{darkblue}a_{i+1}\) must be assigned to a slot in \(\color{darkblue}b\) that is beyond slot \(j\). In the picture below that means that arrows cannot cross.
  • Do this while minimizing the sum of the products.


Mixed-integer programming model

This problem can be viewed as an assignment problem with a side constraint:


MIP Model
\[\begin{align}\min& \sum_{i,j}\color{darkred}x_{i,j}\cdot\color{darkblue}a_i\cdot\color{darkblue}b_j \\ &\sum_j \color{darkred}x_{i,j}=1 &&\forall i\\ & \sum_i \color{darkred}x_{i,j}\le 1 &&\forall j\\ & \color{darkred}v_i = \sum_j j \cdot \color{darkred} x_{i,j}\\ & \color{darkred}v_i \ge \color{darkred}v_{i-1}+1\\ & \color{darkred}x_{i,j} \in \{0,1\} \\ & \color{darkred}v_i \ge 1\end{align}\]


Here \(\color{darkred}x_{i,j}\) indicates the assignment \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\). The variable \(\color{darkred}v_i\) represents the position in \(\color{darkblue}b\) to which \(\color{darkblue}a_i\) is assigned.