In a previous post [1] I discussed a simple problem. But it turned out not so easy to solve for some of the larger data sets. Basically, it was an assignment problem with an extra condition. The problem was a follows:
Consider two arrays \(\color{darkblue}a_i\) (length \(\color{darkblue}m\)) and \(\color{darkblue}b_j\) (length \(\color{darkblue}n\)) with \(\color{darkblue}m \lt \color{darkblue}n\). Assign all values \(\color{darkblue}a_i\) to a \(\color{darkblue}b_j\) such that:
- Each \(\color{darkblue}b_j\) can have 0 or 1 \(\color{darkblue}a_i\) assigned to it.
- The assignments need to maintain the original order of \(\color{darkblue}a_i\). I.e. if \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\) then \(\color{darkblue}a_{i+1}\) must be assigned to a slot in \(\color{darkblue}b\) that is beyond slot \(j\). In the picture below that means that arrows cannot cross.
- Do this while minimizing the sum of the products.
In [1], I attacked this as a mixed-integer programming problem. In this post, I want to see if we can solve this as a network problem. This was largely inspired by the comments in [1].
Shortest path algorithms and models
The shortest path problem can be solved in different ways. Dijkstra's algorithm is a popular choice due to its simplicity and speed. In [2] we can read a quote by Edsger Dijkstra:
What is the shortest way to travel from Rotterdam to Groningen, in general: from given city to given city. It is the algorithm for the shortest path, which I designed in about twenty minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. As I said, it was a twenty-minute invention. In fact, it was published in '59, three years later. The publication is still readable, it is, in fact, quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. I learned later that one of the advantages of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities. Eventually, that algorithm became to my great amazement, one of the cornerstones of my fame.
For an early implementation in a computer program, Dijkstra used a set of 64 nodes corresponding to Dutch cities. 64 was chosen so the node number could be encoded in 6 bits [2].
The shortest path can also be formulated as an LP problem. Standard LP solvers can handle quite large problems (each node becomes a constraint, and each arc a variable). In addition, a solver like Cplex contains a specialized network solver for such LPs.
Graph
We start with the nodes. We denote the nodes by \(\color{darkblue}n_{i,j}\) representing: \(\color{darkblue}a_i\) is assigned to \(\color{darkblue}b_j\). Not all assignments are possible. For instance, we cannot assign \(\color{darkblue}a_2\) to \(\color{darkblue}b_1\). That means: node \(\color{darkblue}n_{2,1}\) does not exist.
We also need a source node and a sink node.
The arcs indicate: after assigning \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\) we need to assign the next item \(\color{darkblue}a_{i+1}\rightarrow \color{darkblue}b_{j+k}\) for some \(k\ge 1\). In addition we need to connect the source node to all nodes with \(i=1\) and the sink node to all nodes with \(i=3\) (our last \(\color{darkblue}a_i\)). So our network looks like:
 |
Network representation |
Note how any arc not connected to the sink or source node, goes to the right and downwards.
We have costs for visiting each node: \(\color{darkblue}a_i\cdot\color{darkblue}b_j\). As we want to formulate this as a shortest path problem, we need to allocate these costs to arcs. I used the incoming arcs for this. So any arc \(e_{i,j,i',j'}\) gets a cost \(\color{darkblue}a_{i'}\cdot\color{darkblue}b_{j'}\). The arcs to the sink node have a zero cost.
Note: this graph is acyclic, so negative arc lengths are ok. We will never see negative cycles.
As you can see: because the nodes have two indices, the arcs have four! That will be fun.