Scipy.optimize.linear_sum_assignment
I have never used this routine [1], so let's give this a try. This is a Python implementation of the Hungarian method [2]. The call is exceedingly simple: just give it the cost matrix.
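For example, a minimal call (with a small made-up cost matrix) looks like this:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Small made-up cost matrix: cost[i, j] = cost of assigning source i to destination j.
cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])

# linear_sum_assignment returns row and column indices describing the optimal assignment.
row_ind, col_ind = linear_sum_assignment(cost)
print(list(zip(row_ind, col_ind)))    # optimal pairs (i, j)
print(cost[row_ind, col_ind].sum())   # total cost of the assignment
```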
The algorithm allows unbalanced assignment problems by just specifying a rectangular cost matrix. Typically, unbalanced problems are transformed into balanced ones by adding source or destination nodes.
I believe the description on Scipy.org [1] is slightly inaccurate: as stated there, it would allow zero assignments, i.e. \(x_{i,j}=0\). It is not always easy to create a watertight English description.
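Going back to the rectangular (unbalanced) case, here is a small made-up example where the extra destinations simply remain unassigned:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Made-up 2x4 cost matrix: 2 sources, 4 destinations (an unbalanced problem).
cost = np.array([[9.0, 2.0, 7.0, 6.0],
                 [6.0, 4.0, 3.0, 8.0]])

# Every row (the smaller side) gets assigned; two of the four columns stay unassigned.
row_ind, col_ind = linear_sum_assignment(cost)
print(row_ind, col_ind)   # e.g. [0 1] [1 2]: source 0 -> dest 1, source 1 -> dest 2
```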
Let's try a problem.
1000x1000 dense assignment problem
When we feed it a random problem with 1,000 sources and 1,000 destinations, the timings are as follows. We first read in the data (organized as lists). The second step is to convert this to a NumPy array (otherwise linear_sum_assignment does this conversion itself). Then we solve the problem. On my laptop this takes about 325 seconds.
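The original data file is not shown here; as a rough sketch (with the cost matrix generated randomly in memory instead of read from disk), the experiment looks something like this:

```python
import time
import numpy as np
from scipy.optimize import linear_sum_assignment

n = 1000

# Instead of reading the data from a file, generate random costs on [0, 1],
# organized as nested lists to mimic the original setup.
cost_lists = np.random.rand(n, n).tolist()

t0 = time.time()
cost = np.asarray(cost_lists)                 # convert the lists to a NumPy array
t1 = time.time()
row_ind, col_ind = linear_sum_assignment(cost)  # solve the assignment problem
t2 = time.time()

print(f"convert: {t1 - t0:.2f}s, solve: {t2 - t1:.2f}s, "
      f"objective: {cost[row_ind, col_ind].sum():.6f}")
```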
We can formulate this problem as an LP:\[\begin{align}\min \> & \sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j x_{i,j} = 1 && \forall i \\ & \sum_i x_{i,j} =1 && \forall j \\ & x_{i,j} \ge 0 \end{align}\]
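As a small illustration of this formulation (not the Cplex run used below), the same LP can be written down for a toy instance and handed to Scipy's general-purpose linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance only: the real experiment uses n = 1000 and Cplex.
n = 5
c = np.random.rand(n, n).ravel()      # flattened cost coefficients c[i,j]

# Equality constraints: sum_j x[i,j] = 1 for each i, sum_i x[i,j] = 1 for each j.
A = np.zeros((2 * n, n * n))
for i in range(n):
    for j in range(n):
        A[i, i * n + j] = 1           # row sum for source i
        A[n + j, i * n + j] = 1       # column sum for destination j
b = np.ones(2 * n)

res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
x = res.x.reshape(n, n)
print(res.fun)         # optimal objective value
print(x.round(3))      # this LP has an integral optimal vertex (a permutation matrix)
```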
When we feed this into Cplex we see:
 
Parallel mode: deterministic, using up to 4 threads for concurrent optimization.
Tried aggregator 1 time.
LP Presolve eliminated 2 rows and 1 columns.
Reduced LP has 1999 rows, 1000000 columns, and 1999000 nonzeros.
Presolve time = 5.95 sec. (770.92 ticks)
Initializing dual steep norms . . .

Iteration log . . .
Iteration:     1   Dual objective     =             1.007485
Iteration:  1113   Dual objective     =             1.661162
Dual simplex solved model.

LP status(1): optimal
Cplex Time: 14.91sec (det. 2309.10 ticks)
Cplex tries several LP algorithms in parallel (concurrent optimization). It turns out Dual Simplex wins. The total solution time is about 15 seconds. So Cplex is 20 times as fast.
We can actually improve on this a bit. This is a network problem, so we can use the Network solver. In addition, we disable the presolver (it is somewhat expensive and does little for this problem).
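As an illustration, here is a sketch of how these settings could be applied through Cplex's Python API; the package name cplex and the model file assign.lp are assumptions for the sketch, and the runs reported here may well have used a different interface:

```python
import cplex  # IBM CPLEX Python API; ships with the CPLEX installation

cpx = cplex.Cplex()
cpx.read("assign.lp")  # hypothetical file containing the assignment LP above

# Select the network optimizer instead of the default (concurrent) simplex ...
cpx.parameters.lpmethod.set(cpx.parameters.lpmethod.values.network)
# ... and switch the presolver off (0 = off).
cpx.parameters.preprocessing.presolve.set(0)

cpx.solve()
print(cpx.solution.get_objective_value())
```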
 
With these settings, Cplex reports:
Extracted network with 2002 nodes and 1000001 arcs.
Extraction time = 0.63 sec. (49.15 ticks)

Iteration log . . .
Iteration:     0   Infeasibility     =          2000.000000 (2000)
Iteration: 10000   Infeasibility     =             0.000000 (7.62808)
Iteration: 20000   Objective         =             5.108089
Iteration: 30000   Objective         =             3.842909
Iteration: 40000   Objective         =             3.147366
Iteration: 50000   Objective         =             2.622157
Iteration: 60000   Objective         =             2.275971
Iteration: 70000   Objective         =             1.990956
Iteration: 80000   Objective         =             1.790689
Elapsed time = 1.33 sec. (82.13 ticks)
Iteration: 90000   Objective         =             1.684441
Network - Optimal:  Objective =    1.6701729950e+00
Network time = 1.67 sec. (119.82 ticks)  Iterations = 92769 (10000)

LP status(1): optimal
Cplex Time: 2.66sec (det. 236.01 ticks)
Now we need less than 3 seconds (Cplex is about 120 times as fast as Scipy).
We would normally expect a specialized assignment solver to do better than a general-purpose LP solver. However, Cplex is a highly polished product running as native code, while Scipy combines a simpler implementation with Python overhead, so it loses out here. Note that there are specialized assignment solvers that are much faster still.
Some solvers in Scipy are a little bit underwhelming. Examples are the linear programming simplex solver and, as we saw here, the assignment problem solver.
References
1. scipy.optimize.linear_sum_assignment, https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html
2. Harold W. Kuhn, The Hungarian Method for the Assignment Problem, Naval Research Logistics Quarterly, 2:83-97, 1955.
 
 
Comments

Thanks for the post and comparison! I got pointed to this by a colleague, and it's an interesting problem.
Note that the performance of `scipy.optimize.linear_sum_assignment` will likely increase greatly in the near future: there's appetite for adopting a C++ implementation [0] of Jonker-Volgenant, and as of very recently, an open pull request doing so [1]. How great the performance improvement is will depend on characteristics of the input, but for dense ones, the benchmarks hint at something like a 400x improvement over the existing implementation.
I played around a bit with a naive FICO Xpress implementation to see how that would compare, and here it seems that the improvements are less than what you're seeing with CPLEX (to some extent due to overhead in using the Python interface). [2]
[0]: https://github.com/scipy/scipy/pull/9800
[1]: https://github.com/scipy/scipy/pull/10296
[2]: https://gist.github.com/fuglede/d7bb1dcdf23669eacc7eef3256198789#file-lap-benchmark-ipynb
Always great to see the work of Roy Jonker and Ton Volgenant being used. I knew them back when I was a student.
Looks like the implementation actually only uses one of several steps of Jonker-Volgenant: where they spend most of the time doing various clever initialization steps, the SciPy implementation jumps straight to augmentation based on an empty assignment. Still, this is significantly faster than what was already in place. And the improvement has now been merged and will be available in SciPy 1.4.0.