Tuesday, January 19, 2021
Submatrix with Largest Sum
Problem
Given a data matrix \(\color{darkblue}a_{i,j}\), find a submatrix such that the sum of the elements is maximized. The term "submatrix" can indicate a contiguous or a non-contiguous subset of rows and columns.
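Formally: choose a subset of rows \(R\) and a subset of columns \(C\) to \[\max_{R,C}\>\sum_{i\in R}\sum_{j\in C}\color{darkblue}a_{i,j}\] In the contiguous case, \(R\) and \(C\) are restricted to ranges of consecutive indices.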
This is a generalization of the 1d maximum subarray problem [1].
Looks easy, but let's make sure it is.
Example data
Of course, the problem is trivial if all elements are non-negative: just select the whole matrix. It is only interesting if there are negative elements. So here is my random matrix:
The values are drawn from a uniform distribution, \(\color{darkblue}a_{i,j} \sim U(-10,10)\).
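In GAMS, such a matrix can be generated along the following lines (a minimal sketch; the set sizes are illustrative):

set
   i 'rows'    /r1*r10/
   j 'columns' /c1*c30/
;
parameter a(i,j) 'data matrix';
a(i,j) = uniform(-10,10);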
Non-contiguous submatrix
The non-convex MIQP formulation is easy. Just introduce binary variables that indicate if a row or column is selected. A cell is selected if both its corresponding row and column are selected.
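A minimal GAMS sketch of this model could be as follows (variable, equation, and model names are my own; the data block from above is assumed):

binary variable
   x(i) 'select row i'
   y(j) 'select column j'
;
variable z 'objective';

equation obj 'quadratic objective';
obj.. z =e= sum((i,j), a(i,j)*x(i)*y(j));

model modelA /obj/;
option miqcp = cplex;
solve modelA maximizing z using miqcp;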
This is a non-convex problem, but Cplex solves it very fast: just 982 simplex iterations and 0 branch-and-bound nodes. The reason is that Cplex automatically linearizes the problem. Of course, we can also do this linearization ourselves:
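A GAMS sketch of the linearized model, reusing the declarations from Model A (equation names are mine):

positive variable s(i,j) 'linearization of x(i)*y(j)';
s.up(i,j) = 1;

equation
   objlin    'linear objective'
   lin1(i,j) 's(i,j) <= x(i)'
   lin2(i,j) 's(i,j) <= y(j)'
   lin3(i,j) 'force s(i,j)=1 if row and column are both selected'
;
objlin..    z =e= sum((i,j), a(i,j)*s(i,j));
lin1(i,j).. s(i,j) =l= x(i);
lin2(i,j).. s(i,j) =l= y(j);
lin3(i,j).. s(i,j) =g= x(i) + y(j) - 1;

model modelB /objlin,lin1,lin2,lin3/;
solve modelB maximizing z using mip;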
The problem is also known as finding the maximum edge biclique in a bipartite graph.
This model solves in 577 Simplex iterations and 0 nodes. This is even a bit faster than the automatically reformulated version. Sometimes we can beat Cplex! (Note that this is just luck: Cplex generates something like Model C below, which is usually better.)
The variables \(\color{darkred}s_{i,j}\) can be declared to be binary, or they can be relaxed to be continuous between 0 and 1.
If \(\color{darkred}s_{i,j}\) is relaxed, Cplex will make them binary again. We can see this in the log: "Reduced MIP has 40 binaries, 0 generals, 0 SOSs, and 0 indicators" is followed by "Reduced MIP has 340 binaries, 0 generals, 0 SOSs, and 0 indicators".
In many models, we can drop either the \(\le\) constraints or the \(\ge\) constraint of the linearization. Here we need them both.
Well, that is not completely true. We can drop \(\color{darkred}s_{i,j} \le \color{darkred}x_i\) and \(\color{darkred}s_{i,j} \le \color{darkred}y_j\) if \(\color{darkblue}a_{i,j}\le 0\). And we can drop \(\color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1\) if \(\color{darkblue}a_{i,j}\ge 0\). Indeed, we can drop all three constraints if \(\color{darkblue}a_{i,j}= 0\). If we apply this to our model, we reduce the number of constraints from 900 to 433. Interestingly, the number of Simplex iterations goes up to 862 (from 577). The number of nodes stays 0.
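In GAMS, this pruning amounts to adding dollar conditions to the linearization constraints of Model B (this is Model C in the comparison below):

lin1(i,j)$(a(i,j) > 0).. s(i,j) =l= x(i);
lin2(i,j)$(a(i,j) > 0).. s(i,j) =l= y(j);
lin3(i,j)$(a(i,j) < 0).. s(i,j) =g= x(i) + y(j) - 1;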
A different linearization is proposed in [4]. First row sums are calculated: \[\color{darkred}r_i = \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}y_j\] Then the objective \[\max \> \sum_i \color{darkred}r_i \cdot \color{darkred} x_i \] is linearized. This is done by observing that only positive row sums \(\color{darkred}r_i\) are ever selected. So the objective is rewritten as \[\max \> \sum_i \color{darkred}r_i^+\] where \(\color{darkred}r_i^+ = \max(0,\color{darkred}r_i)\). Linearization of this expression leads to the model:
The first two constraints implement the implications: \[\begin{align} & \color{darkred}x_i=0 \Rightarrow \color{darkred} r_i^+ = 0 \\ & \color{darkred}x_i=1 \Rightarrow \color{darkred} r_i^+ = \color{darkred} r_i\end{align}\] Good values for the big-M constants are given in [4]: \[\begin{align} &\color{darkblue}M_{1,i} = \sum_j \max(0,\color{darkblue}a_{i,j}) \\ &\color{darkblue}M_{2,i} = -\sum_j \min(0,\color{darkblue}a_{i,j})\end{align}\] The advantage of this model is that we need only vectors of decision variables. There is no variable matrix (in the other models we have \(\color{darkred}s_{i,j}\), either directly or indirectly). A disadvantage is that the model is a bit more complex and seems somewhat slower to solve.
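A GAMS sketch of this formulation (my own rendering: names are mine, and because we maximize, upper bounds on \(\color{darkred}r_i^+\) suffice to enforce both implications):

variable r(i) 'row sum over selected columns';
positive variable rplus(i) 'positive part of r(i) when row i is selected';

parameter
   M1(i) 'largest possible value of r(i)'
   M2(i) 'minus the smallest possible value of r(i)'
;
M1(i) =  sum(j, max(0, a(i,j)));
M2(i) = -sum(j, min(0, a(i,j)));

equation
   objd     'objective: sum of positive row parts'
   defr(i)  'row sums'
   impl1(i) 'x(i)=0 => rplus(i)=0'
   impl2(i) 'x(i)=1 => rplus(i)<=r(i)'
;
objd..     z =e= sum(i, rplus(i));
defr(i)..  r(i) =e= sum(j, a(i,j)*y(j));
impl1(i).. rplus(i) =l= M1(i)*x(i);
impl2(i).. rplus(i) =l= r(i) + M2(i)*(1 - x(i));

model modelD /objd,defr,impl1,impl2/;
solve modelD maximizing z using mip;

Note that selecting a row with a negative row sum would be infeasible here (rplus(i) is non-negative); such rows are simply never selected, which does not cut off the optimum.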
Contiguous submatrix
If we want to select an optimal contiguous submatrix, we can use a modeling trick borrowed from machine scheduling, where we sometimes want to limit the number of start-ups (for example, manufacturers may impose such limits to reduce wear and tear on equipment like generators). For both \( \color{darkred}x_i\) and \(\color{darkred}y_j\), we require that they can go from 0 to 1 just once. For this, we introduce binary variables: \[\begin{aligned} &\color{darkred}p_i = \begin{cases} 1 & \text{if row $i$ is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases} \\ & \color{darkred}q_j = \begin{cases} 1 & \text{if column $j$ is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases}\end{aligned}\] The implication \(\color{darkred}x_{i-1}=0\textbf{ and }\color{darkred}x_{i}=1 \Rightarrow \color{darkred}p_i = 1\) can be implemented as \( \color{darkred}p_i \ge \color{darkred}x_{i} - \color{darkred}x_{i-1}\). We then allow at most one \(\color{darkred}p_i =1\). The same construction applies to the columns via \(\color{darkred}q_j\). So the model can look like:
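A GAMS sketch of this contiguous model (names are mine; note that the lagged reference x(i-1) evaluates to 0 for the first row, which is exactly what we want):

binary variable
   p(i) 'row i is the first selected row'
   q(j) 'column j is the first selected column'
;

equation
   obje      'quadratic objective'
   startr(i) 'detect a 0-to-1 transition in x'
   startc(j) 'detect a 0-to-1 transition in y'
   oner      'at most one start row'
   onec      'at most one start column'
;
obje..      z =e= sum((i,j), a(i,j)*x(i)*y(j));
startr(i).. p(i) =g= x(i) - x(i-1);
startc(j).. q(j) =g= y(j) - y(j-1);
oner..      sum(i, p(i)) =l= 1;
onec..      sum(j, q(j)) =l= 1;

model modelE /obje,startr,startc,oner,onec/;
solve modelE maximizing z using miqcp;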
This model is still solved completely during preprocessing: zero nodes were needed.
We can relax \(\color{darkred}p\) and \(\color{darkred}q\) to be continuous between 0 and 1.
Making \(\sum_i \color{darkred}p_i\le 1\) an equality \(\sum_i \color{darkred}p_i= 1\) does not really do much: we can still have zero selected rows, because \(\color{darkred}p_i \ge \color{darkred}x_i - \color{darkred}x_{i-1}\) only bounds \(\color{darkred}p_i\) from below, so \(\color{darkred}p_i=1\) with all \(\color{darkred}x_i=0\) remains feasible.
For small data sets, it is easy to enumerate all possible contiguous submatrices. For our data set, we have 25,575 of them (excluding the special case of no selected rows and columns).
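A brute-force enumeration in GAMS could look like this (a sketch, only sensible for tiny instances; the loop body executes once per contiguous submatrix):

alias (i,i1,i2);
alias (j,j1,j2);
scalar
   total 'sum of current candidate submatrix'
   best  'best sum found so far (0 = empty selection)' /0/
;
loop((i1,i2,j1,j2)$(ord(i2) >= ord(i1) and ord(j2) >= ord(j1)),
   total = sum((i,j)$(ord(i) >= ord(i1) and ord(i) <= ord(i2) and
                      ord(j) >= ord(j1) and ord(j) <= ord(j2)), a(i,j));
   best = max(best, total);
);
display best;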
We can use the same linearization techniques that were discussed before.
These models seem to solve very quickly. However, if you think we can solve big problems with these formulations, you are wrong. Even slightly larger problems show much larger solution times.
Comparison
Here we solve some larger problems. Our models are:
Non-contiguous MIQP model A. Let Cplex linearize.
Non-contiguous MIP model B. Linearized while keeping all constraints.
Non-contiguous MIP model C. Linearize with dropping unnecessary constraints.
Non-contiguous MIP model D. Alternative formulation from [4].
Contiguous MIQP model E. Let Cplex linearize.
                                         Model A         Model B         Model C         Model D         Model E        Model E
Problem                                  non-contiguous  non-contiguous  non-contiguous  non-contiguous  contiguous     contiguous
Problem size (rows/columns)              50/30           50/30           50/30           50/30           50/30          100/100
Constraints/variables (generated)        0/80            4,500/1,580     2,237/1,580     150/180         82/160         202/400
Constraints/variables (after presolve)   2,237/1,580     4,500/1,580     2,237/1,580     100/130         2,317/1,658    15,198/10,398
Objective                                784.904         784.904         784.904         784.904         298.447        826.110
Gap                                      optimal         optimal         optimal         32%             optimal        optimal
Time (seconds)                           1,113           1,072           376             > 3600          3              951
Nodes                                    199,525         77,967          5,273           3,034,448       0              8,174
Simplex iterations                       32,497,491      17,956,170      5,569,196       104,396,873     2,519          3,654,174
Notes:
The contiguous problem, as expected, is much easier to solve.
As always: these MIP models find good and even optimal solutions relatively quickly. Most of the time is spent proving optimality.
For the contiguous problem, there are well-known dynamic programming algorithms [2,3]; a sketch follows these notes. Not sure if there is one for the non-contiguous case.
For more on the non-contiguous case, see [4].
The above timings are on a slow laptop, so you can expect better performance on a more beefy machine.
Model D performs much better on the data sets shown in [4]. My random data makes the problem combinatorially very difficult. Practical data sets may show natural "hills" and "valleys", which can make the problem much easier.
It is quite possible that formulation D works well when the problem is very large and easy. It does not need a matrix of decision variables, which may be beneficial in that case.
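As an aside, here is a Kadane-style dynamic programming sketch for the contiguous case (my own GAMS rendering of the standard O(m²·n) approach, not code from [2,3]): for every start row, extend the row range downwards, maintain column sums, and run a 1d maximum subarray scan over the columns.

alias (i,i1,i2);
parameter colsum(j) 'column sums over rows i1..i2';
scalar
   run     'best sum of a column range ending at the current column'
   bestsum 'best contiguous submatrix sum (at least one cell)' /-inf/
;
loop(i1,
   colsum(j) = 0;
   loop(i2$(ord(i2) >= ord(i1)),
*     extend the row range by one row
      colsum(j) = colsum(j) + a(i2,j);
*     Kadane scan over the columns
      run = 0;
      loop(j,
         run = max(colsum(j), run + colsum(j));
         bestsum = max(bestsum, run);
      );
   );
);
display bestsum;

Note that this variant selects at least one cell, while the MIP models also allow the empty selection (objective 0).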
Conclusions
Using random data, we can easily generate some challenging data sets.
Solvers like Cplex and Gurobi may automatically reformulate non-convex MIQP models into linear models. I like that: we can use quadratic forms to simplify models and not worry about tedious reformulations that move us away from the more natural expression of the problem. In a sense, quadratic terms are like indicator constraints: the underlying solver engine may not even know how to deal with them directly, but automatically applied reformulations make it look like it does. In theory, solvers are in a better position than the modeler to apply the best reformulation for a model.
In a subsequent post, I will discuss a generalization of this problem. Instead of matrix elements, we consider points on the 2d plane, each having a value. Find the rectangle such that the sum of the values of the points inside the rectangle is maximized.
$ontext

   Select a non-contiguous submatrix that maximizes the sum of the
   values.

$offtext

*--------------------------------------------------------------
* data
* slightly smaller than in blog post
*--------------------------------------------------------------

set
   i 'rows'    /r1*r30/
   j 'columns' /c1*c30/
;
$ontext

   Select a contiguous submatrix that maximizes the sum of the
   values.

$offtext

*--------------------------------------------------------------
* data
* slightly smaller than in blog post
*--------------------------------------------------------------

set
   i 'rows'    /r1*r30/
   j 'columns' /c1*c30/
;
*--------------------------------------------------------------
* complete enumeration
* only for very small problems
*--------------------------------------------------------------
abort$(card(i)*card(j)>100) "Skipping complete enumeration";