## Tuesday, January 19, 2021

### Submatrix with Largest Sum

#### Problem

Given a data matrix $$\color{darkblue}a_{i,j}$$, find a submatrix such that the sum of the elements is maximized. The term "submatrix" can indicate a contiguous or a non-contiguous subset of rows and columns.

This is a generalization of the 1d maximum subarray problem [1].

Looks easy, but let's make sure it is.

#### Example data

Of course, the problem is trivial if all elements are non-negative: just select the whole matrix. It is only interesting if there are negative elements. So here is my random matrix:

----      8 PARAMETER a  matrix (data)

c1          c2          c3          c4          c5          c6          c7          c8          c9         c10

r1       -6.565       6.865       1.008      -3.977      -4.156      -5.519      -3.003       7.125      -8.658       0.004
r2        9.962       1.575       9.823       5.245      -7.386       2.794      -6.810      -4.998       3.379      -1.293
r3       -2.806      -2.971      -7.370      -6.998       1.782       6.618      -5.384       3.315       5.517      -3.927
r4       -7.790       0.048      -6.797       7.449      -4.698      -4.284       1.879       4.454       2.565      -0.724
r5       -1.734      -7.646      -3.716      -9.069      -3.229      -6.358       2.915       1.215       5.399      -4.044
r6        3.222       5.116       2.549      -4.323      -8.272      -7.950       2.825       0.906      -9.370       5.847
r7       -8.545      -6.487       0.513       5.004      -6.438      -9.317       1.703       2.425      -2.213      -2.826
r8       -5.139      -5.072      -7.390       8.669      -2.401       5.668      -3.999      -7.490       4.977      -8.615
r9       -5.960      -9.899      -4.608      -0.003      -6.974      -6.517      -3.387      -3.662      -3.558       9.280
r10       9.872      -2.602      -2.542       5.440      -2.066       8.262      -7.608       4.710      -8.892       1.526
r11      -8.972      -9.880      -1.975       0.398       2.578      -5.485      -2.078      -4.480      -6.953       8.726
r12      -1.547      -7.307      -2.279      -2.507      -4.630       8.967      -6.221      -4.050      -8.509      -1.973
r13      -7.966      -2.322      -3.518      -6.157      -7.753       1.931       0.229      -9.099       5.662       8.915
r14       1.929       2.147      -2.750       1.881       3.597       0.132      -6.815       3.138       0.478      -7.512
r15       9.734      -5.438       3.513       5.536       8.649      -5.975      -4.057      -6.055      -5.073       2.930
r16       4.699      -8.291      -6.993      -1.316      -6.261       3.854       5.259      -6.904      -2.212       3.909
r17       6.916       2.254       9.519      -9.462      -6.251      -8.258       0.808      -7.463       4.680      -7.735
r18      -0.233       5.912      -0.159       0.671      -9.788       0.877      -0.977       9.507      -6.323      -6.729
r19      -9.507      -6.444      -8.774      -9.667       6.713       2.033      -9.460      -6.078       9.014      -3.289
r20       1.885      -4.816       2.813      -6.895      -0.800      -2.133       6.109       0.820      -2.186       1.156
r21       8.655      -3.025      -9.834       8.977       1.438      -3.327       9.675       5.329      -7.798       9.896
r22       1.606      -6.672       2.867      -3.114       8.247       8.001      -9.675      -2.627       3.288       1.868
r23      -9.309       6.836       8.642       0.159      -4.008      -0.068      -9.101       5.474       0.659       4.935
r24       4.401       2.632      -7.702       9.423       4.135       9.725       7.096       2.429       4.026       4.018
r25       5.814       2.204      -8.914      -0.296      -8.949       3.972      -6.104      -5.479       6.273       9.835
r26       5.013       4.367      -9.988      -4.723       6.476       6.391       7.208      -5.746      -0.864      -9.233
r27      -3.540      -1.202      -3.693      -7.305       6.219      -1.664      -7.164      -0.689      -4.340       7.914
r28      -8.712      -1.708      -3.168      -0.634       2.853       2.872      -3.248      -7.984       8.117      -5.653
r29       8.377      -0.965      -8.201      -2.516      -1.700      -1.916      -7.767       5.023       6.068      -9.527
r30      -0.382      -4.428       8.032      -9.648       3.621       9.018       8.004       7.976       7.489      -2.180


The values are drawn from a uniform distribution, $$\color{darkblue}a_{i,j} \sim U(-10,10)$$.

#### Non-contiguous submatrix

The non-convex MIQP formulation is easy. Just introduce binary variables that indicate if a row or column is selected. A cell is selected if both its corresponding row and column are selected.

MIQP model A
\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_i \cdot \color{darkred}y_j\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\} \end{align}

If we pass this on to Cplex, it solves it very fast. The results are:

----     28 VARIABLE z.L                   =      163.081  objective

----     28 VARIABLE x.L  selected rows

r2  1.000,    r3  1.000,    r8  1.000,    r10 1.000,    r14 1.000,    r15 1.000,    r21 1.000,    r22 1.000,    r24 1.000
r25 1.000,    r26 1.000,    r28 1.000,    r29 1.000,    r30 1.000

----     28 VARIABLE y.L  selected columns

c1 1.000,    c4 1.000,    c5 1.000,    c6 1.000,    c9 1.000

----     32 PARAMETER sel  selected cells

c1          c4          c5          c6          c9

r2        9.962       5.245      -7.386       2.794       3.379
r3       -2.806      -6.998       1.782       6.618       5.517
r8       -5.139       8.669      -2.401       5.668       4.977
r10       9.872       5.440      -2.066       8.262      -8.892
r14       1.929       1.881       3.597       0.132       0.478
r15       9.734       5.536       8.649      -5.975      -5.073
r21       8.655       8.977       1.438      -3.327      -7.798
r22       1.606      -3.114       8.247       8.001       3.288
r24       4.401       9.423       4.135       9.725       4.026
r25       5.814      -0.296      -8.949       3.972       6.273
r26       5.013      -4.723       6.476       6.391      -0.864
r28      -8.712      -0.634       2.853       2.872       8.117
r29       8.377      -2.516      -1.700      -1.916       6.068
r30      -0.382      -9.648       3.621       9.018       7.489


This is a non-convex problem, but Cplex solves it very fast: just 982 simplex iterations and 0 branch-and-bound nodes. The reason is: Cplex will linearize this problem. Of course, we can also do this linearization ourselves:

MIP model B
\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}s_{i,j}\\ & \color{darkred}s_{i,j} \le \color{darkred}x_i && \forall i,j \\ &\color{darkred}s_{i,j} \le \color{darkred}y_j && \forall i,j & \\ & \color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1 && \forall i,j\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}s_{i,j} \in [0,1] \end{align}

Notes:
• The problem is also known as the finding the maximum edge biclique in bipartite graphs.
• This model solves in 577 Simplex iterations and 0 nodes. This is even a bit faster than the automatically reformulated version. Sometimes we can beat Cplex! (Note that is just luck: Cplex is doing something like Model C which is most of the time better).
• The variables $$\color{darkred}s_{i,j}$$ can be declared to be binary, or they can be relaxed to be continuous between 0 and 1.
• If $$\color{darkred}s_{i,j}$$ is relaxed, Cplex will make them binary again. We can see this in the log: "Reduced MIP has 40 binaries, 0 generals, 0 SOSs, and 0 indicators" is followed by "Reduced MIP has 340 binaries, 0 generals, 0 SOSs, and 0 indicators".
• In many models, we can drop either the $$\le$$ constraints or the $$\ge$$ constraint of the linearization. Here we need them both.
• Well, that is not completely true. We can drop $$\color{darkred}s_{i,j} \le \color{darkred}x_i$$ and $$\color{darkred}s_{i,j} \le \color{darkred}y_i$$ if $$\color{darkblue}a_{i,j}\le 0$$. And we can drop $$\color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1$$ if $$\color{darkblue}a_{i,j}\ge 0$$. Indeed, we can drop all constraints if $$\color{darkblue}a_{i,j}= 0$$. If we apply this to our model, we reduce the number of constraints from 900 to 433. Interestingly the number of Simplex iterations goes up to 862 (from 577). The number of nodes stays 0.

MIP model C
\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}s_{i,j}\\ & \color{darkred}s_{i,j} \le \color{darkred}x_i && \forall i,j| \color{darkblue}a_{i,j}\gt 0 \\ &\color{darkred}s_{i,j} \le \color{darkred}y_j && \forall i,j| \color{darkblue}a_{i,j}\gt 0 & \\ & \color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1 && \forall i,j| \color{darkblue}a_{i,j}\lt 0\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}s_{i,j} \in [0,1] \end{align}

A different linearization is proposed in [4]. First row sums are calculated: $\color{darkred}r_i = \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}y_j$ Then the objective $\max \> \sum_i \color{darkred}r_i \cdot \color{darkred} x_i$ is linearized. This is done by observing that only positive row sums $$\color{darkred}r_i$$ are ever selected. So the objective is rewritten as $\max \> \sum_i \color{darkred}r_i^+$ where $$\color{darkred}r_i^+ = \max(0,\color{darkred}r_i)$$. Linearization of this expression leads to the model:

MIP model D
\begin{align} \max& \sum_{i} \color{darkred} r_i^+ \\ & \color{darkred} r_i^+ \le \color{darkred}x_i \cdot \color{darkblue}M_{1,i} \\ & \color{darkred} r_i^+ \le \color{darkred} r_i + (1-\color{darkred}x_i)\cdot \color{darkblue}M_{2,i}\\ & \color{darkred}r_i = \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}y_j \\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}r_i^+ \ge 0\\ & \color{darkred}r_i \>\textrm{free} \end{align}

The first two constraints implement the implications: \begin{align} & \color{darkred}x_i=0 \Rightarrow \color{darkred} r_i^+ = 0 \\ & \color{darkred}x_i=1 \Rightarrow \color{darkred} r_i^+ = \color{darkred} r_i\end{align} Good values for the big-M constants are given in [4]: \begin{align} &\color{darkblue}M_{1,i} = \max(0,\color{darkblue}a_{i,j}) \\ &\color{darkblue}M_{2,i} = -\min(0,\color{darkblue}a_{i,j})\end{align} The advantage of this model is that we need only vectors of decision variables. There is no variable matrix (in the other models we have $$s_{i,j}$$, either directly or indirectly). A disadvantage is that the model is a bit more complex and seems somewhat slower to solve.

#### Contiguous submatrix

If we want to select an optimal contiguous submatrix, we can use a modeling trick. This is borrowed from machine scheduling where we sometimes want to limit the number of start-ups. An example can be that manufacturers require such limitations to reduce wear and tear on equipment like generators. For both $$\color{darkred}x_i, \color{darkred}y_j$$ we require that they can go from 0 to 1 just once. For this, we introduce binary variables: \begin{aligned} &\color{darkred}p_i = \begin{cases} 1 & \text{if row i is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases} \\ & \color{darkred}q_j = \begin{cases} 1 & \text{if column j is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases}\end{aligned} The implications $$\color{darkred}x_{i-1}=0\textbf{ and }\color{darkred}x_{i}=1 \Rightarrow \color{darkred}p_i = 1$$ can be implemented as $$\color{darkred}p_i \ge \color{darkred}x_{i} - \color{darkred}x_{i-1}$$. We can only have one $$\color{darkred}p_i =1$$. Similar for other direction: $$\color{darkred}q_j$$. So the model can look like:

MIQP model E
\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_i \cdot \color{darkred}y_j\\ & \color{darkred}p_i \ge \color{darkred}x_{i} - \color{darkred}x_{i-1} \\ & \color{darkred}q_j \ge \color{darkred}y_{j} - \color{darkred}y_{i-1} \\ & \sum_i \color{darkred}p_i \le 1 \\ & \sum_j \color{darkred}q_j \le 1\\ &\color{darkred}x_i, \color{darkred}y_j,\color{darkred}p_i, \color{darkred}q_j \in \{0,1\} \end{align}

Here we assume $$\color{darkred}x_0 = \color{darkred}y_0= 0$$. The results are:

----     38 VARIABLE z.L                   =       81.721  objective

----     38 VARIABLE x.L  selected rows

r20 1.000,    r21 1.000,    r22 1.000,    r23 1.000,    r24 1.000,    r25 1.000,    r26 1.000,    r27 1.000
r28 1.000,    r29 1.000,    r30 1.000

----     38 VARIABLE p.L  start of submatrix

r20 1.000

----     38 VARIABLE y.L  selected columns

c5  1.000,    c6  1.000,    c7  1.000,    c8  1.000,    c9  1.000,    c10 1.000

----     38 VARIABLE q.L  start of submatrix

c5 1.000

----     42 PARAMETER sel  selected cells

c5          c6          c7          c8          c9         c10

r20      -0.800      -2.133       6.109       0.820      -2.186       1.156
r21       1.438      -3.327       9.675       5.329      -7.798       9.896
r22       8.247       8.001      -9.675      -2.627       3.288       1.868
r23      -4.008      -0.068      -9.101       5.474       0.659       4.935
r24       4.135       9.725       7.096       2.429       4.026       4.018
r25      -8.949       3.972      -6.104      -5.479       6.273       9.835
r26       6.476       6.391       7.208      -5.746      -0.864      -9.233
r27       6.219      -1.664      -7.164      -0.689      -4.340       7.914
r28       2.853       2.872      -3.248      -7.984       8.117      -5.653
r29      -1.700      -1.916      -7.767       5.023       6.068      -9.527
r30       3.621       9.018       8.004       7.976       7.489      -2.180


Notes:
• This model is still solved completely during preprocessing: zero nodes were needed.
• We can relax $$\color{darkblue}p$$ and $$\color{darkblue}q$$ to be continuous between 0 and 1.
• Making $$\sum_i \color{darkred}p_i\le 1$$ an equality $$\sum_i \color{darkred}p_i= 1$$ does not really do much: we can still have zero selected rows, due to how the bounding works.
• For small data sets, it is easy to enumerate all possible submatrices. For our data set, we have 25,575 of them (excluding the special case of no selected rows and columns).
• We can use the same linearization techniques that were discussed before.

These models seem to solve very quickly. However, if you think we can solve big problems with these formulations, you are wrong. Even slightly larger problems show much larger solution times.

#### Comparison

Here we solve some larger problems. Our models are:

1. Non-contiguous MIQP model A. Let Cplex linearize.
2. Non-contiguous MIP model B. Linearized while keeping all constraints.
3. Non-contiguous MIP model C. Linearize with dropping unnecessary constraints.
4. Non-contiguous MIP model D. Alternative formulation from [4].
5. Contiguous MIQP model E. Let Cplex linearize.

Model AModel BModel CModel DModel EModel E
Problem non-contiguousnon-contiguousnon-contiguousnon-contiguouscontiguouscontiguous
Problem size (rows/columns) 50/30 50/30 50/30 50/30 50/30 100/100
Constraints/variables (generated) 0/80 4,500/1,580 2,237/1,580 150/180 82/160 202/400
Constraints/variables (after presolve) 2,237/1,580 4,500/1,580 2,237/1,580 100/130 2,317/1,658 15,198/10,398
Objective 784.904 784.904 784.904 784.904298.447 826.110
Gap optimal optimal optimal 32% optimal optimal
Time (seconds) 1,113 1,072 376 > 3600 3 951
Nodes 199,525 77,967 5,273 3,034,448 0 8,174
Simplex iterations 32,497,491 17,956,170 5,569,196 104,396,873 2,519 3,654,174

Notes:
• The contiguous problem, as expected, is much easier to solve.
• As always: these MIP models find good and even optimal solutions relatively quickly. Most of the time is spent proving optimality.
• For the contiguous problem, there are well-known dynamic programming algorithms [2,3]. Not sure if there is one for the non-contiguous case.
• For more on the non-contiguous case, see [4].
• The above timings are on a slow laptop, so you can expect better performance on a more beefy machine.
• Model D performs much better on the data sets shown in [4]. My random data makes the problem very difficult: combinatorially challenging. Practical data sets may show natural "hills" and "valleys", which can make the problem much easier.
• It is quite possible that formulation D works well when the problem is very large and easy. It does not need a matrix of decision variables, which may be beneficial in that case.

#### Conclusions

Using random data, we can easily generate some challenging data sets.

Solvers like Cplex and Gurobi may automatically reformulate non-convex MIQP models into linear models. I like that: we can use quadratic forms to simplify models and not worry about tedious reformulations that move us away from more natural forms of expressions of the problem. In a sense, quadratic terms look like indicator constraints. The underlying solver engine may not even know how to deal with this, but automatically applied reformulations make it look it does. In theory, solvers should be in a better position to apply the most optimal reformulations for a model than a modeler.

In a subsequent post, I will discuss a generalization of this problem. Instead of matrix elements, we consider points on the 2d plane, each having a value. Find the rectangle such that the sum of the values of the points inside the rectangle is maximized.

#### References

1. Maximum subarray problem, https://en.wikipedia.org/wiki/Maximum_subarray_problem
2. Maximum sum rectangle in a 2d matrix, https://www.geeksforgeeks.org/maximum-sum-rectangle-in-a-2d-matrix-dp-27/
3. Maximum Sum Rectangle In A 2D Matrix - Kadane's Algorithm Applications (Dynamic Programming), https://www.youtube.com/watch?v=-FgseNO-6Gk
4. Vincent Branders, Pierre Schaus, and Pierre Dupont, Mining a Sub-Matrix of Maximal Sumhttps://arxiv.org/abs/1709.08461
5. Continuous max sum rectangle: MIQP vs GA, https://yetanothermathprogrammingconsultant.blogspot.com/2021/01/continuous-max-sum-rectangle-miqp-vs-ga.html. This is a follow-up post discussing a variant of this problem.