Monday, February 1, 2016

Maximum Edge Biclique Problem

The standard network problems such as shortest path and min-cost flow are easily expressed as linear programming problems. Here is a network problem that is somewhat more challenging both in solving and in modeling.

We have a bipartite graph (like in the transportation problem):

image

The task is to find the largest sub-graph that is complete (i.e. all selected nodes on the left are connected to all selected nodes on the right). The size is measured as number of edges. In this small case we would have:

image

MIQP model

A first Mixed Integer Quadratic Programming model could look like:

image

This is a non-convex problem. Cplex needs a special option to solve this: OptimalityTarget 3. After that we see a few interesting things in the log file. The first are some scary messages. I assume this is not very harmfull (we find the optimal solution), but it surely looks dangerous:

*   610+  367                           40.0000      405.7801           914.45%
Found incumbent of value 40.000000 after 6.09 sec. (1843.37 ticks)
Warning:  Bounds on 'x83' contradictory.
Warning:  Bounds on 'x83' contradictory.
   1146   645       66.4227     9       40.0000      379.1979    12671  847.99%

The second interesting thing is how large the gap is at the end of the search:

Found incumbent of value 45.000000 after 54.77 sec. (15071.32 ticks)
*  8800+ 2293                           46.0000      255.3419           455.09%
Found incumbent of value 46.000000 after 55.31 sec. (15202.00 ticks)
* 10710+ 1746                           48.0000      236.1005           391.88%
Found incumbent of value 48.000000 after 66.17 sec. (17603.78 ticks)

MIQP status(101): integer optimal solution

MIP Model

A MIP model can be developed around the idea that a binary multiplication y=x1*x2 is easily linearized:

y ≥ x1+x2-1
y ≤ x1
y ≤ x2

In our case we want that a link e(i,j) exists if both x(i) and y(j) are selected. I.e. e(i,j) = x(i)*x(j). As we are maximizing the number of e(i,j)’s we can drop the first inequality in the linearization. Furthermore we can relax e(i,j) to be continuous between zero and one (it will be integer valued automatically). In the model below we do this by setting the branching priority to infinity.

Finally if there is no link between x(i) and y(j) we forbid both to be selected.

image

This model is somewhat smart in the sense that only e(i,j)’s are generated where there exist links between i and j (i.e. if a(i,j) exists).

Relaxing the e(i,j)’s does not have much of an impact on Cplex. It will reintroduce the variables as binary, as we can see in the log:

Tried aggregator 1 time.
MIP Presolve eliminated 1 rows and 1 columns.
Reduced MIP has 2263 rows, 843 columns, and 4526 nonzeros.
Reduced MIP has 80 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.00 sec. (3.08 ticks)
Probing time = 0.00 sec. (3.89 ticks)
Tried aggregator 1 time.
Reduced MIP has 2263 rows, 843 columns, and 4526 nonzeros.
Reduced MIP has 843 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.01 sec. (4.34 ticks)

For a larger problem with 100 + 100 nodes and 2000 arcs (20% density) we find the optimal solution in 25 seconds and prove optimality in 7 minutes.

Matrix interpretation

Using a display of the initial arcs and the selected ones gives another interpretation of this problem:

----     47 SET a  links

             j1          j2          j3          j4          j5          j6          j7          j8

i1          YES                                 YES         YES         YES         YES
i2          YES                                                                     YES
i3          YES         YES                     YES         YES         YES         YES         YES
i4                                  YES                                 YES         YES
i5          YES                     YES         YES                                             YES
i6          YES         YES         YES         YES         YES         YES
i7                      YES                                             YES         YES         YES
i8                                  YES                     YES         YES
i9          YES         YES                                 YES         YES         YES         YES
i10         YES                     YES                     YES         YES                     YES


----     47 VARIABLE e.L  selected links

             j1          j5          j6

i1            1           1           1
i3            1           1           1
i6            1           1           1
i9            1           1           1
i10           1           1           1

----     47 VARIABLE z.L                   =       15.000  objective

I.e. select rows and columns such that all entries are YES (and choose the selection with the maximum number of entries).

References

Some more MIP formulations can be found in:

V. Acuna, C. E. Ferreira, A. S. Freire and E. Moreno: “On the maximum edge biclique packing problem: formulations and computational experiments,” 2010.

No comments:

Post a Comment