Sunday, May 3, 2020

Modeling permutations


Introduction


In [1] the following question was posted:

Given a (square) matrix \(A\), reorder the columns such that the sum of the diagonal elements is maximized. We are working in an R environment.

My answer involves permutation matrices and assignment constraints. This is a useful concept for quite a few models.

Permutation Matrix


When dealing with permutations it is often a good idea to look at this in terms of a Permutation Matrix [2]. A permutation matrix \(P\) is an identity matrix with rows or columns permuted.

Interestingly, for a single swap it does not matter whether we exchange the two columns or the two rows of the identity matrix: the result is the same:


> n <- 5
> # identity matrix
> I <- diag(n)  
> print(I)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1
> # swap columns 2 and 4
> P1 <- I
> P1[,c(2,4)] <- I[,c(4,2)]
> print(P1)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    1    0    0
[4,]    0    1    0    0    0
[5,]    0    0    0    0    1
> # swap rows 2 and 4
> P2 <- I
> P2[c(2,4),] <- I[c(4,2),]
> print(P2)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    1    0    0
[4,]    0    1    0    0    0
[5,]    0    0    0    0    1
> 


A permutation matrix has the following properties:

  1. It is a square \(n \times n\) matrix with \(n\) entries equal to 1 and the other entries equal to 0,
  2. exactly one element is 1 in each row,
  3. and exactly one element is 1 in each column. 

In an optimization model, this can be modeled as a set of assignment constraints: \[\begin{align}&\sum_j p_{i,j} = 1 && \forall i \\ &\sum_i p_{i,j} =1 && \forall j \\ &p_{i,j} \in \{0,1\}\end{align}\]
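These properties are easy to verify in R. A quick check on the matrix P1 constructed above could look like:

```
# quick check that P1 (from the snippet above) is a permutation matrix
all(P1 %in% c(0, 1))      # only 0/1 entries
all(rowSums(P1) == 1)     # exactly one 1 in each row
all(colSums(P1) == 1)     # exactly one 1 in each column
```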

Row and column permutations


From [2] we can see:

  • Applying a row permutation to an \((m\times n)\)  matrix \(A\) can be viewed as a pre-multiplication with an \((m \times m)\) permutation matrix: \[\widetilde{A}=PA\]
  • and applying a column permutation to an \((m\times n)\)  matrix \(A\) can be viewed as a post-multiplication with an \((n \times n)\) permutation matrix: \[\widetilde{A}=AP\]
As seeing is believing, below is a little experiment. We start with a \(2 \times 3\) matrix \(A\) and permute some rows and columns. Note: the original question was about a square matrix, while here we use a non-square matrix to emphasize different sizes of the permutation matrices \(P_1\) and \(P_2\).


> m <- 2
> n <- 3
> # random matrix A
> set.seed(123)
> A <- matrix(runif(m*n,min=-10,max=+10),nrow=m,ncol=n)
> print(A)
          [,1]      [,2]      [,3]
[1,] -4.248450 -1.820462  8.809346
[2,]  5.766103  7.660348 -9.088870
> # swap rows 1 and 2
> P1 <- diag(m)
> P1[,c(1,2)] <- diag(m)[,c(2,1)]
> P1 %*% A
          [,1]      [,2]      [,3]
[1,]  5.766103  7.660348 -9.088870
[2,] -4.248450 -1.820462  8.809346
> # swap columns 1 and 3
> P2 <- diag(n)
> P2[,c(1,3)] <- diag(n)[,c(3,1)]
> A %*% P2
          [,1]      [,2]      [,3]
[1,]  8.809346 -1.820462 -4.248450
[2,] -9.088870  7.660348  5.766103
> 


Note: If we want to apply both row and column permutations, we would write: \[\widetilde{A}=P A Q\] with \(P\) and \(Q\) permutation matrices.
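Continuing the experiment above, a quick check of the combined operation (reusing the P1, A, and P2 already defined) could be:

```
# apply both permutations at once: swap rows 1 and 2, and columns 1 and 3
P1 %*% A %*% P2
```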

Optimization model


The original problem can now be formulated as a Mixed Integer Programming model. The mathematical model can look like:

Mixed Integer Programming Model
\[\begin{align}\max&\sum_i \color{darkred}y_{i,i} \\ & \color{darkred}y_{i,j} = \sum_k \color{darkblue}a_{i,k} \cdot \color{darkred}p_{k,j} && \forall i,j \\ & \sum_j \color{darkred}p_{i,j} = 1 &&\forall i \\ & \sum_i \color{darkred}p_{i,j} = 1 &&\forall j \\ & \color{darkred}p_{i,j} \in \{0,1\} \\ & \color{darkred}y_{i,j} \>\mathbf{free}\end{align}\]


Implementation 1: OMPR


The model above can be implemented in a straightforward manner in OMPR.


> library(ompr)
> library(ompr.roi)
> library(dplyr)
> library(ROI)
> library(ROI.plugin.symphony)
> library(tidyr)
> n <- 10
> set.seed(123)
> a <- matrix(runif(n^2,min=-1,max=1),nrow=n,ncol=n)
> print(a)
             [,1]        [,2]        [,3]        [,4]       [,5]       [,6]       [,7]        [,8]       [,9]       [,10]
 [1,] -0.42484496  0.91366669  0.77907863  0.92604847 -0.7144000 -0.9083377  0.3302304  0.50895032 -0.5127611 -0.73860862
 [2,]  0.57661027 -0.09333169  0.38560681  0.80459809 -0.1709073 -0.1155999 -0.8103187  0.25844226  0.3361112  0.30620385
 [3,] -0.18204616  0.35514127  0.28101363  0.38141056 -0.1725513  0.5978497 -0.2320607  0.42036480 -0.1647064 -0.31296706
 [4,]  0.76603481  0.14526680  0.98853955  0.59093484 -0.2623091 -0.7562015 -0.4512327 -0.99875045  0.5763917  0.31351626
 [5,]  0.88093457 -0.79415063  0.31141160 -0.95077263 -0.6951105  0.1218960  0.6292801 -0.04936685 -0.7942707 -0.35925352
 [6,] -0.90888700  0.79964994  0.41706094 -0.04440806 -0.7223879 -0.5869372 -0.1029673 -0.55976223 -0.1302145 -0.62461776
 [7,]  0.05621098 -0.50782453  0.08813205  0.51691908 -0.5339318 -0.7449367  0.6201287 -0.24036692  0.9699140  0.56458860
 [8,]  0.78483809 -0.91588093  0.18828404 -0.56718413 -0.0680751  0.5066157  0.6247790  0.22554201  0.7861022 -0.81281003
 [9,]  0.10287003 -0.34415856 -0.42168053 -0.36363798 -0.4680547  0.7900907  0.5886846 -0.29640418  0.7729381 -0.06644192
[10,] -0.08677053  0.90900730 -0.70577271 -0.53674843  0.7156554 -0.2510744 -0.1203366 -0.77772915 -0.6498947  0.02301092
> m <- MIPModel() %>%
+   add_variable(p[i,j], i=1:n, j=1:n, type="binary") %>%
+   add_variable(y[i,j], i=1:n, j=1:n) %>%
+   add_constraint(y[i,j] == sum_expr(a[i,k]*p[k,j],k=1:n),i=1:n,j=1:n) %>%
+   add_constraint(sum_expr(p[i,j],j=1:n) == 1,i=1:n) %>%
+   add_constraint(sum_expr(p[i,j],i=1:n) == 1,j=1:n) %>%
+   set_objective(sum_expr(y[i,i], i=1:n),"max") %>%
+   solve_model(with_ROI(solver = "symphony",verbosity=1)) 
Starting Preprocessing...
Preprocessing finished...
   with no modifications...
Problem has 
  120 constraints 
  200 variables 
  1300 nonzero coefficients

Total Presolve Time: 0.000000...

Solving...

granularity set at 0.000000
solving root lp relaxation
The LP value is: -7.422 [0,20]


****** Found Better Feasible Solution !
****** Cost: -7.422180


****************************************************
* Optimal Solution Found                           *
* Now displaying stats and best solution found...  *
****************************************************

====================== Misc Timing =========================
  Problem IO        0.000
======================= CP Timing ===========================
  Cut Pool                  0.000
====================== LP/CG Timing =========================
  LP Solution Time          0.000
  LP Setup Time             0.000
  Variable Fixing           0.000
  Pricing                   0.000
  Strong Branching          0.000
  Separation                0.000
  Primal Heuristics         0.000
  Communication             0.000
  Total User Time              0.000
  Total Wallclock Time         0.000

====================== Statistics =========================
Number of created nodes :       1
Number of analyzed nodes:       1
Depth of tree:                  0
Size of the tree:               1
Number of solutions found:      1
Number of solutions in pool:    1
Number of Chains:               1
Number of Diving Halts:         0
Number of cuts in cut pool:     0
Lower Bound in Root:            -7.422

======================= LP Solver =========================
Number of times LP solver called:                 1
Number of calls from feasibility pump:            0
Number of calls from strong branching:            0
Number of solutions found by LP solve:            1
Number of bounds changed by strong branching:     0
Number of nodes pruned by strong branching:       0
Number of bounds changed by branching presolver:  0
Number of nodes pruned by branching presolver:    0

==================== Primal Heuristics ====================
                             Time      #Called   #Solutions
Rounding I                   0.00                           
Rounding II                  0.00                           
Diving                       0.00 
Feasibility Pump             0.00 
Local Search                 0.00            1            0 
Restricted Search            0.00 
Rins Search                  0.00 
Local Branching              0.00 

=========================== Cuts ==========================
Accepted:                         0
Added to LPs:                     0
Deleted from LPs:                 0
Removed because of bad coeffs:    0
Removed because of duplicacy:     0
Insufficiently violated:          0
In root:                          0

Time in cut generation:              0.00
Time in checking quality and adding: 0.00

                   Time     #Called     In Root       Total
Gomory             0.00 
Knapsack           0.00 
Clique             0.00 
Probing            0.00 
Flowcover          0.00 
Twomir             0.00 
Oddhole            0.00 
Mir                0.00 
Rounding           0.00 
LandP-I            0.00 
LandP-II           0.00 
Redsplit           0.00 

===========================================================
Solution Found: Node 0, Level 0
Solution Cost: -7.4221803085
> cat("Status:",solver_status(m),"\n")
Status: optimal 
> cat("Objective:",objective_value(m),"\n")
Objective: 7.42218 
> df <- get_solution(m,y[i, j])
> spread(df,key=j,value=value)[,-c(1,2)]
             1           2           3           4           5           6          7          8          9         10
1   0.92604847 -0.73860862  0.50895032  0.77907863 -0.42484496  0.91366669 -0.5127611  0.3302304 -0.9083377 -0.7144000
2   0.80459809  0.30620385  0.25844226  0.38560681  0.57661027 -0.09333169  0.3361112 -0.8103187 -0.1155999 -0.1709073
3   0.38141056 -0.31296706  0.42036480  0.28101363 -0.18204616  0.35514127 -0.1647064 -0.2320607  0.5978497 -0.1725513
4   0.59093484  0.31351626 -0.99875045  0.98853955  0.76603481  0.14526680  0.5763917 -0.4512327 -0.7562015 -0.2623091
5  -0.95077263 -0.35925352 -0.04936685  0.31141160  0.88093457 -0.79415063 -0.7942707  0.6292801  0.1218960 -0.6951105
6  -0.04440806 -0.62461776 -0.55976223  0.41706094 -0.90888700  0.79964994 -0.1302145 -0.1029673 -0.5869372 -0.7223879
7   0.51691908  0.56458860 -0.24036692  0.08813205  0.05621098 -0.50782453  0.9699140  0.6201287 -0.7449367 -0.5339318
8  -0.56718413 -0.81281003  0.22554201  0.18828404  0.78483809 -0.91588093  0.7861022  0.6247790  0.5066157 -0.0680751
9  -0.36363798 -0.06644192 -0.29640418 -0.42168053  0.10287003 -0.34415856  0.7729381  0.5886846  0.7900907 -0.4680547
10 -0.53674843  0.02301092 -0.77772915 -0.70577271 -0.08677053  0.90900730 -0.6498947 -0.1203366 -0.2510744  0.7156554
> 
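As a side note, the permutation itself is easily recovered from the \(p\) variables. A small sketch (not part of the run above) could look like:

```
# sketch: rebuild the permutation matrix P from the solved p variables
sol <- get_solution(m, p[i, j])
P <- matrix(0, n, n)
P[cbind(sol$i, sol$j)] <- round(sol$value)
perm <- apply(P, 2, which.max)   # position j of the permuted matrix holds original column perm[j]
cat("column permutation:", perm, "\n")
cat("diagonal sum:", sum(diag(a %*% P)), "\n")
```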



Implementation 2: CVXR


The model can also be expressed conveniently in CVXR:


> library(CVXR)
> set.seed(123)
> n <- 10
> A <- matrix(runif(n^2,min=-1,max=1),nrow=n,ncol=n)
> print(A)
             [,1]        [,2]        [,3]        [,4]       [,5]       [,6]       [,7]        [,8]       [,9]       [,10]
 [1,] -0.42484496  0.91366669  0.77907863  0.92604847 -0.7144000 -0.9083377  0.3302304  0.50895032 -0.5127611 -0.73860862
 [2,]  0.57661027 -0.09333169  0.38560681  0.80459809 -0.1709073 -0.1155999 -0.8103187  0.25844226  0.3361112  0.30620385
 [3,] -0.18204616  0.35514127  0.28101363  0.38141056 -0.1725513  0.5978497 -0.2320607  0.42036480 -0.1647064 -0.31296706
 [4,]  0.76603481  0.14526680  0.98853955  0.59093484 -0.2623091 -0.7562015 -0.4512327 -0.99875045  0.5763917  0.31351626
 [5,]  0.88093457 -0.79415063  0.31141160 -0.95077263 -0.6951105  0.1218960  0.6292801 -0.04936685 -0.7942707 -0.35925352
 [6,] -0.90888700  0.79964994  0.41706094 -0.04440806 -0.7223879 -0.5869372 -0.1029673 -0.55976223 -0.1302145 -0.62461776
 [7,]  0.05621098 -0.50782453  0.08813205  0.51691908 -0.5339318 -0.7449367  0.6201287 -0.24036692  0.9699140  0.56458860
 [8,]  0.78483809 -0.91588093  0.18828404 -0.56718413 -0.0680751  0.5066157  0.6247790  0.22554201  0.7861022 -0.81281003
 [9,]  0.10287003 -0.34415856 -0.42168053 -0.36363798 -0.4680547  0.7900907  0.5886846 -0.29640418  0.7729381 -0.06644192
[10,] -0.08677053  0.90900730 -0.70577271 -0.53674843  0.7156554 -0.2510744 -0.1203366 -0.77772915 -0.6498947  0.02301092
> cat("sum diag of A:",sum(diag(A)))
sum diag of A: 0.7133438
> P <- Variable(n,n,boolean=T)
> Y <- Variable(n,n)
> problem <- Problem(Maximize(matrix_trace(Y)),
+                    list(Y==A %*% P,
+                         sum_entries(P,axis=1) == 1,
+                         sum_entries(P,axis=2) == 1))
> result <- solve(problem,verbose=T)
GLPK Simplex Optimizer, v4.47
120 rows, 200 columns, 1300 non-zeros
      0: obj =  0.000000000e+000  infeas = 2.000e+001 (120)
*   147: obj = -8.522275744e-002  infeas = 1.743e-014 (1)
*   213: obj = -7.422180308e+000  infeas = 4.854e-016 (1)
OPTIMAL SOLUTION FOUND
GLPK Integer Optimizer, v4.47
120 rows, 200 columns, 1300 non-zeros
100 integer variables, all of which are binary
Integer optimization begins...
+   213: mip =     not found yet >=              -inf        (1; 0)
+   213: >>>>> -7.422180308e+000 >= -7.422180308e+000   0.0% (1; 0)
+   213: mip = -7.422180308e+000 >=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
> cat("status:",result$status)
status: optimal
> cat("objective:",result$value)
objective: 7.42218
> print(result$getValue(Y))
             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]       [,7]       [,8]       [,9]      [,10]
 [1,]  0.92604847 -0.73860862  0.50895032  0.77907863 -0.42484496  0.91366669 -0.5127611  0.3302304 -0.9083377 -0.7144000
 [2,]  0.80459809  0.30620385  0.25844226  0.38560681  0.57661027 -0.09333169  0.3361112 -0.8103187 -0.1155999 -0.1709073
 [3,]  0.38141056 -0.31296706  0.42036480  0.28101363 -0.18204616  0.35514127 -0.1647064 -0.2320607  0.5978497 -0.1725513
 [4,]  0.59093484  0.31351626 -0.99875045  0.98853955  0.76603481  0.14526680  0.5763917 -0.4512327 -0.7562015 -0.2623091
 [5,] -0.95077263 -0.35925352 -0.04936685  0.31141160  0.88093457 -0.79415063 -0.7942707  0.6292801  0.1218960 -0.6951105
 [6,] -0.04440806 -0.62461776 -0.55976223  0.41706094 -0.90888700  0.79964994 -0.1302145 -0.1029673 -0.5869372 -0.7223879
 [7,]  0.51691908  0.56458860 -0.24036692  0.08813205  0.05621098 -0.50782453  0.9699140  0.6201287 -0.7449367 -0.5339318
 [8,] -0.56718413 -0.81281003  0.22554201  0.18828404  0.78483809 -0.91588093  0.7861022  0.6247790  0.5066157 -0.0680751
 [9,] -0.36363798 -0.06644192 -0.29640418 -0.42168053  0.10287003 -0.34415856  0.7729381  0.5886846  0.7900907 -0.4680547
[10,] -0.53674843  0.02301092 -0.77772915 -0.70577271 -0.08677053  0.90900730 -0.6498947 -0.1203366 -0.2510744  0.7156554
> 
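The optimal permutation matrix is available as well; a small addition to the run above could be:

```
# the optimal permutation matrix, rounded to 0/1
Popt <- round(result$getValue(P))
apply(Popt, 2, which.max)   # original column placed at each position
```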


It helps that we can use the built-in functions matrix_trace and sum_entries: a CVXR model written in pure matrix algebra would be a bit more difficult to read. In matrix notation, the model looks like:

Model in Matrix Algebra
\[\begin{align}\max\>&\mathbf{tr}(\color{darkred}Y) \\ & \color{darkred}Y = \color{darkblue}A \color{darkred}P \\ & \color{darkred}P^T \color{darkblue}e = \color{darkblue}e \\ & \color{darkred}P\color{darkblue}e = \color{darkblue}e \\ & \color{darkred}P \in \{0,1\}^{n \times n} \\ & \color{darkred}Y \>\mathbf{free}\end{align}\]

where \(e\) is a column vector of ones.
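For completeness, here is a sketch of that matrix-algebra version in CVXR (same data \(A\), \(n\), and session as above); whether it is more readable is a matter of taste:

```
# sketch: the matrix-algebra formulation stated directly in CVXR
e <- matrix(1, n, 1)                 # column vector of ones
P <- Variable(n, n, boolean = TRUE)
Y <- Variable(n, n)
problem2 <- Problem(Maximize(matrix_trace(Y)),
                    list(Y == A %*% P,
                         t(P) %*% e == e,   # column sums of P are 1
                         P %*% e == e))     # row sums of P are 1
result2 <- solve(problem2)
```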

CVXR (and its Python cousin CVXPY) works very nicely with 1-d and 2-d data structures: vectors and matrices. Unfortunately, this convenience does not extend to higher-dimensional structures. Many practical models need more than two indices, but here we are lucky: this is a genuine matrix problem, where CVXR is at its best.

Note that a problem of this size (\(n=10\)) can still be handled by complete enumeration: there are \(n!=3,628,800\) column permutations.
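For small instances, such a brute-force check is straightforward to code. A minimal sketch (assuming the combinat package is installed, and using a hypothetical \(5 \times 5\) matrix A5) could look like:

```
# sketch: enumerate all column permutations of a small matrix
library(combinat)                  # provides permn()
n5 <- 5
A5 <- matrix(runif(n5^2, min = -1, max = 1), nrow = n5)   # hypothetical small example
best <- -Inf
for (perm in permn(1:n5)) {
  s <- sum(A5[cbind(1:n5, perm)])  # diagonal sum after moving column perm[j] to position j
  if (s > best) { best <- s; best_perm <- perm }
}
cat("best diagonal sum:", best, "\n")
```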

Optimize the formulation


The formulation can be tightened a bit. First, observe that not all \(y\) variables are used: only the diagonal elements appear in the objective, so all off-diagonal \(y\) variables can be dropped. Second, the remaining \(y\) variables can be substituted out. This leads to:

Mixed Integer Programming Model v2
\[\begin{align}\max&\sum_i \sum_k \color{darkblue}a_{i,k} \cdot \color{darkred}p_{k,i} \\ & \sum_j \color{darkred}p_{i,j} = 1 &&\forall i \\ & \sum_i \color{darkred}p_{i,j} = 1 &&\forall j \\ & \color{darkred}p_{i,j} \in \{0,1\} \end{align}\]
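A sketch of this compact formulation in OMPR (reusing the data a and the packages loaded earlier) could look like this; it drops the \(y\) variables entirely:

```
# sketch: the compact assignment formulation without y variables
m2 <- MIPModel() %>%
  add_variable(p[i,j], i=1:n, j=1:n, type="binary") %>%
  add_constraint(sum_expr(p[i,j], j=1:n) == 1, i=1:n) %>%
  add_constraint(sum_expr(p[i,j], i=1:n) == 1, j=1:n) %>%
  set_objective(sum_expr(a[i,k]*p[k,i], i=1:n, k=1:n), "max") %>%
  solve_model(with_ROI(solver = "symphony"))
cat("objective:", objective_value(m2), "\n")
```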

In practical models, I am usually not too worried about a bunch of extra continuous variables. Often I leave them in, so I can inspect their values when I need to debug the model. In the implementations above I printed the \(y\) variables to observe how the columns were permuted.

This is indeed a standard assignment problem. We can relax the binary variables to continuous ones (the optimal solution will automatically be integer-valued), and we can also use specialized algorithms. Note that the solver logs of the previous models already hint at this: both MIP solvers found the optimal solution without any branching.
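For example, a minimal sketch using lpSolve::lp.assign (assuming the lpSolve package is installed, and noting that the returned assignment matrix is the transpose of the \(P\) in \(AP\)) could be:

```
# sketch: solve the problem as a linear assignment problem with lpSolve
library(lpSolve)
res <- lp.assign(a, direction = "max")  # a is the 10x10 data matrix used above
P <- t(round(res$solution))             # permutation matrix such that A %*% P has maximal trace
cat("objective:", res$objval, "\n")
cat("diagonal sum of a %*% P:", sum(diag(a %*% P)), "\n")
```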

References


  1. Maximise diagonal of matrix by permuting columns in R, https://stackoverflow.com/questions/61565176/maximise-diagonal-of-matrix-by-permuting-columns-in-r
  2. Permutation matrix, https://en.wikipedia.org/wiki/Permutation_matrix

Comments:

  1. Here is the same problem formulated in MiniZinc, with decision variables for the permutation of the columns and the objective defined as the sum of the relevant entries in the weight matrix.

    ```
    include "globals.mzn";

    int: d = 10;
    set of int: D = 1..d;

    array[D, D] of float: weights =
    [| -0.42484496, 0.91366669, 0.77907863, 0.92604847, -0.7144000, -0.9083377, 0.3302304, 0.50895032, -0.5127611, -0.73860862
    | 0.57661027, -0.09333169, 0.38560681, 0.80459809, -0.1709073, -0.1155999, -0.8103187, 0.25844226, 0.3361112, 0.30620385
    | -0.18204616, 0.35514127, 0.28101363, 0.38141056, -0.1725513, 0.5978497, -0.2320607, 0.42036480, -0.1647064, -0.31296706
    | 0.76603481, 0.14526680, 0.98853955, 0.59093484, -0.2623091, -0.7562015, -0.4512327, -0.99875045, 0.5763917, 0.31351626
    | 0.88093457, -0.79415063, 0.31141160, -0.95077263, -0.6951105, 0.1218960, 0.6292801, -0.04936685, -0.7942707, -0.35925352
    | -0.90888700, 0.79964994, 0.41706094, -0.04440806, -0.7223879, -0.5869372, -0.1029673, -0.55976223, -0.1302145, -0.62461776
    | 0.05621098, -0.50782453, 0.08813205, 0.51691908, -0.5339318, -0.7449367, 0.6201287, -0.24036692, 0.9699140, 0.56458860
    | 0.78483809, -0.91588093, 0.18828404, -0.56718413, -0.0680751, 0.5066157, 0.6247790, 0.22554201, 0.7861022, -0.81281003
    | 0.10287003, -0.34415856, -0.42168053, -0.36363798, -0.4680547, 0.7900907, 0.5886846, -0.29640418, 0.7729381, -0.06644192
    | -0.08677053, 0.90900730, -0.70577271, -0.53674843, 0.7156554, -0.2510744, -0.1203366, -0.77772915, -0.6498947, 0.02301092|];

    array[D] of var D: permutation;

    var float: objective;

    constraint all_different(permutation) :: domain;

    constraint objective = sum (i in D) (weights[i, permutation[i]]);

    solve maximize objective;
    ```

    Running this with the standard MiniZinc solver gives the following result:

    ```
    [ ... lots of solutions ... ]
    permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 7, 6, 9, 5]);
    objective = 6.93707908000001;
    ----------
    permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 7, 9, 6, 5]);
    objective = 7.23371818000001;
    ----------
    permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 9, 7, 6, 5]);
    objective = 7.42218028000001;
    ----------
    ==========
    %%%mzn-stat: initTime=0.002168
    %%%mzn-stat: solveTime=0.0169721
    %%%mzn-stat: solutions=47
    %%%mzn-stat: variables=210
    %%%mzn-stat: propagators=302
    %%%mzn-stat: propagations=271291
    %%%mzn-stat: nodes=2394
    %%%mzn-stat: failures=1144
    %%%mzn-stat: restarts=0
    %%%mzn-stat: peakDepth=15
    %%%mzn-stat-end
    Finished in 178msec
    ```

  2. The last model is just a linear assignment problem, so you do not need the integrality constraints and you can use specialized algorithms such as Jonker-Volgenant.
