## Sunday, May 3, 2020

### Modeling permutations

#### Introduction

In [1] the following question was posted:

Given a (square) matrix $$A$$, reorder the columns such that the sum of the diagonals is maximized. We are working in an R environment.

My answer involves permutation matrices and assignment constraints. This is a useful concept for quite a few models.

#### Permutation Matrix

When dealing with permutations it is often a good idea to look at this in terms of a Permutation Matrix [2]. A permutation matrix $$P$$ is an identity matrix with rows or columns permuted.

Interestingly, it does not matter whether we swap columns or rows here. The results are the same:

> n <- 5
> # identity matrix
> I <- diag(n)
> print(I)
[,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1
> # swap columns 2 and 4
> P1 <- I
> P1[,c(2,4)] <- I[,c(4,2)]
> print(P1)
[,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    1    0    0
[4,]    0    1    0    0    0
[5,]    0    0    0    0    1
> # swap rows 2 and 4
> P2 <- I
> P2[c(2,4),] <- I[c(4,2),]
> print(P2)
[,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    1    0    0
[4,]    0    1    0    0    0
[5,]    0    0    0    0    1
>


A permutation matrix has the following properties:

1. It is a square $$n \times n$$ matrix with $$n$$ entries equal to 1 and the other entries equal to 0,
2. exactly one element is 1 in each row,
3. and exactly one element is 1 in each column.

In an optimization model, this can be modeled as a set of assignment constraints: \begin{align}&\sum_j p_{i,j} = 1 && \forall i \\ &\sum_i p_{i,j} =1 && \forall j \\ &p_{i,j} \in \{0,1\}\end{align}

#### Row and column permutations

From [2] we can see:

• Applying a row permutation to an $$(m\times n)$$  matrix $$A$$ can be viewed as a pre-multiplication with an $$(m \times m)$$ permutation matrix: $\widetilde{A}=PA$
• and applying a column permutation to an $$(m\times n)$$  matrix $$A$$ can be viewed as a post-multiplication with an $$(n \times n)$$ permutation matrix: $\widetilde{A}=AP$
As seeing is believing, below is a little experiment. We start with a $$2 \times 3$$ matrix $$A$$ and permute some rows and columns. Note: the original question was about a square matrix, while here we use a non-square matrix to emphasize different sizes of the permutation matrices $$P_1$$ and $$P_2$$.

> m <- 2
> n <- 3
> # random matrix A
> set.seed(123)
> A <- matrix(runif(m*n,min=-10,max=+10),nrow=m,ncol=n)
> print(A)
[,1]      [,2]      [,3]
[1,] -4.248450 -1.820462  8.809346
[2,]  5.766103  7.660348 -9.088870
> # swap rows 1 and 2
> P1 <- diag(m)
> P1[,c(1,2)] <- diag(m)[,c(2,1)]
> P1 %*% A
[,1]      [,2]      [,3]
[1,]  5.766103  7.660348 -9.088870
[2,] -4.248450 -1.820462  8.809346
> # swap columns 1 and 3
> P2 <- diag(n)
> P2[,c(1,3)] <- diag(n)[,c(3,1)]
> A %*% P2
[,1]      [,2]      [,3]
[1,]  8.809346 -1.820462 -4.248450
[2,] -9.088870  7.660348  5.766103
>


Note: If we want to apply both row and column permutations, we would write: $\widetilde{A}=P A Q$ with $$P$$ and $$Q$$ permutation matrices.

#### Optimization model

The original problem can now be formulated as a Mixed Integer Programming model. The mathematical model can look like:

Mixed Integer Programming Model
\begin{align}\max&\sum_i \color{darkred}y_{i,i} \\ & \color{darkred}y_{i,j} = \sum_k \color{darkblue}a_{i,k} \cdot \color{darkred}p_{k,j} \\ & \sum_j \color{darkred}p_{i,j} = 1 &&\forall i \\ & \sum_i \color{darkred}p_{i,j} = 1 &&\forall j \\ & \color{darkred}p_{i,j} \in \{0,1\} \\ & \color{darkred}y_{i,j} \>\mathbf{free}\end{align}

#### Implementation 1: OMPR

The model above can be implemented in a straightforward manner in OMPR.

> library(ompr)
> library(ompr.roi)
> library(dplyr)
> library(ROI)
> library(ROI.plugin.symphony)
> library(tidyr)
> n <- 10
> set.seed(123)
> a <- matrix(runif(n^2,min=-1,max=1),nrow=n,ncol=n)
> print(a)
[,1]        [,2]        [,3]        [,4]       [,5]       [,6]       [,7]        [,8]       [,9]       [,10]
[1,] -0.42484496  0.91366669  0.77907863  0.92604847 -0.7144000 -0.9083377  0.3302304  0.50895032 -0.5127611 -0.73860862
[2,]  0.57661027 -0.09333169  0.38560681  0.80459809 -0.1709073 -0.1155999 -0.8103187  0.25844226  0.3361112  0.30620385
[3,] -0.18204616  0.35514127  0.28101363  0.38141056 -0.1725513  0.5978497 -0.2320607  0.42036480 -0.1647064 -0.31296706
[4,]  0.76603481  0.14526680  0.98853955  0.59093484 -0.2623091 -0.7562015 -0.4512327 -0.99875045  0.5763917  0.31351626
[5,]  0.88093457 -0.79415063  0.31141160 -0.95077263 -0.6951105  0.1218960  0.6292801 -0.04936685 -0.7942707 -0.35925352
[6,] -0.90888700  0.79964994  0.41706094 -0.04440806 -0.7223879 -0.5869372 -0.1029673 -0.55976223 -0.1302145 -0.62461776
[7,]  0.05621098 -0.50782453  0.08813205  0.51691908 -0.5339318 -0.7449367  0.6201287 -0.24036692  0.9699140  0.56458860
[8,]  0.78483809 -0.91588093  0.18828404 -0.56718413 -0.0680751  0.5066157  0.6247790  0.22554201  0.7861022 -0.81281003
[9,]  0.10287003 -0.34415856 -0.42168053 -0.36363798 -0.4680547  0.7900907  0.5886846 -0.29640418  0.7729381 -0.06644192
[10,] -0.08677053  0.90900730 -0.70577271 -0.53674843  0.7156554 -0.2510744 -0.1203366 -0.77772915 -0.6498947  0.02301092
> m <- MIPModel() %>%
+   add_variable(p[i,j], i=1:n, j=1:n, type="binary") %>%
+   add_variable(y[i,j], i=1:n, j=1:n) %>%
+   add_constraint(y[i,j] == sum_expr(a[i,k]*p[k,j],k=1:n),i=1:n,j=1:n) %>%
+   add_constraint(sum_expr(p[i,j],j=1:n) == 1,i=1:n) %>%
+   add_constraint(sum_expr(p[i,j],i=1:n) == 1,j=1:n) %>%
+   set_objective(sum_expr(y[i,i], i=1:n),"max") %>%
+   solve_model(with_ROI(solver = "symphony",verbosity=1))
Starting Preprocessing...
Preprocessing finished...
with no modifications...
Problem has
120 constraints
200 variables
1300 nonzero coefficients

Total Presolve Time: 0.000000...

Solving...

granularity set at 0.000000
solving root lp relaxation
The LP value is: -7.422 [0,20]

****** Found Better Feasible Solution !
****** Cost: -7.422180

****************************************************
* Optimal Solution Found                           *
* Now displaying stats and best solution found...  *
****************************************************

====================== Misc Timing =========================
Problem IO        0.000
======================= CP Timing ===========================
Cut Pool                  0.000
====================== LP/CG Timing =========================
LP Solution Time          0.000
LP Setup Time             0.000
Variable Fixing           0.000
Pricing                   0.000
Strong Branching          0.000
Separation                0.000
Primal Heuristics         0.000
Communication             0.000
Total User Time              0.000
Total Wallclock Time         0.000

====================== Statistics =========================
Number of created nodes :       1
Number of analyzed nodes:       1
Depth of tree:                  0
Size of the tree:               1
Number of solutions found:      1
Number of solutions in pool:    1
Number of Chains:               1
Number of Diving Halts:         0
Number of cuts in cut pool:     0
Lower Bound in Root:            -7.422

======================= LP Solver =========================
Number of times LP solver called:                 1
Number of calls from feasibility pump:            0
Number of calls from strong branching:            0
Number of solutions found by LP solve:            1
Number of bounds changed by strong branching:     0
Number of nodes pruned by strong branching:       0
Number of bounds changed by branching presolver:  0
Number of nodes pruned by branching presolver:    0

==================== Primal Heuristics ====================
Time      #Called   #Solutions
Rounding I                   0.00
Rounding II                  0.00
Diving                       0.00
Feasibility Pump             0.00
Local Search                 0.00            1            0
Restricted Search            0.00
Rins Search                  0.00
Local Branching              0.00

=========================== Cuts ==========================
Accepted:                         0
Added to LPs:                     0
Deleted from LPs:                 0
Removed because of bad coeffs:    0
Removed because of duplicacy:     0
Insufficiently violated:          0
In root:                          0

Time in cut generation:              0.00
Time in checking quality and adding: 0.00

Time     #Called     In Root       Total
Gomory             0.00
Knapsack           0.00
Clique             0.00
Probing            0.00
Flowcover          0.00
Twomir             0.00
Oddhole            0.00
Mir                0.00
Rounding           0.00
LandP-I            0.00
LandP-II           0.00
Redsplit           0.00

===========================================================
Solution Found: Node 0, Level 0
Solution Cost: -7.4221803085
> cat("Status:",solver_status(m),"\n")
Status: optimal
> cat("Objective:",objective_value(m),"\n")
Objective: 7.42218
> df <- get_solution(m,y[i, j])
1           2           3           4           5           6          7          8          9         10
1   0.92604847 -0.73860862  0.50895032  0.77907863 -0.42484496  0.91366669 -0.5127611  0.3302304 -0.9083377 -0.7144000
2   0.80459809  0.30620385  0.25844226  0.38560681  0.57661027 -0.09333169  0.3361112 -0.8103187 -0.1155999 -0.1709073
3   0.38141056 -0.31296706  0.42036480  0.28101363 -0.18204616  0.35514127 -0.1647064 -0.2320607  0.5978497 -0.1725513
4   0.59093484  0.31351626 -0.99875045  0.98853955  0.76603481  0.14526680  0.5763917 -0.4512327 -0.7562015 -0.2623091
5  -0.95077263 -0.35925352 -0.04936685  0.31141160  0.88093457 -0.79415063 -0.7942707  0.6292801  0.1218960 -0.6951105
6  -0.04440806 -0.62461776 -0.55976223  0.41706094 -0.90888700  0.79964994 -0.1302145 -0.1029673 -0.5869372 -0.7223879
7   0.51691908  0.56458860 -0.24036692  0.08813205  0.05621098 -0.50782453  0.9699140  0.6201287 -0.7449367 -0.5339318
8  -0.56718413 -0.81281003  0.22554201  0.18828404  0.78483809 -0.91588093  0.7861022  0.6247790  0.5066157 -0.0680751
9  -0.36363798 -0.06644192 -0.29640418 -0.42168053  0.10287003 -0.34415856  0.7729381  0.5886846  0.7900907 -0.4680547
10 -0.53674843  0.02301092 -0.77772915 -0.70577271 -0.08677053  0.90900730 -0.6498947 -0.1203366 -0.2510744  0.7156554
>


#### Implementation 2: CVXR

The model can also be expressed conveniently in CVXR:

> library(CVXR)
> set.seed(123)
> n <- 10
> A <- matrix(runif(n^2,min=-1,max=1),nrow=n,ncol=n)
> print(A)
[,1]        [,2]        [,3]        [,4]       [,5]       [,6]       [,7]        [,8]       [,9]       [,10]
[1,] -0.42484496  0.91366669  0.77907863  0.92604847 -0.7144000 -0.9083377  0.3302304  0.50895032 -0.5127611 -0.73860862
[2,]  0.57661027 -0.09333169  0.38560681  0.80459809 -0.1709073 -0.1155999 -0.8103187  0.25844226  0.3361112  0.30620385
[3,] -0.18204616  0.35514127  0.28101363  0.38141056 -0.1725513  0.5978497 -0.2320607  0.42036480 -0.1647064 -0.31296706
[4,]  0.76603481  0.14526680  0.98853955  0.59093484 -0.2623091 -0.7562015 -0.4512327 -0.99875045  0.5763917  0.31351626
[5,]  0.88093457 -0.79415063  0.31141160 -0.95077263 -0.6951105  0.1218960  0.6292801 -0.04936685 -0.7942707 -0.35925352
[6,] -0.90888700  0.79964994  0.41706094 -0.04440806 -0.7223879 -0.5869372 -0.1029673 -0.55976223 -0.1302145 -0.62461776
[7,]  0.05621098 -0.50782453  0.08813205  0.51691908 -0.5339318 -0.7449367  0.6201287 -0.24036692  0.9699140  0.56458860
[8,]  0.78483809 -0.91588093  0.18828404 -0.56718413 -0.0680751  0.5066157  0.6247790  0.22554201  0.7861022 -0.81281003
[9,]  0.10287003 -0.34415856 -0.42168053 -0.36363798 -0.4680547  0.7900907  0.5886846 -0.29640418  0.7729381 -0.06644192
[10,] -0.08677053  0.90900730 -0.70577271 -0.53674843  0.7156554 -0.2510744 -0.1203366 -0.77772915 -0.6498947  0.02301092
> cat("sum diag of A:",sum(diag(A)))
sum diag of A: 0.7133438
> P <- Variable(n,n,boolean=T)
> Y <- Variable(n,n)
> problem <- Problem(Maximize(matrix_trace(Y)),
+                    list(Y==A %*% P,
+                         sum_entries(P,axis=1) == 1,
+                         sum_entries(P,axis=2) == 1))
> result <- solve(problem,verbose=T)
GLPK Simplex Optimizer, v4.47
120 rows, 200 columns, 1300 non-zeros
0: obj =  0.000000000e+000  infeas = 2.000e+001 (120)
*   147: obj = -8.522275744e-002  infeas = 1.743e-014 (1)
*   213: obj = -7.422180308e+000  infeas = 4.854e-016 (1)
OPTIMAL SOLUTION FOUND
GLPK Integer Optimizer, v4.47
120 rows, 200 columns, 1300 non-zeros
100 integer variables, all of which are binary
Integer optimization begins...
+   213: mip =     not found yet >=              -inf        (1; 0)
+   213: >>>>> -7.422180308e+000 >= -7.422180308e+000   0.0% (1; 0)
+   213: mip = -7.422180308e+000 >=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
> cat("status:",result$status) status: optimal > cat("objective:",result$value)
objective: 7.42218
> print(result\$getValue(Y))
[,1]        [,2]        [,3]        [,4]        [,5]        [,6]       [,7]       [,8]       [,9]      [,10]
[1,]  0.92604847 -0.73860862  0.50895032  0.77907863 -0.42484496  0.91366669 -0.5127611  0.3302304 -0.9083377 -0.7144000
[2,]  0.80459809  0.30620385  0.25844226  0.38560681  0.57661027 -0.09333169  0.3361112 -0.8103187 -0.1155999 -0.1709073
[3,]  0.38141056 -0.31296706  0.42036480  0.28101363 -0.18204616  0.35514127 -0.1647064 -0.2320607  0.5978497 -0.1725513
[4,]  0.59093484  0.31351626 -0.99875045  0.98853955  0.76603481  0.14526680  0.5763917 -0.4512327 -0.7562015 -0.2623091
[5,] -0.95077263 -0.35925352 -0.04936685  0.31141160  0.88093457 -0.79415063 -0.7942707  0.6292801  0.1218960 -0.6951105
[6,] -0.04440806 -0.62461776 -0.55976223  0.41706094 -0.90888700  0.79964994 -0.1302145 -0.1029673 -0.5869372 -0.7223879
[7,]  0.51691908  0.56458860 -0.24036692  0.08813205  0.05621098 -0.50782453  0.9699140  0.6201287 -0.7449367 -0.5339318
[8,] -0.56718413 -0.81281003  0.22554201  0.18828404  0.78483809 -0.91588093  0.7861022  0.6247790  0.5066157 -0.0680751
[9,] -0.36363798 -0.06644192 -0.29640418 -0.42168053  0.10287003 -0.34415856  0.7729381  0.5886846  0.7900907 -0.4680547
[10,] -0.53674843  0.02301092 -0.77772915 -0.70577271 -0.08677053  0.90900730 -0.6498947 -0.1203366 -0.2510744  0.7156554
>


It helps that we can use the built-in functions matrix_trace and sum_entries. A CVXR model in pure matrix algebra would be more difficult to read. A model in pure matrix algebra could look like:

Model in Matrix Algebra
\begin{align}\max\>&\mathbf{tr}(\color{darkred}Y) \\ & \color{darkred}Y = \color{darkblue}A \color{darkred}P \\ & \color{darkred}P^T \color{darkblue}e = \color{darkblue}e \\ & \color{darkred}P\color{darkblue}e = \color{darkblue}e \\ & \color{darkred}P \in \{0,1\}^{n \times n} \\ & \color{darkred}Y \>\mathbf{free}\end{align}

where $$e$$ is a column of ones.

CVXR (and its Python cousin CVXPY) often works very nicely when working with 1d and 2d data structures: vectors and matrices. Unfortunately, it does not generalize to higher-dimensional problems. Many practical problems need more than 2 indices, but here we are lucky: this is a matrix problem where CVXR is at its best.

It is noted that this size ($$n=10$$) can still be handled by complete enumeration. There are $$n!=3,628,800$$ column permutations.

#### Optimize the formulation

It is possible to optimize the formulation. First, we can observe that not all $$y$$'s are used. Only the diagonal elements are of interest. We can remove all off-diagonal variables. Second, we can substitute out the remaining $$y$$ variables.
This would lead to:

Mixed Integer Programming Model v2
\begin{align}\max&\sum_i \sum_k \color{darkblue}a_{i,k} \cdot \color{darkred}p_{k,i} \\ & \sum_j \color{darkred}p_{i,j} = 1 &&\forall i \\ & \sum_i \color{darkred}p_{i,j} = 1 &&\forall j \\ & \color{darkred}p_{i,j} \in \{0,1\} \end{align}

In practical models, I am usually not so worried about a bunch of extra continuous variables. Often I leave them in, so I can inspect their values if I need to debug the model. In the implementations above I printed out the $$y$$ variables to observe how columns were permuted.

This is indeed a standard assignment problem. We can relax the integer variables (they will be integer-valued automatically) and also use specialized algorithms. Note that the solver logs from the previous model also indicate the MIP solvers can solve that problem without doing any branching.

#### References

1. Maximise diagonal of matrix by permuting columns in R, https://stackoverflow.com/questions/61565176/maximise-diagonal-of-matrix-by-permuting-columns-in-r
2. Permutation matrix, https://en.wikipedia.org/wiki/Permutation_matrix

1. The same problem formulated in MiniZinc with decision variables for the permutation of the columns, and a weight as the sum of the relevant entries in the weight matrix.


include "globals.mzn";

int: d = 10;
set of int: D = 1..d;

array[D, D] of float: weights =
[| -0.42484496, 0.91366669, 0.77907863, 0.92604847, -0.7144000, -0.9083377, 0.3302304, 0.50895032, -0.5127611, -0.73860862
| 0.57661027, -0.09333169, 0.38560681, 0.80459809, -0.1709073, -0.1155999, -0.8103187, 0.25844226, 0.3361112, 0.30620385
| -0.18204616, 0.35514127, 0.28101363, 0.38141056, -0.1725513, 0.5978497, -0.2320607, 0.42036480, -0.1647064, -0.31296706
| 0.76603481, 0.14526680, 0.98853955, 0.59093484, -0.2623091, -0.7562015, -0.4512327, -0.99875045, 0.5763917, 0.31351626
| 0.88093457, -0.79415063, 0.31141160, -0.95077263, -0.6951105, 0.1218960, 0.6292801, -0.04936685, -0.7942707, -0.35925352
| -0.90888700, 0.79964994, 0.41706094, -0.04440806, -0.7223879, -0.5869372, -0.1029673, -0.55976223, -0.1302145, -0.62461776
| 0.05621098, -0.50782453, 0.08813205, 0.51691908, -0.5339318, -0.7449367, 0.6201287, -0.24036692, 0.9699140, 0.56458860
| 0.78483809, -0.91588093, 0.18828404, -0.56718413, -0.0680751, 0.5066157, 0.6247790, 0.22554201, 0.7861022, -0.81281003
| 0.10287003, -0.34415856, -0.42168053, -0.36363798, -0.4680547, 0.7900907, 0.5886846, -0.29640418, 0.7729381, -0.06644192
| -0.08677053, 0.90900730, -0.70577271, -0.53674843, 0.7156554, -0.2510744, -0.1203366, -0.77772915, -0.6498947, 0.02301092|];

array[D] of var D: permutation;

var float: objective;

constraint all_different(permutation) :: domain;

constraint objective = sum (i in D) (weights[i, permutation[i]]);

solve maximize objective;


Running using the standard minizinc solver gives the following result


[ ... lots of solutions ... ]
permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 7, 6, 9, 5]);
objective = 6.93707908000001;
----------
permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 7, 9, 6, 5]);
objective = 7.23371818000001;
----------
permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 9, 7, 6, 5]);
objective = 7.42218028000001;
----------
==========
%%%mzn-stat: initTime=0.002168
%%%mzn-stat: solveTime=0.0169721
%%%mzn-stat: solutions=47
%%%mzn-stat: variables=210
%%%mzn-stat: propagators=302
%%%mzn-stat: propagations=271291
%%%mzn-stat: nodes=2394
%%%mzn-stat: failures=1144
%%%mzn-stat: restarts=0
%%%mzn-stat: peakDepth=15
%%%mzn-stat-end
Finished in 178msec


2. The last model is just a linear assignment problem, so you do not need the integrality constraints and you can use specialized algorithms such as Jonker-Volgenant.

1. Indeed. I added a note that emphasizes this.