#### Introduction

In [1] the following question was posted:

Given a (square) matrix \(A\), reorder the columns such that the sum of the diagonals is maximized. We are working in an R environment.

My answer involves permutation matrices and assignment constraints. This is a useful concept for quite a few models.

#### Permutation Matrix

When dealing with permutations, it is often a good idea to look at things in terms of a **permutation matrix** [2]. A permutation matrix \(P\) is an identity matrix with rows or columns permuted.

Interestingly, it does not matter whether we swap columns or rows here. The results are the same:

```
> n <- 5
> # identity matrix
> I <- diag(n)
> print(I)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1
> # swap columns 2 and 4
> P1 <- I
> P1[,c(2,4)] <- I[,c(4,2)]
> print(P1)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    1    0    0
[4,]    0    1    0    0    0
[5,]    0    0    0    0    1
> # swap rows 2 and 4
> P2 <- I
> P2[c(2,4),] <- I[c(4,2),]
> print(P2)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0
[3,]    0    0    1    0    0
[4,]    0    1    0    0    0
[5,]    0    0    0    0    1
```

A permutation matrix has the following properties:

- It is a square \(n \times n\) matrix with \(n\) entries equal to 1 and the other entries equal to 0,
- exactly one element is 1 in each row,
- and exactly one element is 1 in each column.

In an optimization model, this can be expressed as a set of **assignment constraints**:

\[\begin{align}&\sum_j p_{i,j} = 1 && \forall i \\ &\sum_i p_{i,j} =1 && \forall j \\ &p_{i,j} \in \{0,1\}\end{align}\]
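These properties are easy to check numerically. A small R sketch (the permutation vector is just an example of mine):

```r
n <- 5
perm <- c(3, 1, 4, 5, 2)   # an arbitrary permutation of 1..n
P <- diag(n)[perm, ]       # permutation matrix from a permutation vector

# the assignment constraints: every row and column sums to 1
stopifnot(all(rowSums(P) == 1), all(colSums(P) == 1))

# pre-multiplying by P reorders the rows: (P %*% A)[i,] equals A[perm[i],]
A <- matrix(1:25, nrow = n)
stopifnot(all(P %*% A == A[perm, ]))
```

Indexing `diag(n)[perm, ]` is a convenient way to go from a permutation vector to the corresponding permutation matrix.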

#### Row and column permutations

From [2] we can see:

- Applying a **row permutation** to an \((m\times n)\) matrix \(A\) can be viewed as a pre-multiplication by an \((m \times m)\) permutation matrix: \[\widetilde{A}=PA\]
- Applying a **column permutation** to an \((m\times n)\) matrix \(A\) can be viewed as a post-multiplication by an \((n \times n)\) permutation matrix: \[\widetilde{A}=AP\]

As seeing is believing, below is a little experiment. We start with a \(2 \times 3\) matrix \(A\) and permute some rows and columns. Note: the original question was about a square matrix, while here we use a non-square matrix to emphasize different sizes of the permutation matrices \(P_1\) and \(P_2\).

```
> m <- 2
> n <- 3
> # random matrix A
> set.seed(123)
> A <- matrix(runif(m*n,min=-10,max=+10),nrow=m,ncol=n)
> print(A)
          [,1]      [,2]      [,3]
[1,] -4.248450 -1.820462  8.809346
[2,]  5.766103  7.660348 -9.088870
> # swap rows 1 and 2
> P1 <- diag(m)
> P1[,c(1,2)] <- diag(m)[,c(2,1)]
> P1 %*% A
          [,1]      [,2]      [,3]
[1,]  5.766103  7.660348 -9.088870
[2,] -4.248450 -1.820462  8.809346
> # swap columns 1 and 3
> P2 <- diag(n)
> P2[,c(1,3)] <- diag(n)[,c(3,1)]
> A %*% P2
          [,1]      [,2]      [,3]
[1,]  8.809346 -1.820462 -4.248450
[2,] -9.088870  7.660348  5.766103
```

Note: If we want to apply both row and column permutations, we would write: \[\widetilde{A}=P A Q\] with \(P\) and \(Q\) permutation matrices.
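A quick R sketch confirms this combined form (the particular permutations are arbitrary examples of mine):

```r
m <- 2; n <- 3
set.seed(123)
A <- matrix(runif(m*n, min = -10, max = +10), nrow = m, ncol = n)

# P swaps the two rows, Q reverses the three columns
P <- diag(m)[c(2, 1), ]
Q <- diag(n)[, c(3, 2, 1)]
Atilde <- P %*% A %*% Q

# same result as permuting the rows and columns of A directly
stopifnot(isTRUE(all.equal(Atilde, A[c(2, 1), c(3, 2, 1)])))
```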

#### Optimization model

The original problem can now be formulated as a Mixed Integer Programming model. The mathematical model can look like:

**Mixed Integer Programming Model**

\[\begin{align}\max&\sum_i \color{darkred}y_{i,i} \\ & \color{darkred}y_{i,j} = \sum_k \color{darkblue}a_{i,k} \cdot \color{darkred}p_{k,j} && \forall i,j \\ & \sum_j \color{darkred}p_{i,j} = 1 &&\forall i \\ & \sum_i \color{darkred}p_{i,j} = 1 &&\forall j \\ & \color{darkred}p_{i,j} \in \{0,1\} \\ & \color{darkred}y_{i,j} \>\mathbf{free}\end{align}\]

#### Implementation 1: OMPR

The model above can be implemented in a straightforward manner in OMPR.

```
> library(ompr)
> library(ompr.roi)
> library(dplyr)
> library(ROI)
> library(ROI.plugin.symphony)
> library(tidyr)
> n <- 10
> set.seed(123)
> a <- matrix(runif(n^2,min=-1,max=1),nrow=n,ncol=n)
> print(a)
             [,1]        [,2]        [,3]        [,4]       [,5]       [,6]       [,7]        [,8]       [,9]       [,10]
 [1,] -0.42484496  0.91366669  0.77907863  0.92604847 -0.7144000 -0.9083377  0.3302304  0.50895032 -0.5127611 -0.73860862
 [2,]  0.57661027 -0.09333169  0.38560681  0.80459809 -0.1709073 -0.1155999 -0.8103187  0.25844226  0.3361112  0.30620385
 [3,] -0.18204616  0.35514127  0.28101363  0.38141056 -0.1725513  0.5978497 -0.2320607  0.42036480 -0.1647064 -0.31296706
 [4,]  0.76603481  0.14526680  0.98853955  0.59093484 -0.2623091 -0.7562015 -0.4512327 -0.99875045  0.5763917  0.31351626
 [5,]  0.88093457 -0.79415063  0.31141160 -0.95077263 -0.6951105  0.1218960  0.6292801 -0.04936685 -0.7942707 -0.35925352
 [6,] -0.90888700  0.79964994  0.41706094 -0.04440806 -0.7223879 -0.5869372 -0.1029673 -0.55976223 -0.1302145 -0.62461776
 [7,]  0.05621098 -0.50782453  0.08813205  0.51691908 -0.5339318 -0.7449367  0.6201287 -0.24036692  0.9699140  0.56458860
 [8,]  0.78483809 -0.91588093  0.18828404 -0.56718413 -0.0680751  0.5066157  0.6247790  0.22554201  0.7861022 -0.81281003
 [9,]  0.10287003 -0.34415856 -0.42168053 -0.36363798 -0.4680547  0.7900907  0.5886846 -0.29640418  0.7729381 -0.06644192
[10,] -0.08677053  0.90900730 -0.70577271 -0.53674843  0.7156554 -0.2510744 -0.1203366 -0.77772915 -0.6498947  0.02301092
> m <- MIPModel() %>%
+   add_variable(p[i,j], i=1:n, j=1:n, type="binary") %>%
+   add_variable(y[i,j], i=1:n, j=1:n) %>%
+   add_constraint(y[i,j] == sum_expr(a[i,k]*p[k,j],k=1:n),i=1:n,j=1:n) %>%
+   add_constraint(sum_expr(p[i,j],j=1:n) == 1,i=1:n) %>%
+   add_constraint(sum_expr(p[i,j],i=1:n) == 1,j=1:n) %>%
+   set_objective(sum_expr(y[i,i], i=1:n),"max") %>%
+   solve_model(with_ROI(solver = "symphony",verbosity=1))
Starting Preprocessing...
Preprocessing finished...
   with no modifications...
Problem has 120 constraints 200 variables 1300 nonzero coefficients
Total Presolve Time: 0.000000...

Solving...
granularity set at 0.000000
solving root lp relaxation
The LP value is: -7.422 [0,20]
****** Found Better Feasible Solution !
****** Cost: -7.422180

****************************************************
* Optimal Solution Found                           *
* Now displaying stats and best solution found...  *
****************************************************

====================== Misc Timing =========================
  Problem IO          0.000
======================= CP Timing ===========================
  Cut Pool            0.000
====================== LP/CG Timing =========================
  LP Solution Time    0.000
  LP Setup Time       0.000
  Variable Fixing     0.000
  Pricing             0.000
  Strong Branching    0.000
  Separation          0.000
  Primal Heuristics   0.000
  Communication       0.000
  Total User Time     0.000
  Total Wallclock Time 0.000
====================== Statistics =========================
Number of created nodes :       1
Number of analyzed nodes:       1
Depth of tree:                  0
Size of the tree:               1
Number of solutions found:      1
Number of solutions in pool:    1
Number of Chains:               1
Number of Diving Halts:         0
Number of cuts in cut pool:     0
Lower Bound in Root:            -7.422
======================= LP Solver =========================
Number of times LP solver called:                 1
Number of calls from feasibility pump:            0
Number of calls from strong branching:            0
Number of solutions found by LP solve:            1
Number of bounds changed by strong branching:     0
Number of nodes pruned by strong branching:       0
Number of bounds changed by branching presolver:  0
Number of nodes pruned by branching presolver:    0
==================== Primal Heuristics ====================
                      Time      #Called   #Solutions
Rounding I            0.00
Rounding II           0.00
Diving                0.00
Feasibility Pump      0.00
Local Search          0.00         1          0
Restricted Search     0.00
Rins Search           0.00
Local Branching       0.00
=========================== Cuts ==========================
Accepted:                         0
Added to LPs:                     0
Deleted from LPs:                 0
Removed because of bad coeffs:    0
Removed because of duplicacy:     0
Insufficiently violated:          0
In root:                          0
Time in cut generation:              0.00
Time in checking quality and adding: 0.00

                      Time     #Called   In Root   Total
Gomory                0.00
Knapsack              0.00
Clique                0.00
Probing               0.00
Flowcover             0.00
Twomir                0.00
Oddhole               0.00
Mir                   0.00
Rounding              0.00
LandP-I               0.00
LandP-II              0.00
Redsplit              0.00
===========================================================
Solution Found: Node 0, Level 0
Solution Cost: -7.4221803085
> cat("Status:",solver_status(m),"\n")
Status: optimal
> cat("Objective:",objective_value(m),"\n")
Objective: 7.42218
> df <- get_solution(m,y[i, j])
> spread(df,key=j,value=value)[,-c(1,2)]
            1           2           3           4           5           6          7          8          9         10
1   0.92604847 -0.73860862  0.50895032  0.77907863 -0.42484496  0.91366669 -0.5127611  0.3302304 -0.9083377 -0.7144000
2   0.80459809  0.30620385  0.25844226  0.38560681  0.57661027 -0.09333169  0.3361112 -0.8103187 -0.1155999 -0.1709073
3   0.38141056 -0.31296706  0.42036480  0.28101363 -0.18204616  0.35514127 -0.1647064 -0.2320607  0.5978497 -0.1725513
4   0.59093484  0.31351626 -0.99875045  0.98853955  0.76603481  0.14526680  0.5763917 -0.4512327 -0.7562015 -0.2623091
5  -0.95077263 -0.35925352 -0.04936685  0.31141160  0.88093457 -0.79415063 -0.7942707  0.6292801  0.1218960 -0.6951105
6  -0.04440806 -0.62461776 -0.55976223  0.41706094 -0.90888700  0.79964994 -0.1302145 -0.1029673 -0.5869372 -0.7223879
7   0.51691908  0.56458860 -0.24036692  0.08813205  0.05621098 -0.50782453  0.9699140  0.6201287 -0.7449367 -0.5339318
8  -0.56718413 -0.81281003  0.22554201  0.18828404  0.78483809 -0.91588093  0.7861022  0.6247790  0.5066157 -0.0680751
9  -0.36363798 -0.06644192 -0.29640418 -0.42168053  0.10287003 -0.34415856  0.7729381  0.5886846  0.7900907 -0.4680547
10 -0.53674843  0.02301092 -0.77772915 -0.70577271 -0.08677053  0.90900730 -0.6498947 -0.1203366 -0.2510744  0.7156554
```

#### Implementation 2: CVXR

The model can also be expressed conveniently in CVXR:

```
> library(CVXR)
> set.seed(123)
> n <- 10
> A <- matrix(runif(n^2,min=-1,max=1),nrow=n,ncol=n)
> print(A)
             [,1]        [,2]        [,3]        [,4]       [,5]       [,6]       [,7]        [,8]       [,9]       [,10]
 [1,] -0.42484496  0.91366669  0.77907863  0.92604847 -0.7144000 -0.9083377  0.3302304  0.50895032 -0.5127611 -0.73860862
 [2,]  0.57661027 -0.09333169  0.38560681  0.80459809 -0.1709073 -0.1155999 -0.8103187  0.25844226  0.3361112  0.30620385
 [3,] -0.18204616  0.35514127  0.28101363  0.38141056 -0.1725513  0.5978497 -0.2320607  0.42036480 -0.1647064 -0.31296706
 [4,]  0.76603481  0.14526680  0.98853955  0.59093484 -0.2623091 -0.7562015 -0.4512327 -0.99875045  0.5763917  0.31351626
 [5,]  0.88093457 -0.79415063  0.31141160 -0.95077263 -0.6951105  0.1218960  0.6292801 -0.04936685 -0.7942707 -0.35925352
 [6,] -0.90888700  0.79964994  0.41706094 -0.04440806 -0.7223879 -0.5869372 -0.1029673 -0.55976223 -0.1302145 -0.62461776
 [7,]  0.05621098 -0.50782453  0.08813205  0.51691908 -0.5339318 -0.7449367  0.6201287 -0.24036692  0.9699140  0.56458860
 [8,]  0.78483809 -0.91588093  0.18828404 -0.56718413 -0.0680751  0.5066157  0.6247790  0.22554201  0.7861022 -0.81281003
 [9,]  0.10287003 -0.34415856 -0.42168053 -0.36363798 -0.4680547  0.7900907  0.5886846 -0.29640418  0.7729381 -0.06644192
[10,] -0.08677053  0.90900730 -0.70577271 -0.53674843  0.7156554 -0.2510744 -0.1203366 -0.77772915 -0.6498947  0.02301092
> cat("sum diag of A:",sum(diag(A)))
sum diag of A: 0.7133438
> P <- Variable(n,n,boolean=T)
> Y <- Variable(n,n)
> problem <- Problem(Maximize(matrix_trace(Y)),
+                    list(Y==A %*% P,
+                         sum_entries(P,axis=1) == 1,
+                         sum_entries(P,axis=2) == 1))
> result <- solve(problem,verbose=T)
GLPK Simplex Optimizer, v4.47
120 rows, 200 columns, 1300 non-zeros
      0: obj =  0.000000000e+000  infeas = 2.000e+001 (120)
*   147: obj = -8.522275744e-002  infeas = 1.743e-014 (1)
*   213: obj = -7.422180308e+000  infeas = 4.854e-016 (1)
OPTIMAL SOLUTION FOUND
GLPK Integer Optimizer, v4.47
120 rows, 200 columns, 1300 non-zeros
100 integer variables, all of which are binary
Integer optimization begins...
+   213: mip =     not found yet >=              -inf        (1; 0)
+   213: >>>>> -7.422180308e+000 >= -7.422180308e+000   0.0% (1; 0)
+   213: mip = -7.422180308e+000 >=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
> cat("status:",result$status)
status: optimal
> cat("objective:",result$value)
objective: 7.42218
> print(result$getValue(Y))
             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]       [,7]       [,8]       [,9]      [,10]
 [1,]  0.92604847 -0.73860862  0.50895032  0.77907863 -0.42484496  0.91366669 -0.5127611  0.3302304 -0.9083377 -0.7144000
 [2,]  0.80459809  0.30620385  0.25844226  0.38560681  0.57661027 -0.09333169  0.3361112 -0.8103187 -0.1155999 -0.1709073
 [3,]  0.38141056 -0.31296706  0.42036480  0.28101363 -0.18204616  0.35514127 -0.1647064 -0.2320607  0.5978497 -0.1725513
 [4,]  0.59093484  0.31351626 -0.99875045  0.98853955  0.76603481  0.14526680  0.5763917 -0.4512327 -0.7562015 -0.2623091
 [5,] -0.95077263 -0.35925352 -0.04936685  0.31141160  0.88093457 -0.79415063 -0.7942707  0.6292801  0.1218960 -0.6951105
 [6,] -0.04440806 -0.62461776 -0.55976223  0.41706094 -0.90888700  0.79964994 -0.1302145 -0.1029673 -0.5869372 -0.7223879
 [7,]  0.51691908  0.56458860 -0.24036692  0.08813205  0.05621098 -0.50782453  0.9699140  0.6201287 -0.7449367 -0.5339318
 [8,] -0.56718413 -0.81281003  0.22554201  0.18828404  0.78483809 -0.91588093  0.7861022  0.6247790  0.5066157 -0.0680751
 [9,] -0.36363798 -0.06644192 -0.29640418 -0.42168053  0.10287003 -0.34415856  0.7729381  0.5886846  0.7900907 -0.4680547
[10,] -0.53674843  0.02301092 -0.77772915 -0.70577271 -0.08677053  0.90900730 -0.6498947 -0.1203366 -0.2510744  0.7156554
```

It helps that we can use the built-in functions **matrix_trace** and **sum_entries**; a CVXR model in pure matrix algebra would be more difficult to read. Such a model could look like:

**Model in Matrix Algebra**

\[\begin{align}\max\>&\mathbf{tr}(\color{darkred}Y) \\ & \color{darkred}Y = \color{darkblue}A \color{darkred}P \\ & \color{darkred}P^T \color{darkblue}e = \color{darkblue}e \\ & \color{darkred}P\color{darkblue}e = \color{darkblue}e \\ & \color{darkred}P \in \{0,1\}^{n \times n} \\ & \color{darkred}Y \>\mathbf{free}\end{align}\]

where \(e\) is a column vector of ones.

CVXR (and its Python cousin CVXPY) often works very nicely when working with 1d and 2d data structures: vectors and matrices. Unfortunately, it does not generalize to higher-dimensional problems. Many practical problems need more than 2 indices, but here we are lucky: this is a matrix problem where CVXR is at its best.

Note that this size (\(n=10\)) can still be handled by complete enumeration: there are \(n!=3,628,800\) column permutations.
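For such small sizes we can verify the result by brute force. Below is a sketch using a simple recursive permutation generator (only base R, all names are mine), run here on a smaller \(n\) to keep the enumeration cheap:

```r
# enumerate all permutations of a vector (fine for small n)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i]))
      out[[length(out) + 1]] <- c(v[i], p)
  out
}

set.seed(123)
n <- 5
A <- matrix(runif(n^2, min = -1, max = 1), nrow = n, ncol = n)

# find the column permutation maximizing sum_i A[i, perm[i]]
best <- -Inf
for (p in perms(1:n)) {
  val <- sum(A[cbind(1:n, p)])
  if (val > best) { best <- val; bestp <- p }
}
cat("best permutation:", bestp, " objective:", best, "\n")
```

The objective of the best permutation is at least the original diagonal sum, since the identity permutation is among the candidates.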

#### Optimize the formulation

It is possible to optimize the formulation. First, we can observe that not all \(y\)'s are used. Only the diagonal elements are of interest. We can remove all off-diagonal variables. Second, we can substitute out the remaining \(y\) variables.

**Mixed Integer Programming Model v2**

\[\begin{align}\max&\sum_i \sum_k \color{darkblue}a_{i,k} \cdot \color{darkred}p_{k,i} \\ & \sum_j \color{darkred}p_{i,j} = 1 &&\forall i \\ & \sum_i \color{darkred}p_{i,j} = 1 &&\forall j \\ & \color{darkred}p_{i,j} \in \{0,1\} \end{align}\]
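A sketch of this leaner formulation in OMPR (same data and packages as in Implementation 1; this compact variant is mine, not from the original post):

```r
library(ompr)
library(ompr.roi)
library(dplyr)
library(ROI)
library(ROI.plugin.symphony)

n <- 10
set.seed(123)
a <- matrix(runif(n^2, min = -1, max = 1), nrow = n, ncol = n)

# compact model: only the p variables, with y substituted into the objective
m2 <- MIPModel() %>%
  add_variable(p[i,j], i = 1:n, j = 1:n, type = "binary") %>%
  add_constraint(sum_expr(p[i,j], j = 1:n) == 1, i = 1:n) %>%
  add_constraint(sum_expr(p[i,j], i = 1:n) == 1, j = 1:n) %>%
  set_objective(sum_expr(a[i,k] * p[k,i], i = 1:n, k = 1:n), "max") %>%
  solve_model(with_ROI(solver = "symphony"))

cat("objective:", objective_value(m2), "\n")
```

This should reproduce the objective 7.42218 found above with 100 fewer variables and 100 fewer constraints.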

In practical models, I am usually not so worried about a bunch of extra continuous variables. Often I leave them in, so I can inspect their values if I need to debug the model. In the implementations above I printed out the \(y\) variables to observe how columns were permuted.

This is indeed a standard assignment problem. We can relax the integer variables (they will be integer-valued automatically), and we can use specialized algorithms. Note that the solver logs of the previous models also indicate that the MIP solvers solved this problem without any branching.
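As a sketch, the problem can be handed directly to a specialized assignment solver, for example `solve_LSAP` from the `clue` package (assuming that package is installed; `solve_LSAP` expects nonnegative entries, so we shift the matrix by a constant, which does not change the optimal permutation):

```r
library(clue)

set.seed(123)
n <- 10
A <- matrix(runif(n^2, min = -1, max = 1), nrow = n, ncol = n)

# shift to nonnegative entries; adds the same constant to every assignment
B <- A - min(A)
perm <- as.integer(solve_LSAP(B, maximum = TRUE))

# evaluate the objective on the original matrix
cat("permutation:", perm, "\n")
cat("objective:", sum(A[cbind(1:n, perm)]), "\n")
```

This should find the same objective of 7.42218 in a fraction of the time a MIP solver needs to set up the problem.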

#### References

1. Maximise diagonal of matrix by permuting columns in R, https://stackoverflow.com/questions/61565176/maximise-diagonal-of-matrix-by-permuting-columns-in-r
2. Permutation matrix, https://en.wikipedia.org/wiki/Permutation_matrix

Here is the same problem formulated in MiniZinc, with decision variables for the permutation of the columns and the objective defined as the sum of the relevant entries in the weight matrix.

```

include "globals.mzn";

int: d = 10;
set of int: D = 1..d;

array[D, D] of float: weights =
  [| -0.42484496,  0.91366669,  0.77907863,  0.92604847, -0.7144000, -0.9083377,  0.3302304,  0.50895032, -0.5127611, -0.73860862
   |  0.57661027, -0.09333169,  0.38560681,  0.80459809, -0.1709073, -0.1155999, -0.8103187,  0.25844226,  0.3361112,  0.30620385
   | -0.18204616,  0.35514127,  0.28101363,  0.38141056, -0.1725513,  0.5978497, -0.2320607,  0.42036480, -0.1647064, -0.31296706
   |  0.76603481,  0.14526680,  0.98853955,  0.59093484, -0.2623091, -0.7562015, -0.4512327, -0.99875045,  0.5763917,  0.31351626
   |  0.88093457, -0.79415063,  0.31141160, -0.95077263, -0.6951105,  0.1218960,  0.6292801, -0.04936685, -0.7942707, -0.35925352
   | -0.90888700,  0.79964994,  0.41706094, -0.04440806, -0.7223879, -0.5869372, -0.1029673, -0.55976223, -0.1302145, -0.62461776
   |  0.05621098, -0.50782453,  0.08813205,  0.51691908, -0.5339318, -0.7449367,  0.6201287, -0.24036692,  0.9699140,  0.56458860
   |  0.78483809, -0.91588093,  0.18828404, -0.56718413, -0.0680751,  0.5066157,  0.6247790,  0.22554201,  0.7861022, -0.81281003
   |  0.10287003, -0.34415856, -0.42168053, -0.36363798, -0.4680547,  0.7900907,  0.5886846, -0.29640418,  0.7729381, -0.06644192
   | -0.08677053,  0.90900730, -0.70577271, -0.53674843,  0.7156554, -0.2510744, -0.1203366, -0.77772915, -0.6498947,  0.02301092 |];

array[D] of var D: permutation;
var float: objective;

constraint all_different(permutation) :: domain;
constraint objective = sum (i in D) (weights[i, permutation[i]]);

solve maximize objective;
```

Running this with the standard MiniZinc solver gives the following result:

```
[ ... lots of solutions ... ]
permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 7, 6, 9, 5]);
objective = 6.93707908000001;
----------
permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 7, 9, 6, 5]);
objective = 7.23371818000001;
----------
permutation = array1d(1..10, [4, 10, 8, 3, 1, 2, 9, 7, 6, 5]);
objective = 7.42218028000001;
----------
==========
%%%mzn-stat: initTime=0.002168
%%%mzn-stat: solveTime=0.0169721
%%%mzn-stat: solutions=47
%%%mzn-stat: variables=210
%%%mzn-stat: propagators=302
%%%mzn-stat: propagations=271291
%%%mzn-stat: nodes=2394
%%%mzn-stat: failures=1144
%%%mzn-stat: restarts=0
%%%mzn-stat: peakDepth=15
%%%mzn-stat-end
Finished in 178msec
```

The last model is just a linear assignment problem, so you do not need the integrality constraints and you can use specialized algorithms such as the Jonker-Volgenant algorithm.

Indeed. I added a note that emphasizes this.
