#### Problem

*I am trying to generate a matrix of numbers with 7 rows and 4 columns. Each row must sum to 100 and each column must have an even spread (if permitted) between a min and max range (specified below).*

*Goal:*

```
C1 C2 C3 C4 sum range
1 low 100 ^
2 .. |
3 .. |
4 .. |
5 .. |
6 .. |
7 high _
c1_high = 98
c1_low = 75
c2_high = 15
c2_low = 6
c3_high = 8
c3_low = 2
c4_low = 0.05
c4_high =0.5
```

*In addition to this, i need the spread of each row to be as linear as possible, though a line fitted to the data with a second order polynomial would suffice (with an r^2 value of >0.98).*

Note: this description seems to hint the values in the columns must be ordered from \(L_j\) to \(U_j\). That is actually not the case: we don't assume any ordering.

#### Solution Outline

This problem somewhat resembles a**matrix balancing**problem. Let's see how we can model this.

One approach to attack this problem is:

##### Step 1: Create data with perfect spread

It is easy to generate data with perfect spread wile ignoring that the row sums are 100. We just start at \(L_j\) and increase a by fixed step size:---- 40 PARAMETER L low c1 75.000, c2 6.000, c3 2.000, c4 0.050 ---- 40 PARAMETER U high c1 98.000, c2 15.000, c3 8.000, c4 0.500 ---- 40 PARAMETER init initial matrix c1 c2 c3 c4 rowsum r1 75.000 6.000 2.000 0.050 83.050 r2 78.833 7.500 3.000 0.125 89.458 r3 82.667 9.000 4.000 0.200 95.867 r4 86.500 10.500 5.000 0.275 102.275 r5 90.333 12.000 6.000 0.350 108.683 r6 94.167 13.500 7.000 0.425 115.092 r7 98.000 15.000 8.000 0.500 121.500

This step is just data manipulation.

##### Step 2: Permute values in each column

The second step is to permute the values**within**a column to achieve a solution that has a close match to the row sums. This gives:

---- 75 VARIABLE y.L after permutation c1 c2 c3 c4 rowsum r1 86.500 10.500 5.000 0.275 102.275 r2 94.167 7.500 3.000 0.125 104.792 r3 98.000 6.000 2.000 0.050 106.050 r4 90.333 9.000 4.000 0.200 103.533 r5 82.667 12.000 6.000 0.350 101.017 r6 75.000 15.000 8.000 0.500 98.500 r7 78.833 13.500 7.000 0.425 99.758

We are closer to the row sums as expected.

One way to model a permutation of a column vector \(a\) inside an optimization model is to find a square

**permutation matrix**\(P\) [2] with

\[\begin{align}&y_i = \sum_k P_{i,k}a_k\\ &\sum_k P_{i,k}=1&\forall i \\&\sum_i P_{i,k} = 1&\forall k\\& P_{i,k}\in\{0,1\}\end{align}\]

A permutation matrix has exactly one one in each row and in each column. The rest of the elements are zero. One could say a permutation matrix is a (row or column) permuted

**identity**matrix. The constraints can also be interpreted as assignment constraints (related to the

**assignment problem**).

##### Step 3: Add minimal relative deviations

The final step is to find small deviations from \(y\) such that the row sums are exactly 100.---- 75 VARIABLE x.L final values c1 c2 c3 c4 rowsum r1 84.265 10.467 4.993 0.275 100.000 r2 89.460 7.431 2.984 0.125 100.000 r3 92.057 5.912 1.980 0.050 100.000 r4 86.863 8.949 3.988 0.200 100.000 r5 81.668 11.985 5.997 0.350 100.000 r6 76.473 15.022 8.005 0.500 100.000 r7 79.071 13.503 7.001 0.425 100.000 ---- 75 VARIABLE d.L deviations c1 c2 c3 c4 r1 -2.235 -0.033 -0.007 -2.25855E-5 r2 -4.707 -0.069 -0.016 -4.75702E-5 r3 -5.943 -0.088 -0.020 -6.00626E-5 r4 -3.471 -0.051 -0.012 -3.50779E-5 r5 -0.999 -0.015 -0.003 -1.00932E-5 r6 1.473 0.022 0.005 1.489155E-5 r7 0.237 0.003 7.931220E-4 2.399194E-6

For this I used a relative measure: minimize the sum of squared relative deviations:

\[\min \sum_{i,j} \left( \frac{d_{i,j}}{\mu_j} \right)^2 \]

where \(\mu_j = (L_j +U_j)/2\). This is to ensure that the deviations are distributed according to the average magnitude of a column. This is to make sure column 1 gets larger deviations than column 4.

#### A complete model

I believe we need to do steps 2 and 3 simultaneously to find a global optimal solution. Here is an attempt:

This is a convex MIQP model, so we can solve this with high-performance solvers like Cplex and Gurobi.

#### Notes

- We could split the model in two phases (step 2 and step 3 in the above description). I am not sure how this would impact the optimal solution. Also it would not really simplify things.
- This model can be linearized by using a different objective: minimize the sum of the
**absolute values**of the (relative) deviations:

\[\min\>\sum_{i,j}\left| \frac{d_{i,j}}{\mu_j} \right|\]

You would end up with a linear MIP model. Possible formulations are**variable splitting**

\[\begin{align}\min \> & \sum_{i,j} v_{i,j}^{\oplus} + v_{i,j}^{\ominus}\\&v_{i,j}^{\oplus} - v_{i,j}^{\ominus} = \frac{d_{i,j}}{\mu_j}\\&v_{i,j}^{\oplus}, v_{i,j}^{\ominus}\ge 0 \end{align}\]

or**bounding**:

\[\begin{align}\min\>&\sum_{i,j} v_{i,j}\\&-v_{i,j} \le \frac{d_{i,j}}{\mu_j}\le v_{i,j}\end{align}\] - Instead of dividing by the average column value \(\mu_j\) we can divide by the "real" value \(y_{i,j}\). Unfortunately, that would make this a general MINLP model. Note that we should "protect" the division, e.g. by the bound \(y_{i,j}\ge L_j\).

#### References

- Algorithm for generation of number matrix with specified boundaries, https://stackoverflow.com/questions/47754210/algorithm-for-generation-of-number-matrix-with-specified-boundaries
- Permutation Matrix, https://en.wikipedia.org/wiki/Permutation_matrix