Yet Another Math Programming Consultant: August 2022

Thursday, August 25, 2022

Some Matrix Balancing Experiments

This is about the matrix balancing problem.

We have three sets of data:

A matrix with with entries \(\color{darkblue}A^0_{i,j}\ge 0\).
Row- and column-totals \(\color{darkblue}r_i\) and \(\color{darkblue}c_j\).

The \(\color{darkblue}A^0\) matrix is collected from different sources than the row- and column-totals. So the matrix elements don't sum up to our totals. The problem is finding a nearby matrix \(\color{darkred}A\), so the row and column totals are obeyed. In addition, we want to preserve the sparsity pattern of \(\color{darkblue}A^0\): zeros should stay zero. And also: we don't want to introduce negative numbers (preserve the signs). More formally:

Matrix Balancing Problem
\[\begin{align}\min\>&{\bf{dist}}(\color{darkred}A,\color{darkblue}A^0)\\ & \sum_i \color{darkred}A_{i,j} = \color{darkblue}c_j && \forall j\\ & \sum_j \color{darkred}A_{i,j} = \color{darkblue}r_i && \forall i \\&\color{darkred}A_{i,j}=0 &&\forall i,j\|\color{darkblue}A^0_{i,j}=0\\ &\color{darkred}A_{i,j}\ge 0 \end{align} \]

Approximate the matrix subject to row- and column-sum constraints

Wordle: number of words from unique letters

In [1] an optimization model is proposed for the following problem:

Using a list of 5-letter words, try to find a set of 5 words such that each letter is used at most once.

Data

The list of 5-letter words is from the wordle [2] game. Words with duplicate letters are removed. A partial list is:

----   1035 SET w  words: large list of 5-letter words

abdom,    abend,    abets,    abhor,    abide,    abies,    abyes,    abilo,    abime,    abysm,    abled,    abler,    ables
ablet,    ablow,    abmho,    abner,    abnet,    abode,    abody,    abohm,    aboil,    abord,    abort,    abote,    about
above,    abret,    abrim,    abrin,    abris,    abrus,    absey,    absit,    abstr,    abune,    abuse,    abush,    abuts
acedy,    acerb,    ached,    achen,    acher,    aches,    achor,    acidy,    acids,    acier,    acies,    acyls,    acing
ackey,    acker,    aclys,    acmes,    acned,    acnes,    acoin,    acold,    acone,    acorn,    acost,    acoup,    acred
acres,    acrid,    acryl,    acron,    acrux,    acted,    actin,    acton,    actor,    actos,    actus,    acute,    adcon

The total number of 5-letter words in this list is: 10,175.

Primal, Dual and Equilibrium format of the Transportation Model

The primal transportation model [1] can be stated as:

Primal formulation
\[\begin{align}\min& \sum_{i,j} \color{darkblue}{\mathit{cost}}_{i,j}\cdot\color{darkred}x_{i,j} \\ & \sum_j \color{darkred}x_{i,j}\le \color{darkblue}{\mathit{supply}}_{i}\perp \color{darkred}u_i \le 0 && \forall i \\ & \sum_i \color{darkred}x_{i,j}\ge \color{darkblue}{\mathit{demand}}_{j}\perp \color{darkred}v_j \ge 0 && \forall j \\ & \color{darkred}x_{i,j}\ge 0\end{align} \]

Here \(\perp\) indicates "with dual ...". The duals for this model are \(\color{darkred}u_i\) and \(\color{darkred}v_j\). In the model, we have added the (optimal) signs of the duals. It may come as a surprise that the optimality of a solution can be established by just looking at the signs of the marginals (duals and reduced cost). When we print the results of this model, we can see something like:

k out of n with smallest bandwidth

The problem from [1], slightly generalized, is easy to describe:

Given an \(n\) vector \(\color{darkblue}v_i\) with data, select \(k\lt n\) elements that are closest to each other.

Of course, we can state the problem as a formal optimization model:

Optimization Model
\[\begin{align}\min\>&\color{darkred}U-\color{darkred}L \\ & \color{darkred}\delta_i=1 \Rightarrow \color{darkblue}v_i \in [\color{darkred}L,\color{darkred}U] \\ & \sum_i \color{darkred}\delta_i=\color{darkblue}k \\ & \color{darkred}\delta_i \in \{0,1\} \end{align}\]

Large sparse transportation model with CVXPY,CVXR

In [1] I was trying out different formulations of a large, sparse (but very easy) transportation model using different modeling tools and solvers. The conclusion was:

Exploiting sparsity leads to the best performance
GAMS/Cplex is about 10 times as fast as Python/PuLP/CBC
GAMS/Cplex is about 1000 times as fast as R/OMPR/GLPK

Don't overinterpret these numbers: this is a single model, which may not be at all representable for your models. Let's see how CVXPY[2] and CVXR[3] are doing. To recap we want to solve a large (but easy) transportation model:

Dense Transportation Model
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\cdot\color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} \le \color{darkblue}a_i && \forall i \\& \sum_i \color{darkred}x_{i,j} \ge \color{darkblue}b_j && \forall j \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\]

The additional wrinkle is that only a small fraction of the links \(i \rightarrow j\) exist.

There are (at least) three ways to model this:

Use a large cost coefficient for forbidden links.
Fix \(\color{darkred}x_{i,j}=0\) for non-existing links.
Exploit sparsity directly by only summing over existing links.

CVXPY/CVXR are matrix-based modeling tools. That gives some extra things to think about. A transportation model in matrix-notation can look like:

Transportation Model in Matrix Notation
\[\begin{align}\min\>&{\bf tr}(\color{darkblue}C^T\color{darkred}X)\\&\color{darkred}X\color{darkblue}e \le \color{darkblue}a \\ & \color{darkred}X^t\color{darkblue}e \ge \color{darkblue}b \\ & \color{darkred}X \ge 0 \end{align}\]

Curl.exe

Newer versions of Windows come with a very useful tool to download files: curl.exe.

The standard help is:

C:\Users\erwin>where curl

C:\Windows\System32\curl.exe

C:\Users\erwin>curl --help

Usage: curl [options...] <url>

-d, --data <data> HTTP POST data

-f, --fail Fail fast with no output on HTTP errors

-h, --help <category> Get help for commands

-i, --include Include protocol response headers in the output

-o, --output <file> Write to file instead of stdout

-O, --remote-name Write output to a file named as the remote file

-s, --silent Silent mode

-T, --upload-file <file> Transfer local FILE to destination

-u, --user <user:password> Server user and password

-A, --user-agent <name> Send User-Agent <name> to server

-v, --verbose Make the operation more talkative

-V, --version Show version number and quit

This is not the full help, this menu is stripped into categories.

Use "--help category" to get an overview of all categories.

For all options use the manual or "--help all".

C:\Users\erwin>

A model with semi-continuous variables

In [1], a selection problem is posted, that turns out to be an easy MIP. The problem is:

From a (large) collection of products select those products with the highest unit return (or value) subject to:

A product can only be ordered between a minimum and maximum amount
There is a budget w.r.t. the total amount of products

This can be modeled using so-called semi-continuous variables: variables that can assume zero or a value between the constants \(\color{darkblue}L\) and \(\color{darkblue}U\). So we can write:

MIP model with semi-continuous variables
\[\begin{align} \max & \sum_i \color{darkred}x_i \cdot \color{darkblue}{\mathit{rate}}_i \\ & \sum_i \color{darkred}x_i \le \color{darkblue}{\mathit{budget}} \\ & \color{darkred}x_i \in \{0\}\cup [\color{darkblue}L_i,\color{darkblue}U_i] \end{align} \]

Yet Another Math Programming Consultant

Thursday, August 25, 2022

Some Matrix Balancing Experiments

Tuesday, August 23, 2022

Wordle: number of words from unique letters

Data

Tuesday, August 16, 2022

Primal, Dual and Equilibrium format of the Transportation Model

Saturday, August 13, 2022

k out of n with smallest bandwidth

Wednesday, August 10, 2022

Large sparse transportation model with CVXPY,CVXR

Monday, August 1, 2022

Curl.exe

A model with semi-continuous variables