Performing symbolic math steps is often related to pattern recognition. In theory, ChatGPT could be doing a good job here. I wanted to find the inverse of \[f(x) = {\mathrm{sign}}(x) \log(1+|x|)\] This function is a form of a signed logarithmic scaling. So, let's see what ChatGPT is telling us:

I am a full-time consultant and provide services related to the design, implementation and deployment of mathematical programming, optimization and data-science applications. I also teach courses and workshops. Usually I cannot blog about projects I am doing, but there are many technical notes I'd like to share. Not in the least so I have an easy way to search and find them again myself. You can reach me at erwin@amsterdamoptimization.com.

## Wednesday, September 27, 2023

## Wednesday, September 20, 2023

### Julia vs Python

Simulation with n=1000 |

## Monday, September 4, 2023

### Critiquing a GAMS Model

It is always interesting to read GAMS models written by someone else. There are probably three things one can observe:

- A nice formulation or concept that is useful to learn about.
- A bad implementation: something that really should not be done that way.
- A piece of code that is correct and defensible, but I would write it differently. This includes things like style, layout, formatting, etc.

**sambal.gms**in the GAMS model library [1]. It is a very small model, but I have many thoughts about it. The complete model is reproduced in Appendix 1. Let's walk through it.

**matrix balancing problem**is to find a nearby matrix such that row- and column sums are obeyed. A relative quadratic objective is used to minimize the sum of the squared deviations between the original data (the priors) and the final matrix. Zeros in the matrix need to be maintained: they can't become nonzero. This is sometimes called sparsity preservation. Often, sign-preservation is another condition. That is not part of this model. Note that, in this model, not only the matrix is updated but also the row and column totals.

## Tuesday, August 29, 2023

### Three-level Matrix Balancing

### Matrix balancing: introduction

Matrix Balancing Models are often used in economic modeling exercises: they create consistent data sets from data originating from different, conflicting data sources. A standard example is updating a matrix subject to given row and column sums. An example can look like:

Update orange cells subject to row/column sums |

The empty cells are zero, and they should remain zero. In other words, we need to **preserve sparsity**. Often, we have non-negativity restrictions on the values. The mathematical model can look like this:

## Friday, August 4, 2023

### Some TSP MTZ experiments

In [1], a question was posed about a TSP model using the MTZ (Miller-Tucker-Zemlin) subtour elimination constraints. The results with Julia/glpk were disappointing. With \(n=58\) cities, things were taken so long that the solver seemed to hang. Here I want to see how a precise formulation with a good MIP solver can do better. As *seeing is believing*, let's do some experiments.

The standard MTZ formulation[1] can be derived easily. We use the binary variables \[\color{darkred}x_{i,j}=\begin{cases}1 & \text{if city $j$ is visited directly after going to city $i$}\\ 0 & \text{otherwise}\end{cases}\]

## Wednesday, July 12, 2023

### Coloring edges of a grid

Given an m * n grid, each edge must be colored. However, there are 2 constraints:

- Entire grid must use only 3 unique colors
- Each square in the grid must be colored using exactly 2 colors (2 edges per color)

## Thursday, July 6, 2023

### A Julia thingy

In Julia, we can write 2x instead of 2*x. Not the most earth-shattering. But a bit special nonetheless.

The expression in the last cell is interpreted as a function call.

## Sunday, July 2, 2023

### Some confusion here

Sometimes you end up visiting strange sites. This is a question & answer site. The question is:

Obviously, this is not a good question. It is something like "

*what is the difference between a ham sandwich and butter?*".

The answers are not so good either.

## Tuesday, June 27, 2023

### CVXPY DCP errors

Objective | CVXPY code | Result | Notes |
---|---|---|---|

\[\color{darkred}x^T \color{darkred}x\] | x.T@x | DCP error | print shows minimize x@x, i.e. transpose is dropped |

\[\color{darkred}x^T \color{darkred}x\] | x@x | DCP error | |

\[\color{darkred}x^T \color{darkred}x\] | cp.sum_squares(x) | transformed into quad_over_lin(x, 1.0) | |

\[\color{darkred}x^T \color{darkblue}Q \color{darkred}x\] | x.T@Q@x | transformed into QuadForm(x,Q) | |

\[\color{darkred}y:=\color{darkred}x-\color{darkblue}p\]\[\color{darkred}x^T \color{darkblue}Q \color{darkred}y\] | y=x-p x.T@Q@y | DCP error | |

\[\color{darkred}x^T \color{darkblue}Q \color{darkred}x - \color{darkred}x^T \color{darkblue}Q \color{darkblue}p\] | x.T@Q@x - x.T@Q@p | first term transformed into QuadForm(x,Q) |

## Wednesday, May 10, 2023

### Generate all solutions that sum up to one

In a post, the following question was posed:

We can select unique values \(\displaystyle\frac{1}{i}\) for \(i=1,\dots,n\). Find all combinations that add up to 1.

A complete enumeration scheme was slow even for \(n=10\). Can we use a MIP model for this or something related?

A single solution is easily found using the model:

Mathematical Model |
---|

\[ \begin{align} & \sum_{i=1}^n \frac{1}{i} \cdot \color{darkred}x_i = 1 \\ & \color{darkred}x_i \in \{0,1\} \end{align}\] |

## Sunday, May 7, 2023

### Finding common patterns

In [1], the following problem is stated:

Given a boolean matrix, with \(m\) rows and \(n\) columns, find the largest pattern of ones that is found in at least \(\color{darkblue}K\) rows. We can ignore cells where the pattern has a zero value: they don't count.

A small example [1] is given:

row 1:[0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1] row 2:[0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1] row 3:[0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1] row 4:[1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1] row 5:[1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1] row 6:[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

With \(K=3\), we can form a pattern with 10 nonzero elements:

row 1: [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1] row 2: [0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1] row 3: [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1] row 4: [1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1] row 5: [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1] row 6: [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1] pattern:[1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

## Tuesday, May 2, 2023

### Solving as network with lowerbounds

In [1], we looked at the following problem:

Mathematical Model |
---|

\[ \begin{align} \min& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_{i,j} \\ & \sum_j \color{darkblue}a_{i,j}\cdot \color{darkred}x_{i,j} \ge \color{darkblue}r_i && \forall i \\ & \sum_i \color{darkblue}a_{i,j}\cdot \color{darkred}x_{i,j} \ge \color{darkblue}c_j && \forall j \\ & \color{darkred}x_{i,j} \in \{0,1\} \end{align}\] |

## Wednesday, April 26, 2023

### A large MIP model that should be solved as LP: the root node

In this post, I want to discuss an observation about the root node when solving a MIP model.

### Problem Description

We have a large matrix \(\color{darkblue}A=\color{darkblue}a_{i,j}\) with values 0 and 1. In addition, we have a minimum on the row and column totals. These are called \(\color{darkblue}r_i\) and \(\color{darkblue}c_j\). The goal is to remove as many 1's in the matrix \(\color{darkblue}A\) subject to these minimum row and column totals.

## Tuesday, April 18, 2023

### Some GAMS embedded Python notes

Here are two issues you may want to be aware of. The discussion below is relevant for Windows and not so much for Unix variants.

### Using Python Raw strings for directories

$onEmbeddedCode Python: dir = r"%system.fp%" $offEmbeddedCode |

## Wednesday, April 5, 2023

### In-process, in-memory databases

There are a few database systems that are a bit different. They are libraries that can be linked directly to your application. Linking can be done statically (during the compilation/linking step) or dynamically (using a shared library or DLL). Here I want to show two cases:

- SQLite [1] used from R on data frames
- DuckDB [2] used from Python, again on data frames

## Tuesday, April 4, 2023

### Compression and large tables (Excel and CSV)

Here is an interesting little experiment: load a large CSV file into Excel. My original Powerpoint slide from a recent presentation was not as complete as it should be. Here is more size info:

## Saturday, March 25, 2023

### Simultaneous equation models and data errors

## Thursday, March 16, 2023

### Algorithm vs. model

From [1]:

We are given a plane defined by Ax+By+Cz-D = 0 where D is significantly larger than A,B,C and GCD(A,B,C) = 1. How would I find all points (x, y, z), where x,y,z are integers and >= 0, that lie on the plane in an efficient manner?

So the poster asks for an algorithm to find \(\color{darkred}x,\color{darkred}y,\color{darkred}z \in \{0,1,2,\dots\}\) such that \[\color{darkblue}A \cdot \color{darkred}x + \color{darkblue}B \cdot \color{darkred}y + \color{darkblue}C \cdot \color{darkred}z = \color{darkblue}D\] Besides the assumptions stated in the question, I'll further assume \(\color{darkblue}A,\color{darkblue}B,\color{darkblue}C,\color{darkblue}D\gt 0\).

## Tuesday, March 14, 2023

### Choosing between NLP solvers: interior point or active set.

One way to categorize (local) nonlinear programming (NLP) solvers is **active set methods** and **interior point solvers**. Some representative large-scale sparse solvers are:

- Active set: CONOPT, SNOPT. These are using SQP algorithms.
- Interior point: IPOPT, Knitro. Note: Knitro also contains an active set algorithm.

## Monday, March 6, 2023

### Some approaches for moving data between MS Access and GAMS

Moving data between different environments is always more difficult than we hope. Here I list some approaches and actually try them out on a small dataset. We hit some bugs along the way and also a few conceptual stumbling blocks (even for this stylized example). We had some issues with Access as well as GAMS Connect.

This question came up in an actual project. My idea was: "Let me show you how this can be done". I am afraid, I got carried away a bit. But it demonstrates that we should not underestimate these, at first sight, menial tasks. When the data set becomes larger, the problems compound. We can't eyeball the data, and statistically, it is more likely we encounter some problems.

## Friday, February 24, 2023

### Another fast MIP model: covering

In [1], the following problem is stated:

- There is a collection of \(n=1,000\) test questions.
- Each question covers a number of skills.
- Given is a requirement for a number of questions for each required skill (e.g., 4 questions about skill 1, 3 questions about skill 2, etc.).
- Create a test with the minimum number of questions that fulfills the requirements.

## Thursday, February 16, 2023

### Assigning jobs to machines without overlap

Here we consider the following problem from [1]:

- We have jobs with a given start time and completion time
- Jobs can be repeated on given days (e.g. job 1 needs to run on Monday, Wednesday, and Friday)
- We want to assign jobs to machines in such a way that there is no overlap
- The objective is to minimize the number of machines needed to execute all jobs

## Wednesday, February 15, 2023

### Supplier selection: an easy MIP

- We want to order items in different quantities from suppliers.
- Suppliers have an available inventory for these items. This can be zero.
- We can split the ordering over different suppliers.
- The cost structure is as follows:
- Shipping cost is a fixed cost per supplier.
- Item cost is a variable per-unit cost.

## Monday, February 13, 2023

### Populating SQLite databases

GAMS has three easy ways to populate a SQLite database:

- Using the tool
**gdx2sqlite**. This tool populates a SQLite database with data from a GDX file. This means we first have to export GAMS data to a GDX file. As there is quite some file I/O going on here (writing GDX file, reading GDX file, writing database), I would expect this to be slower than the next method. - The new GAMS-
**connect**facility. This does not use intermediate files, and directly copies records from in-memory data. This should be the fastest. - Old fashioned
**CSV**files. We first export data as a GDX file, and then use**gdxdump**to convert the data to a CSV file. Then sqlite can import the CSV file, and populate the database. There is much file I/O here, so this should be slow.

## Saturday, January 28, 2023

### Tiny non-convex quadratic model brings solvers to their knees

Here is a very small geometric problem:

Given \(n\) points in 2d space, find the smallest triangle that contains all these points.

Find the smallest triangle containing all points. |

This looks like a simple problem. Somewhat to my surprise, my attempt here does not bear that out.

## Tuesday, January 24, 2023

### Export GAMS GDX file to different Python formats (CSV,Feather,Pickle)

**GAMS user**can run the GAMS script, without knowing Python or having Python installed. (GAMS comes with its own somewhat barebones Python, which is used inside the GAMS script). On the other hand, the

**Python user**will not need to have GAMS installed to read the generated data files. This is opposed to Python packages and tools that can interact with GDX files directly. You can see a few in the PyPI directory [1]. Those will require both a GAMS system and a Python environment.

## Wednesday, January 11, 2023

### MIP Bounds

**best possible integer solution**at time \(t\). Together with the

**best-found integer solution**so far, we have two bounds. Somewhere in between is the final optimal solution. The typical picture for these two bounds is as follows:

## Monday, January 2, 2023

### High Level MIP Modeling

Most LP/MIP modeling tools stay close to what is the underpinning of a Mixed-Integer Programming problem:

**a system of linear equations**(equalities and inequalities) plus a linear objective. Examples are Pulp and GAMS. In Constraint Programming (CP), modeling tools need to provide access to higher-level

**global constraints**. Without these global constraints (such as the famous

**all-different**constraint), CP solvers would not perform very well.