Yet Another Math Programming Consultant: 2024

Thursday, October 17, 2024

Equity in optimization models

In optimization models, we often use an aggregate measure in the objective function, such as total profit, the sum of tardiness of jobs, or countrywide GDP. Optimizing the aggregate can lead to particularly bad results for some individuals or groups.

Here is an example I have used on several occasions.

Problem Statement

We have $N$ persons. They must be assigned to $M$ groups or teams. For simplicity, we can assume $N$ is a multiple of $M$, so the group size is \[\frac{N}{M}\] Each person $p_1$ can specify preferences for being placed in the same group as another person $p_2$. A negative preference indicates that $p_1$ prefers to be in a different group than $p_2$. Find an optimal assignment taking into account these preferences.
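To make the equity issue concrete, here is a minimal Python sketch that scores a given assignment both by the aggregate (total satisfaction) and by the worst-off person. The data, the names `pref` and `groups`, and the definition of satisfaction as the sum of a person's preferences for their groupmates are assumptions for illustration only.

```python
def satisfaction(groups, pref):
    """Per-person satisfaction: the sum of that person's stated preferences
    for everyone placed in the same group (missing pairs count as 0)."""
    score = {}
    for members in groups:
        for p1 in members:
            score[p1] = sum(pref.get((p1, p2), 0) for p2 in members if p2 != p1)
    return score

# Hypothetical data: 4 persons, 2 groups of size 2.
# pref[p1, p2] is how much p1 wants to be grouped with p2.
pref = {("a", "b"): 3, ("b", "a"): 3, ("c", "a"): 2, ("c", "d"): 1, ("d", "c"): -2}
groups = [["a", "b"], ["c", "d"]]

score = satisfaction(groups, pref)
total = sum(score.values())   # aggregate objective
worst = min(score.values())   # satisfaction of the worst-off person
print(score, total, worst)
```

The total looks fine here, while person `d` ends up with a negative score. That is exactly the kind of outcome an aggregate objective can hide.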

GAMS 48 tests

Some minor quibbles.

gdx2sqlite

The latest version of GAMS contains a replacement for gdx2sqlite. This tool dumps a GDX file into a SQLite database. It is a tool I use a lot. Here is a comparison using the indus89 model in the GAMS model library:

Prevent Loops in GAMS

This book [1] on DEA models has an accompanying website with all the GAMS models [2].

Of course, I'll be doing some nitpicking on the GAMS code.

CSV readers mutilating my data

R and CSV files

When I deal with regional codes such as FIPS[1] and HUC[2], CSV file readers often mutilate my regions. Here is an example in R:
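The effect is not specific to R. As a minimal Python sketch (not the R example from the post), the snippet below reads two sample five-digit codes with the standard `csv` module. The column names and values are made up for illustration. Converting the code column to a number drops the leading zero, while keeping it as text preserves the code.

```python
import csv
import io

# Hypothetical CSV content with five-digit region codes.
raw = "fips,value\n01001,10\n06037,20\n"

# Naive numeric conversion: the leading zero is lost.
naive = [int(row["fips"]) for row in csv.DictReader(io.StringIO(raw))]

# Keeping the code as text preserves it.
safe = [row["fips"] for row in csv.DictReader(io.StringIO(raw))]

print(naive)
print(safe)
```

Readers that guess column types automatically do the equivalent of the first variant unless the column type is specified explicitly.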

Solving DEA Models with GAMS

Data Envelopment Analysis (DEA) models are somewhat special. They typically consist of small LPs, of which a whole bunch have to be solved. The CCR formulation (after [1]), for the $i$-th DMU (Decision Making Unit), can be stated as [2]:

CCR LP Model
\[\begin{align} \max \>& \color{darkred}{\mathit{efficiency}}_i=\sum_{\mathit{outp}} \color{darkred}u_{{\mathit{outp}}} \cdot \color{darkblue}y_{i,{\mathit{outp}}} \\ & \sum_{\mathit{inp}} \color{darkred}v_{{\mathit{inp}}} \cdot \color{darkblue}x_{i,{\mathit{inp}}} = 1 \\ & \sum_{\mathit{outp}} \color{darkred}u_{{\mathit{outp}}} \cdot \color{darkblue}y_{j,{\mathit{outp}}} \le \sum_{\mathit{inp}} \color{darkred}v_{{\mathit{inp}}} \cdot \color{darkblue}x_{j,{\mathit{inp}}} && \forall j \\ & \color{darkred}u_{{\mathit{outp}}} \ge 0, \color{darkred}v_{{\mathit{inp}}} \ge 0 \end{align}\]

Multiple Solutions in Minimum Spanning Tree example

In [1], I discussed some LP and MIP formulations for the Minimum Spanning Tree (MST) problem.

Minimum Spanning Tree visualized through Google Maps

Here, I focus on two formulations: a multicommodity network approach (this can be solved as a large LP) and a MIP formulation based on techniques we know from the Traveling Salesman Problem (TSP). The main issue I want to discuss is the presence of multiple optimal solutions.
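Multiple optimal solutions are easy to see on a tiny instance. The sketch below is a brute-force enumeration in Python, not one of the LP/MIP formulations discussed here. It lists every spanning tree of a small graph and keeps the cheapest ones. The graph (a 4-cycle with unit weights) is a made-up example.

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """True if the n-1 given edges contain no cycle (and hence span all n nodes)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # edge closes a cycle
        parent[ru] = rv
    return True

def optimal_trees(n, edges):
    """Return (minimum weight, list of all spanning trees with that weight)."""
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    return best, [t for t in trees if sum(w for _, _, w in t) == best]

# Hypothetical graph: a square with four edges of weight 1, as (node, node, weight).
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
best, trees = optimal_trees(4, edges)
print(best, len(trees))
```

Dropping any one of the four edges gives a spanning tree of the same weight, so this instance has four optimal solutions. Enumeration like this only works for very small graphs.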

N-queens and solution pool

In [1], I described some chess-related problems. Here, I want to reproduce the $n$-queens problem. The single solution problem, placing as many queens on the chess board as possible so they don't attack each other, is pretty standard. I want to focus on the more complex question: How many different ways can we place those queens? In other words: what are all the optimal solutions? We can do this by adding a no-good constraint that forbids the previously found solution. However, as this problem has more than a handful of different solutions, I want to use the Cplex solution pool.

Single Solution Model

We define the decision variables as: \[\color{darkred}x_{i,j} = \begin{cases} 1 & \text{if we place a queen on the square $(i,j)$} \\ 0 & \text{otherwise}\end{cases}\]
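As an independent check on how many solutions an enumeration should return, a plain backtracking search is handy. This is a minimal Python sketch, not the MIP model or the solution pool approach. It places one queen per row and tracks occupied columns and diagonals.

```python
def queens(n):
    """Yield every placement of n non-attacking queens as a tuple cols,
    where cols[i] is the column of the queen in row i."""
    def place(row, cols, diag1, diag2, partial):
        if row == n:
            yield tuple(partial)
            return
        for c in range(n):
            # Skip squares attacked along a column or either diagonal.
            if c in cols or (row + c) in diag1 or (row - c) in diag2:
                continue
            yield from place(row + 1, cols | {c}, diag1 | {row + c},
                             diag2 | {row - c}, partial + [c])

    yield from place(0, frozenset(), frozenset(), frozenset(), [])

count = sum(1 for _ in queens(8))
print(count)  # 92 for the standard 8x8 board
```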

Chess Board

Circle Packing and HTML reporting

Little example. Here, we try to pack $n$ circles with a given radius $r_i$ into a larger disc with an unknown radius $R$. The goal is to minimize $R$. The underlying model is simple:

Packing of Circles
\[\begin{align} \min\> & \color{darkred}R \\ & \sum_c \left(\color{darkred}p_{i,c}-\color{darkred}p_{j,c}\right)^2 \ge \left(\color{darkblue}r_i+\color{darkblue}r_j\right)^2 & \forall i\lt j \\ & \sum_c \color{darkred}p_{i,c}^2 \le \left(\color{darkred}R-\color{darkblue}r_i\right)^2 & \forall i \\ & \color{darkred}R \ge 0\\ & c \in \{x,y\} \\ \end{align}\]
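When reporting a solution, it is useful to verify it against the constraints. The following Python sketch checks a candidate packing, assuming (as in the model) that the enclosing disc is centered at the origin. The function name, the tolerance, and the test data are assumptions for illustration.

```python
import math

def check_packing(centers, radii, R, tol=1e-9):
    """Return True if all circles lie inside the disc of radius R centered
    at the origin and no two circles overlap."""
    for i, (xi, yi) in enumerate(centers):
        # Containment: distance of the center to the origin <= R - r_i.
        if math.hypot(xi, yi) > R - radii[i] + tol:
            return False
        # Non-overlap: distance between centers >= r_i + r_j.
        for j in range(i + 1, len(centers)):
            xj, yj = centers[j]
            if math.hypot(xi - xj, yi - yj) < radii[i] + radii[j] - tol:
                return False
    return True

# Two unit circles side by side fit in a disc of radius 2, but not 1.9.
centers = [(-1.0, 0.0), (1.0, 0.0)]
radii = [1.0, 1.0]
ok = check_packing(centers, radii, R=2.0)
too_small = check_packing(centers, radii, R=1.9)
print(ok, too_small)
```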

Revised Simplex LP Solver written in GAMS

I am teaching some GAMS classes, and a question arose: "How does the Simplex method work?" It's not easy to answer in a few sentences, but I want to touch upon the concept of a basis anyway. Once you have a good intuition of what a basis is, a simple Simplex method is not so far-fetched. I find the tableau presentation somewhat confusing and far removed from what actual Simplex solvers do. I strongly prefer the Revised Simplex Method in matrix notation.

Minor rant: I just don't understand the appeal of the tableau method. It looks to me like an invention for torturing undergrad students. Most of all, it is not very structure-revealing; it does not help you understand the underlying concepts. But about 100% of the LP textbooks insist we should learn that first.

As a gimmick, I implemented a simplified version in the GAMS language. This reminds me that someone spent the effort writing a Basic interpreter in TeX [1]. This is probably just as useful.
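To illustrate the matrix-notation view, here is a minimal Python sketch of a textbook Revised Simplex iteration. It is not the GAMS implementation from this post. It assumes a problem of the form $\max c^Tx$, $Ax=b$, $x\ge 0$ with a known feasible starting basis (here the slack columns). It re-solves with the basis matrix in every iteration instead of updating a factorization, and uses Bland's rule and exact fractions for simplicity. The data is a small made-up LP.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M z = rhs exactly (Gauss-Jordan elimination)."""
    n = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for k in range(n):
        piv = next(r for r in range(k, n) if T[r][k] != 0)  # nonzero pivot
        T[k], T[piv] = T[piv], T[k]
        T[k] = [v / T[k][k] for v in T[k]]
        for r in range(n):
            if r != k and T[r][k] != 0:
                f = T[r][k]
                T[r] = [a - f * p for a, p in zip(T[r], T[k])]
    return [T[r][n] for r in range(n)]

def revised_simplex(A, b, c, basis):
    """Maximize c'x s.t. Ax = b, x >= 0, starting from a feasible basis
    (list of column indices). Returns (x, objective value)."""
    m, n = len(A), len(A[0])

    def col(j):
        return [A[i][j] for i in range(m)]

    while True:
        B = [[A[i][j] for j in basis] for i in range(m)]    # basis matrix
        Bt = [list(r) for r in zip(*B)]
        xB = solve(B, b)                                    # B xB = b
        y = solve(Bt, [c[j] for j in basis])                # B'y = cB (duals)
        # Reduced costs d_j = c_j - y'A_j for the nonbasic columns.
        d = {j: c[j] - sum(yi * a for yi, a in zip(y, col(j)))
             for j in range(n) if j not in basis}
        enter = next((j for j in sorted(d) if d[j] > 0), None)  # Bland's rule
        if enter is None:                                   # optimal
            x = [Fraction(0)] * n
            for j, v in zip(basis, xB):
                x[j] = v
            return x, sum(cj * xj for cj, xj in zip(c, x))
        w = solve(B, col(enter))                            # B w = A_enter
        ratios = [(xB[i] / w[i], basis[i], i) for i in range(m) if w[i] > 0]
        if not ratios:
            raise ValueError("LP is unbounded")
        _, _, leave = min(ratios)                           # ratio test
        basis[leave] = enter

# max 3*x1 + 5*x2  s.t.  x1 <= 4,  2*x2 <= 12,  3*x1 + 2*x2 <= 18
# Columns 2..4 are slack variables and form the starting basis.
A = [[1, 0, 1, 0, 0],
     [0, 2, 0, 1, 0],
     [3, 2, 0, 0, 1]]
b = [4, 12, 18]
c = [3, 5, 0, 0, 0]
x, obj = revised_simplex(A, b, c, basis=[2, 3, 4])
print(x[:2], obj)
```

The whole method is three linear solves with the basis matrix per iteration (primal values, duals, and the entering column), which is much closer to what a real solver does than a tableau.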

Practical Large-Scale Modeling: Sparsity

Presentation at USDA-ERS Model Summit 2024.

Sunday, June 30, 2024

Inflation is a difficult concept for many

Last Friday, 6/28, new PCE (Personal Consumption Expenditures Price Index) data were released. The year-on-year inflation numbers decreased from 2.7% last month to 2.6% [1]:

Let's see how the popular press reports this [2]:

Another very small but very difficult global NLP model

The goal of this exercise is to fill a rectangular area $[0,250]\times[0,100]$ with 25 circles. The model can choose the $x$ and $y$ coordinates of the center of each circle and the radius. So we have as variables $\color{darkred}x_i$, $\color{darkred}y_i$, and $\color{darkred}r_i$. The circles placed inside the area should not overlap. The objective is to maximize the total area covered.

A solution is:

Modeling surprises

Here is an example where the PuLP modeling tool goes berserk.

In standard linear programming, only $\ge$, $=$ and $\le$ constraints are supported. Some tools also allow $\ne$, which for MIP models needs to be reformulated into a disjunctive constraint. Here is an attempt to do this in PuLP [1]. PuLP does not support this relational operator in its constraints, so we would expect a meaningful error message.
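For reference, a minimal sketch of such a disjunctive reformulation of $\color{darkred}x \ne \color{darkred}y$. It assumes integer variables $\color{darkred}x$ and $\color{darkred}y$ and a constant $\color{darkblue}M$ with $|\color{darkred}x-\color{darkred}y|\le \color{darkblue}M$. The binary variable $\color{darkred}\delta$ and the constant $\color{darkblue}M$ are names introduced here for illustration.

\[\begin{align} & \color{darkred}x - \color{darkred}y \ge 1 - (\color{darkblue}M+1)\cdot\color{darkred}\delta \\ & \color{darkred}x - \color{darkred}y \le -1 + (\color{darkblue}M+1)\cdot(1-\color{darkred}\delta) \\ & \color{darkred}\delta \in \{0,1\} \end{align}\]

With $\color{darkred}\delta=0$ we get $\color{darkred}x \ge \color{darkred}y+1$, and with $\color{darkred}\delta=1$ we get $\color{darkred}x \le \color{darkred}y-1$. In each case the other constraint is non-binding.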

Rounding inside an optimization model

In [1], the question was asked: how can I round to two decimal places inside an optimization model? I.e., \[\color{darkred}y_{i,j} = \mathbf{round}(\color{darkred}x_{i,j},2)\] To get this off my chest first: I have never encountered a situation like this. Rounding to two decimal places is more for reporting than something we want inside model equations. Given that, let me look into this modeling problem a bit more as an exercise.
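One possible way to express this, given as a sketch under assumptions rather than a definitive formulation: force $\color{darkred}y_{i,j}$ to be a multiple of 0.01 using an integer variable $\color{darkred}k_{i,j}$ (a name introduced here), and keep it within half a unit of the last decimal from $\color{darkred}x_{i,j}$.

\[\begin{align} & \color{darkred}y_{i,j} = \frac{\color{darkred}k_{i,j}}{100} \\ & -0.005 \le \color{darkred}x_{i,j} - \color{darkred}y_{i,j} \le 0.005 \\ & \color{darkred}k_{i,j} \in \mathbb{Z} \end{align}\]

Note that this leaves exact ties (e.g., $\color{darkred}x_{i,j}=0.125$) ambiguous: the solver may round either way, and solver tolerances blur the boundary further.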

LP in statistics: The Dantzig Selector

Lots of statistical procedures are based on an underlying optimization problem. Least squares regression and maximum likelihood estimation are two obvious examples. In a few cases, linear programming is used. Some examples are:

Least absolute deviation (LAD) regression [1]
Chebyshev regression [2]
Quantile regression [3]

Here is another regression example that uses linear programming.

We want to estimate a sparse vector $\color{darkred}\beta$ from the linear model \[\color{darkblue}y=\color{darkblue}X\color{darkred}\beta+\color{darkred}e\] where the number of observations $n$ (rows in $\color{darkblue}X$) is (much) smaller than the number of coefficients $p$ to estimate (columns in $\color{darkblue}X$) [4]: $p \gg n$. This is an alternative to the well-known Lasso method [5].
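As a reminder of the standard definition, the Dantzig Selector minimizes $\|\color{darkred}\beta\|_1$ subject to $\|\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)\|_\infty \le \color{darkblue}\delta$ for a tuning parameter $\color{darkblue}\delta$. A sketch of one way to write this as an LP follows. The auxiliary variables $\color{darkred}b_j$ (bounding $|\color{darkred}\beta_j|$) and $\color{darkred}r_i$ (residuals) are names introduced here, and this is not necessarily the exact formulation used later in the post.

\[\begin{align} \min\> & \sum_j \color{darkred}b_j \\ & -\color{darkred}b_j \le \color{darkred}\beta_j \le \color{darkred}b_j && \forall j \\ & \color{darkred}r_i = \color{darkblue}y_i - \sum_j \color{darkblue}X_{i,j}\cdot\color{darkred}\beta_j && \forall i \\ & -\color{darkblue}\delta \le \sum_i \color{darkblue}X_{i,j}\cdot\color{darkred}r_i \le \color{darkblue}\delta && \forall j \end{align}\]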

Instead of integers use binaries

In [1], a small (fragment of a) model is proposed:

High-Level Model
\[\begin{align} \min\> & \sum_i \| \color{darkblue}a_i\cdot \color{darkred}x_i\| \\ & \max_i \|\color{darkred}x_i\| = 1 \\ & \color{darkred}x_i \in \{-1,0,1\} \end{align}\]

Can we formulate this as a straight MIP?
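A sketch of one possible answer, reading $\|\cdot\|$ as the absolute value (an assumption): split each $\color{darkred}x_i$ into two binary variables $\color{darkred}p_i$ and $\color{darkred}n_i$ (names introduced here) that indicate $\color{darkred}x_i=1$ and $\color{darkred}x_i=-1$, respectively.

\[\begin{align} \min\> & \sum_i |\color{darkblue}a_i| \cdot (\color{darkred}p_i + \color{darkred}n_i) \\ & \color{darkred}x_i = \color{darkred}p_i - \color{darkred}n_i \\ & \color{darkred}p_i + \color{darkred}n_i \le 1 \\ & \sum_i (\color{darkred}p_i + \color{darkred}n_i) \ge 1 \\ & \color{darkred}p_i, \color{darkred}n_i \in \{0,1\} \end{align}\]

Here $|\color{darkred}x_i| = \color{darkred}p_i + \color{darkred}n_i$, so the condition $\max_i |\color{darkred}x_i| = 1$ becomes the requirement that at least one of the binaries is turned on. As $|\color{darkblue}a_i|$ is a constant, the objective is linear.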

Water

Fascinating map with annual water throughput.
This is related to water availability for irrigation. An important topic.
The Rio Grande is not so grand here.
It must not be completely trivial to produce this map.
See:
Peter Gleick and Matthew Heberger, American Rivers: A Graphic, https://pacinst.org/american-rivers-a-graphic/

Saturday, February 10, 2024

Math vs Programming

A programmer writes about this blog:

(It is old, but I just came across this).

In my previous post, I just argued the other way around. To make sure: I don't hate programmers.

BTW, in quite a few programming languages for loops are very slow, and need to be replaced by something like sum(). Examples: Python, R, SQL.
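A small Python illustration of that point: the explicit loop and the built-in `sum()` compute the same thing, and `timeit` can be used to compare them on your own machine. The function name is made up, and no timings are claimed here.

```python
import timeit

def total_loop(values):
    """Sum the values with an explicit for loop."""
    total = 0
    for v in values:
        total += v
    return total

values = list(range(1_000_000))
loop_total = total_loop(values)
builtin_total = sum(values)  # same result, without an interpreted loop

# Compare running times (results depend on machine and Python version).
print("loop :", timeit.timeit(lambda: total_loop(values), number=5))
print("sum():", timeit.timeit(lambda: sum(values), number=5))
```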

Thursday, February 8, 2024

Small non-convex MINLP: Pyomo vs GAMS

In [1], the following Pyomo model (Python fragment) is presented:

model.x = Var(name="Number of batches", domain=NonNegativeIntegers, initialize=10)
model.a = Var(name="Batch Size", domain=NonNegativeIntegers, bounds=(5,20))

# Objective function
def total_production(model):
    return model.x * model.a
model.total_production = Objective(rule=total_production, sense=minimize)

# Constraints
# Minimum production of the two output products
def first_material_constraint_rule(model):
    return sum(0.2 * model.a * i for i in range(1, value(model.x)+1)) >= 70
model.first_material_constraint = Constraint(rule=first_material_constraint_rule)

def second_material_constraint_rule(model):
    return sum(0.8 * model.a * i for i in range(1, value(model.x)+1)) >= 90
model.second_material_constraint = Constraint(rule=second_material_constraint_rule)

# At least one production run
def min_production_rule(model):
    return model.x >= 1
model.min_production = Constraint(rule=min_production_rule)
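A note on this fragment: `value(model.x)` is evaluated when the constraint rule is called, so `range()` sees the initial value 10 rather than the decision variable. Assuming the intent is $\sum_{i=1}^{x} 0.2\cdot a\cdot i \ge 70$ (and similarly with 0.8 and 90), the sums have the closed forms $0.1\cdot a\cdot x(x+1)$ and $0.4\cdot a\cdot x(x+1)$. The problem is then small enough to enumerate. The sketch below does that in plain Python, without Pyomo. The upper bound `x_max` is an assumption, as the fragment does not bound $x$.

```python
def enumerate_batches(x_max=50):
    """Enumerate integer (x, a) with 1 <= x <= x_max and 5 <= a <= 20 and
    return (x*a, x, a) for the feasible pair with the smallest x*a.
    Integer arithmetic avoids floating-point rounding:
        0.1*a*x*(x+1) >= 70   <=>   a*x*(x+1) >= 700
        0.4*a*x*(x+1) >= 90   <=>   a*x*(x+1) >= 225
    """
    best = None
    for x in range(1, x_max + 1):
        for a in range(5, 21):
            t = a * x * (x + 1)
            if t >= 700 and t >= 225:
                if best is None or x * a < best[0]:
                    best = (x * a, x, a)
    return best

best = enumerate_batches()
print(best)  # (objective, x, a)
```

Under these assumptions the second constraint is dominated by the first. Since $a \ge 5$, any $x > 12$ gives $x \cdot a > 60$, so the bound on $x$ does not cut off a better solution.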

One nonzero in set of free variables

In [1] the following question is posed:

I have free variables $\color{darkred}x_i$. How can I impose the constraint that at least one of the variables is nonzero: $\color{darkred}x_i\ne 0$.
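A sketch of one way to model this, under two assumptions that are not part of the question: the variables have bounds $-\color{darkblue}M \le \color{darkred}x_i \le \color{darkblue}M$, and "nonzero" is interpreted as $|\color{darkred}x_i| \ge \color{darkblue}\varepsilon$ for some small tolerance $\color{darkblue}\varepsilon \gt 0$. The binary variables $\color{darkred}\delta^+_i$ and $\color{darkred}\delta^-_i$ are names introduced here.

\[\begin{align} & \color{darkred}x_i \ge \color{darkblue}\varepsilon - (\color{darkblue}M+\color{darkblue}\varepsilon)\cdot(1-\color{darkred}\delta^+_i) && \forall i \\ & \color{darkred}x_i \le -\color{darkblue}\varepsilon + (\color{darkblue}M+\color{darkblue}\varepsilon)\cdot(1-\color{darkred}\delta^-_i) && \forall i \\ & \sum_i (\color{darkred}\delta^+_i + \color{darkred}\delta^-_i) \ge 1 \\ & \color{darkred}\delta^+_i, \color{darkred}\delta^-_i \in \{0,1\} \end{align}\]

If $\color{darkred}\delta^+_i=1$ then $\color{darkred}x_i \ge \color{darkblue}\varepsilon$, and if $\color{darkred}\delta^-_i=1$ then $\color{darkred}x_i \le -\color{darkblue}\varepsilon$. When a binary is zero, the corresponding constraint reduces to the bound on $\color{darkred}x_i$.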

Informs Test of Time Award for CONOPT paper

The Test of Time Award for papers published in the INFORMS Journal on Computing in the years 1993–1997 is awarded to
CONOPT: A Large-Scale GRG Code
Arne Stolbjerg Drud
ORSA Journal on Computing 6(2):207–216, 1994

As Arne notes in [1], he is helped a bit by the fact that CONOPT users may want to cite a published paper (and because there is no newer successor paper). Still, this is quite an achievement.

GAMS listing file: missing Unicode support

Newer versions of GAMS allow UTF-8 encoded strings as labels. That is very welcome, as these labels may come from data sources that just use Unicode characters. However, when printing to the listing file, we miss proper Unicode support. At first, I thought, "OK, just a few misaligned tables. No big deal." Here is a constructed example showing this may be a bit more problematic.
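One common cause of this kind of misalignment is padding a label based on its byte length instead of its number of characters. That this is what happens inside GAMS is an assumption here, not something established in this post. The Python sketch below just illustrates the mechanism. The function names and labels are made up.

```python
def pad_by_bytes(label, width):
    """Pad so that the UTF-8 encoding occupies `width` bytes
    (mimics a hypothetical byte-oriented formatter)."""
    nbytes = len(label.encode("utf-8"))
    return label + " " * max(0, width - nbytes)

def pad_by_chars(label, width):
    """Pad so that the string has `width` characters."""
    return label.ljust(width)

labels = ["Koln", "Köln"]  # the second label has a 2-byte character in UTF-8
byte_cols = [len(pad_by_bytes(s, 8)) for s in labels]  # widths in characters
char_cols = [len(pad_by_chars(s, 8)) for s in labels]

for s in labels:
    print(pad_by_bytes(s, 8) + "|", pad_by_chars(s, 8) + "|")
```

With byte-based padding, the column after `Köln` starts one position early. Even character counts are only an approximation of display width, for instance for wide East Asian characters or combining marks.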

String Art

In [1], a greyscale picture is approximated by strings (lines) between points around the image. Here, I will try something similar with a formal optimization model.

Thursday, October 17, 2024

Wednesday, October 16, 2024

Wednesday, October 2, 2024

Saturday, September 28, 2024

Saturday, September 21, 2024

Wednesday, September 4, 2024

Sunday, September 1, 2024

Wednesday, August 28, 2024

Monday, August 12, 2024

Monday, July 15, 2024

Sunday, June 30, 2024

Wednesday, May 15, 2024

Thursday, May 9, 2024

Monday, May 6, 2024

Monday, April 15, 2024

Friday, April 12, 2024

Thursday, March 28, 2024

Saturday, February 10, 2024

Thursday, February 8, 2024

Tuesday, January 30, 2024

Tuesday, January 16, 2024

Monday, January 8, 2024

Thursday, January 4, 2024