Monday, July 15, 2024

Practical Large-Scale Modeling: Sparsity

Presentation at USDA-ERS Model Summit 2024.

Sunday, June 30, 2024

Inflation is a difficult concept for many

Last Friday, 6/28, new PCE (Personal Consumption Expenditures Price Index) data were released. The year-on-year inflation number decreased from 2.7% in the previous month to 2.6% [1]:



Let's see how the popular press reports this [2]:



The headline is simply wrong. Inflation was 2.6%, but it did not rise by 2.6%. The PCE-based inflation number actually decreased from 2.7% a month earlier.

I see lots of mistakes in reporting and in posts about inflation and price indices. This is a good (or rather bad) example.


Here is a similar incorrect statement about CPI inflation:


CPI inflation was 3%. It did not rise 3%. It was actually down.



Background


There are a few different quantities here. Here is a summary:


PCE Index and Inflation
\(x_t\): PCE index at month \(t\)
\(i_t = \displaystyle\frac{x_t-x_{t-12}}{x_{t-12}}\cdot 100\%\): Inflation at month \(t\)
\(i_t - i_{t-1}\): Change in inflation at month \(t\)


This hierarchy is often poorly understood by, let's say, amateur economists.

The inflation number is measured year-on-year (i.e. with a 12-month lag). That has the advantage that there are no seasonal effects. The theory behind price indices and their calculation is fascinating, see e.g. [3]. An interesting way to create an "instantaneous inflation" number based on kernel density estimation is shown in [4].
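
To make the three quantities concrete, here is a minimal Python sketch. The index values are made up for illustration (they are not actual PCE data), and the function names are my own.

```python
def yoy_inflation(index, lag=12):
    """Year-on-year inflation (in percent) for every month that has
    an observation `lag` months earlier."""
    return [100.0 * (index[t] - index[t - lag]) / index[t - lag]
            for t in range(lag, len(index))]


def change_in_inflation(inflation):
    """Month-to-month change in the inflation number (percentage points)."""
    return [inflation[t] - inflation[t - 1] for t in range(1, len(inflation))]


# Hypothetical price index for 14 consecutive months (illustration only).
index = [100.0, 101.0] + [101.0] * 10 + [103.0, 103.525]

inflation = yoy_inflation(index)         # inflation for months 12 and 13
change = change_in_inflation(inflation)  # change in inflation for month 13

print(inflation)  # roughly 3.0 and 2.5 (percent)
print(change)     # roughly -0.5 (percentage points)
```

In this made-up series, prices still went up in the last month (the index rose from 103.0 to 103.525), inflation was about 2.5%, and inflation went down by about half a percentage point. These are three different statements.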


References


  1. https://www.bea.gov/data/personal-consumption-expenditures-price-index
  2. https://www.cnbc.com/2024/06/28/may-pce-inflation-report.html
  3. Walter Erwin Diewert, John Greenlees, Charles R. Hulten, Price Index Concepts and Measurement. (2010). University of Chicago Press.
  4. Jan Eeckhout, Instantaneous Inflation, 2023.

Wednesday, May 15, 2024

Another very small but very difficult global NLP model

The goal of this exercise is to fill a rectangular area \([0,250]\times[0,100]\) with 25 circles. The model can choose the \(x\) and \(y\) coordinates of the center of each circle and the radius. So we have as variables \(\color{darkred}x_i\), \(\color{darkred}y_i\), and \(\color{darkred}r_i\). The circles placed inside the area should not overlap. The objective is to maximize the total area covered.
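
One natural way to write this down is sketched below. It assumes each circle must lie entirely inside the rectangle, and it is not necessarily the exact formulation used in the full post.

Sketch of an NLP Model
\[\begin{align} \max\> & \sum_i \pi \color{darkred}r_i^2 \\ & (\color{darkred}x_i-\color{darkred}x_j)^2 + (\color{darkred}y_i-\color{darkred}y_j)^2 \ge (\color{darkred}r_i+\color{darkred}r_j)^2 && \forall i \lt j \\ & \color{darkred}r_i \le \color{darkred}x_i \le 250 - \color{darkred}r_i \\ & \color{darkred}r_i \le \color{darkred}y_i \le 100 - \color{darkred}r_i \\ & \color{darkred}r_i \ge 0 \end{align}\]

Both the no-overlap constraints and the objective (maximizing a convex function) are non-convex, which is why a model this small needs a global solver.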

A solution is:


Thursday, May 9, 2024

Modeling surprises

Here is an example where the PuLP modeling tool goes berserk.

In standard linear programming, only \(\ge\), \(=\) and \(\le\) constraints are supported. Some tools also allow \(\ne\), which for MIP models needs to be reformulated into a disjunctive constraint. Here is an attempt to do this in PuLP [1]. PuLP does not support this relational operator in its constraints, so we would expect a meaningful error message.
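
As a reminder of what such a reformulation can look like, here is a sketch for two integer variables \(\color{darkred}x\) and \(\color{darkred}y\). The condition \(\color{darkred}x \ne \color{darkred}y\) means \(\color{darkred}x \le \color{darkred}y-1\) or \(\color{darkred}x \ge \color{darkred}y+1\). With a binary variable \(\color{darkred}\delta\) and a sufficiently large constant \(\color{darkblue}M\) (both are my notation, and \(\color{darkblue}M\) should be derived from bounds on \(\color{darkred}x-\color{darkred}y\)):

\[\begin{align} & \color{darkred}x \le \color{darkred}y - 1 + \color{darkblue}M \color{darkred}\delta \\ & \color{darkred}x \ge \color{darkred}y + 1 - \color{darkblue}M (1-\color{darkred}\delta) \\ & \color{darkred}\delta \in \{0,1\} \end{align}\]

For continuous variables, the constant 1 would have to be replaced by a small tolerance.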

Monday, May 6, 2024

Rounding inside an optimization model

In [1], the question was asked: how can I round to two decimal places inside an optimization model? I.e., \[\color{darkred}y_{i,j} = \mathbf{round}(\color{darkred}x_{i,j},2)\] To get this off my chest first: I have never encountered a situation like this. Rounding to two decimal places is more for reporting than something we want inside model equations. Given that, let me look into this modeling problem a bit more as an exercise. 
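
A first attempt, as a sketch only: introduce an integer variable \(\color{darkred}z_{i,j}\) (my notation) that counts hundredths, and require that \(\color{darkred}y_{i,j}\) is within half a hundredth of \(\color{darkred}x_{i,j}\):

\[\begin{align} & \color{darkred}y_{i,j} = 0.01\cdot \color{darkred}z_{i,j} \\ & \color{darkred}x_{i,j} - 0.005 \le \color{darkred}y_{i,j} \le \color{darkred}x_{i,j} + 0.005 \\ & \color{darkred}z_{i,j} \in \mathbb{Z} \end{align}\]

This leaves ties (values exactly halfway between two candidates) ambiguous, and the integer variables can become large when \(\color{darkred}x_{i,j}\) is large, so this is an assumption-laden starting point rather than a finished formulation.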

Monday, April 15, 2024

LP in statistics: The Dantzig Selector

Lots of statistical procedures are based on an underlying optimization problem. Least squares regression and maximum likelihood estimation are two obvious examples. In a few cases, linear programming is used. Some examples are:

  • Least absolute deviation (LAD) regression [1]
  • Chebyshev regression [2]
  • Quantile regression [3]
Here is another regression example that uses linear programming. 

We want to estimate a sparse vector \(\color{darkred}\beta\) from the linear model \[\color{darkblue}y=\color{darkblue}X\color{darkred}\beta+\color{darkred}e\] where the number of observations \(n\) (rows in \(\color{darkblue}X\)) is (much) smaller than the number of coefficients \(p\) to estimate (columns in \(\color{darkblue}X\)) [4]: \(p \gg n\). This is an alternative to the well-known Lasso method [5].
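
For reference, the Dantzig Selector is usually stated as the problem below, where \(\color{darkblue}\delta \ge 0\) is a tuning parameter (the symbol is my choice here):

\[\begin{align}\min\>&\|\color{darkred}\beta\|_1 \\ & \|\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)\|_\infty \le \color{darkblue}\delta \end{align}\]

Both norms are easily linearized. A sketch of an equivalent LP, using auxiliary variables \(\color{darkred}u_j\) (again my notation) to bound \(|\color{darkred}\beta_j|\):

\[\begin{align}\min\>&\sum_j \color{darkred}u_j \\ & -\color{darkred}u_j \le \color{darkred}\beta_j \le \color{darkred}u_j && \forall j \\ & -\color{darkblue}\delta \le \sum_i \color{darkblue}X_{i,j}\Big(\color{darkblue}y_i - \sum_k \color{darkblue}X_{i,k}\color{darkred}\beta_k\Big) \le \color{darkblue}\delta && \forall j \end{align}\]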

Friday, April 12, 2024

Instead of integers use binaries

In [1], a small (fragment of a) model is proposed:

High-Level Model
\[\begin{align} \min\> & \sum_i | \color{darkblue}a_i\cdot \color{darkred}x_i| \\ & \max_i |\color{darkred}x_i| = 1 \\ & \color{darkred}x_i \in \{-1,0,1\} \end{align}\]

Can we formulate this as a straight MIP? 
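
One candidate, as a sketch and not necessarily the formulation developed in the full post: split each variable into two binaries, \(\color{darkred}x_i = \color{darkred}p_i - \color{darkred}n_i\) with \(\color{darkred}p_i + \color{darkred}n_i \le 1\). Then \(|\color{darkred}x_i| = \color{darkred}p_i + \color{darkred}n_i\), the objective becomes \(\sum_i |\color{darkblue}a_i| (\color{darkred}p_i + \color{darkred}n_i)\), and the max constraint becomes \(\sum_i (\color{darkred}p_i + \color{darkred}n_i) \ge 1\). The small Python script below checks by brute-force enumeration that both versions give the same optimal objective on made-up data. The names `p`, `m` and the data `a` are my own assumptions.

```python
from itertools import product

a = [3.0, -1.5, 2.0]  # made-up coefficients, for illustration only
n = len(a)

# Original model: x_i in {-1,0,1}, max_i |x_i| = 1, minimize sum_i |a_i * x_i|.
best_original = min(
    sum(abs(ai * xi) for ai, xi in zip(a, x))
    for x in product((-1, 0, 1), repeat=n)
    if max(abs(xi) for xi in x) == 1
)


def binary_solutions(n):
    """Enumerate binary vectors (p, m) with p_i + m_i <= 1 for all i
    and at least one nonzero entry overall (x_i = p_i - m_i)."""
    for p in product((0, 1), repeat=n):
        for m in product((0, 1), repeat=n):
            if all(pi + mi <= 1 for pi, mi in zip(p, m)) and sum(p) + sum(m) >= 1:
                yield p, m


# Binary model: minimize sum_i |a_i| * (p_i + m_i).
best_binary = min(
    sum(abs(ai) * (pi + mi) for ai, pi, mi in zip(a, p, m))
    for p, m in binary_solutions(n)
)

print(best_original, best_binary)
```

Enumeration is only practical for tiny instances, of course. It is used here just to check the algebra, not as a solution method.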

Thursday, March 28, 2024

Water



  • Fascinating map with annual water throughput. 
  • This is related to water availability for irrigation. An important topic.
  • The Rio Grande is not so grand here.
  • It must not be completely trivial to produce this map.
  • See: 
    Peter Gleick and Matthew Heberger, American Rivers: A Graphic, https://pacinst.org/american-rivers-a-graphic/


Saturday, February 10, 2024

Math vs Programming

 A programmer writes about this blog:



(It is old, but I just came across this).

In my previous post, I just argued the other way around. To make sure: I don't hate programmers.

BTW, in quite a few programming languages for loops are very slow, and need to be replaced by something like sum(). Examples: Python, R, SQL. 

Thursday, February 8, 2024

Small non-convex MINLP: Pyomo vs GAMS

 In [1], the following Pyomo model (Python fragment) is presented:


model.x = Var(name="Number of batches", domain=NonNegativeIntegers, initialize=10)                    
model.a = Var(name="Batch Size", domain=NonNegativeIntegers, bounds=(5,20))

# Objective function
def total_production(model):
    return model.x * model.a
model.total_production = Objective(rule=total_production, sense=minimize)

# Constraints
# Minimum production of the two output products
def first_material_constraint_rule(model):
    return sum(0.2 * model.a * i for i in range(1, value(model.x)+1)) >= 70
model.first_material_constraint = Constraint(rule=first_material_constraint_rule)

def second_material_constraint_rule(model):
    return sum(0.8 * model.a * i for i in range(1, value(model.x)+1)) >= 90
model.second_material_constraint = Constraint(rule=second_material_constraint_rule)

# At least one production run
def min_production_rule(model):
    return model.x >= 1
model.min_production = Constraint(rule=min_production_rule)

Tuesday, January 30, 2024

One nonzero in set of free variables

In [1] the following question is posed:

I have free variables \(\color{darkred}x_i\). How can I impose the constraint that at least one of the variables is nonzero: \(\color{darkred}x_i\ne 0\).
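
A possible big-M sketch, under the assumption that we can impose bounds \(-\color{darkblue}U \le \color{darkred}x_i \le \color{darkblue}U\) and that "nonzero" means \(|\color{darkred}x_i| \ge \color{darkblue}\varepsilon\) for some small tolerance \(\color{darkblue}\varepsilon \gt 0\). The binary variables \(\color{darkred}p_i\), \(\color{darkred}n_i\) and the constant \(\color{darkblue}M = \color{darkblue}U + \color{darkblue}\varepsilon\) are my notation:

\[\begin{align} & \color{darkred}x_i \ge \color{darkblue}\varepsilon - \color{darkblue}M(1-\color{darkred}p_i) \\ & \color{darkred}x_i \le -\color{darkblue}\varepsilon + \color{darkblue}M(1-\color{darkred}n_i) \\ & \sum_i (\color{darkred}p_i + \color{darkred}n_i) \ge 1 \\ & \color{darkred}p_i,\color{darkred}n_i \in \{0,1\} \end{align}\]

If \(\color{darkred}p_i=1\), the variable is forced to be positive, and if \(\color{darkred}n_i=1\), it is forced to be negative. The summation requires that this happens for at least one \(i\). Without bounds on the free variables, indicator constraints may be a better fit than big-M constraints.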

Tuesday, January 16, 2024

INFORMS Test of Time Award for CONOPT paper

The Test of Time Award for papers published in the INFORMS Journal on Computing in the years 1993–1997 is awarded to

CONOPT: A Large-Scale GRG Code

Arne Stolbjerg Drud

ORSA Journal on Computing 6(2):207–216, 1994 


As Arne notes in [1], he is helped a bit by the fact that CONOPT users may want to cite a published paper (and because there is no newer successor paper). Still, this is quite an achievement. 

Monday, January 8, 2024

GAMS listing file: missing Unicode support

Newer versions of GAMS allow UTF-8 encoded strings as labels. That is very welcome, as these labels may come from data sources that just use Unicode characters. However, when printing to the listing file, we miss proper Unicode support. At first, I thought, "OK, just a few misaligned tables. No big deal." Here is a constructed example showing this may be a bit more problematic.

Thursday, January 4, 2024

String Art




 

In [1], a greyscale picture is approximated by strings (lines) between points around the image. Here, I will try something similar with a formal optimization model.