Monday, July 15, 2024

Practical Large-Scale Modeling: Sparsity

Presentation at USDA-ERS Model Summit 2024.

Sunday, June 30, 2024

Inflation is a difficult concept for many

Last Friday, 6/28, new PCE (Personal Consumption Expenditures Price Index) data were released. The year-on-year inflation number decreased from 2.7% last month to 2.6% [1]: 



Let's see how the popular press reports this [2]:



The headline is simply wrong. Inflation was 2.6%; it did not rise by 2.6%. The PCE-based inflation number actually decreased from 2.7% a month earlier. 

I see lots of mistakes in reporting and in posts about inflation and price indices. This is a good (or rather bad) example.


Here is a similar incorrect statement about CPI inflation:


CPI inflation was 3%. It did not rise 3%. It was actually down.



Background


There are a few different quantities here. Here is a summary:


PCE Index and Inflation
\(x_t\): PCE index at month \(t\)
\(i_t = \displaystyle\frac{x_t-x_{t-12}}{x_{t-12}}\cdot 100\%\): Inflation at month \(t\)
\(i_t - i_{t-1}\): Change in inflation at month \(t\)


This hierarchy is often poorly understood by, let's say, amateur economists.

The inflation number is measured year-on-year (i.e. with a 12-month lag). That has the advantage that there are no seasonal effects. The theory behind price indices and their calculation is fascinating, see e.g. [3]. An interesting way to create an "instantaneous inflation" number based on kernel density estimation is shown in [4].
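To make the distinction concrete, here is a minimal Python sketch of the three quantities from the summary. The index values are invented for illustration (they are not BEA data), and the function names are hypothetical.

```python
def yoy_inflation(index, lag=12):
    """Year-on-year inflation (in percent) for every month t >= lag."""
    return [100.0 * (index[t] - index[t - lag]) / index[t - lag]
            for t in range(lag, len(index))]


def change_in_inflation(inflation):
    """Month-to-month change of the inflation number (percentage points)."""
    return [inflation[t] - inflation[t - 1] for t in range(1, len(inflation))]


# Hypothetical index: flat at 100 for 12 months, then 102.7 and 102.6.
# These numbers are made up so that inflation comes out near 2.7% and 2.6%.
index = [100.0] * 12 + [102.7, 102.6]

inflation = yoy_inflation(index)          # approximately [2.7, 2.6]
change = change_in_inflation(inflation)   # approximately [-0.1]
```

In this toy data, inflation is 2.6% while the change in inflation is about -0.1 percentage points: prices still rise, only a little slower than before.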


References


  1. https://www.bea.gov/data/personal-consumption-expenditures-price-index
  2. https://www.cnbc.com/2024/06/28/may-pce-inflation-report.html
  3. Walter Erwin Diewert, John Greenlees, Charles R. Hulten, Price Index Concepts and Measurement, University of Chicago Press, 2010.
  4. Jan Eeckhout, Instantaneous Inflation, 2023.

Wednesday, May 15, 2024

Another very small but very difficult global NLP model

The goal of this exercise is to fill a rectangular area \([0,250]\times[0,100]\) with 25 circles. The model can choose the \(x\) and \(y\) coordinates of the center of each circle and the radius. So we have as variables \(\color{darkred}x_i\), \(\color{darkred}y_i\), and \(\color{darkred}r_i\). The circles placed inside the area should not overlap. The objective is to maximize the total area covered. 
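
One natural way to write this down is sketched below. This is an assumption about the formulation; the model actually solved may differ in details such as bounds on the radii or symmetry-breaking constraints.

\[\begin{align} \max\> & \sum_i \pi \color{darkred}r_i^2 \\ & (\color{darkred}x_i-\color{darkred}x_j)^2+(\color{darkred}y_i-\color{darkred}y_j)^2 \ge (\color{darkred}r_i+\color{darkred}r_j)^2 && \forall i \lt j \\ & \color{darkred}r_i \le \color{darkred}x_i \le 250-\color{darkred}r_i \\ & \color{darkred}r_i \le \color{darkred}y_i \le 100-\color{darkred}r_i \\ & \color{darkred}r_i \ge 0 \end{align}\]

Both the no-overlap constraints and the maximization of a convex quadratic objective are non-convex, which explains why such a small model can be so hard for global solvers.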

A solution is:


Thursday, May 9, 2024

Modeling surprises

Here is an example where the PuLP modeling tool goes berserk.

In standard linear programming, only \(\ge\), \(=\) and \(\le\) constraints are supported. Some tools also allow \(\ne\), which for MIP models needs to be reformulated into a disjunctive constraint. Here is an attempt to do this in PuLP [1]. PuLP does not support this relational operator in its constraints, so we would expect a meaningful error message.
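
For reference, a common big-M reformulation of \(\color{darkred}x \ne \color{darkred}y\) is sketched here. It assumes that \(\color{darkred}x\) and \(\color{darkred}y\) are integer variables and that \(\color{darkblue}M \ge \max|\color{darkred}x-\color{darkred}y|+1\); the binary variable \(\color{darkred}\delta\) is notation introduced for this illustration.

\[\begin{align} & \color{darkred}x \le \color{darkred}y - 1 + \color{darkblue}M \color{darkred}\delta \\ & \color{darkred}x \ge \color{darkred}y + 1 - \color{darkblue}M (1-\color{darkred}\delta) \\ & \color{darkred}\delta \in \{0,1\} \end{align}\]

With \(\color{darkred}\delta=0\) the first constraint forces \(\color{darkred}x \le \color{darkred}y-1\) and the second is relaxed; with \(\color{darkred}\delta=1\) the roles are swapped.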

Monday, May 6, 2024

Rounding inside an optimization model

In [1], the question was asked: how can I round to two decimal places inside an optimization model? I.e., \[\color{darkred}y_{i,j} = \mathbf{round}(\color{darkred}x_{i,j},2)\] To get this off my chest first: I have never encountered a situation like this. Rounding to two decimal places is more for reporting than something we want inside model equations. Given that, let me look into this modeling problem a bit more as an exercise. 
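
One possible formulation is sketched here, under the assumptions that \(\color{darkred}x_{i,j}\) is bounded and that we do not care which way exact ties are rounded. The integer variable \(\color{darkred}k_{i,j}\) is notation introduced for this illustration.

\[\begin{align} & \color{darkred}y_{i,j} = 0.01\cdot \color{darkred}k_{i,j} \\ & -0.005 \le \color{darkred}x_{i,j} - \color{darkred}y_{i,j} \le 0.005 \\ & \color{darkred}k_{i,j} \in \mathbb{Z} \end{align}\]

For example, \(\color{darkred}x_{i,j}=1.2345\) leaves only \(\color{darkred}k_{i,j}=123\) feasible, so \(\color{darkred}y_{i,j}=1.23\).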

Monday, April 15, 2024

LP in statistics: The Dantzig Selector

Lots of statistical procedures are based on an underlying optimization problem. Least squares regression and maximum likelihood estimation are two obvious examples. In a few cases, linear programming is used. Some examples are:

  • Least absolute deviation (LAD) regression [1]
  • Chebyshev regression [2]
  • Quantile regression [3]
Here is another regression example that uses linear programming. 

We want to estimate a sparse vector \(\color{darkred}\beta\) from the linear model \[\color{darkblue}y=\color{darkblue}X\color{darkred}\beta+\color{darkred}e\] where the number of observations \(n\) (rows in \(\color{darkblue}X\)) is (much) smaller than the number of coefficients \(p\) to estimate (columns in \(\color{darkblue}X\)) [4]: \(p \gg n\). This is an alternative to the well-known Lasso method [5].
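
For reference, the Dantzig Selector is usually stated as the optimization problem below, where \(\color{darkblue}\delta \ge 0\) is a tuning parameter. The symbols \(\color{darkblue}\delta\) and \(\color{darkred}u\) are notation chosen for this sketch.

\[\begin{align} \min\> & ||\color{darkred}\beta||_1 \\ & ||\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)||_\infty \le \color{darkblue}\delta \end{align}\]

Both norms can be linearized in the standard way, which gives an LP:

\[\begin{align} \min\> & \sum_j \color{darkred}u_j \\ & -\color{darkred}u_j \le \color{darkred}\beta_j \le \color{darkred}u_j \\ & -\color{darkblue}\delta \le \color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta) \le \color{darkblue}\delta \end{align}\]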

Friday, April 12, 2024

Instead of integers use binaries

In [1], a small (fragment of a) model is proposed:

High-Level Model
\[\begin{align} \min\> & \sum_i | \color{darkblue}a_i\cdot \color{darkred}x_i| \\ & \max_i |\color{darkred}x_i| = 1 \\ & \color{darkred}x_i \in \{-1,0,1\} \end{align}\]

Can we formulate this as a straight MIP? 
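
One candidate reformulation, in the spirit of the title, splits each \(\color{darkred}x_i\) into two binary variables: \(\color{darkred}x_i = \color{darkred}p_i - \color{darkred}n_i\) with \(\color{darkred}p_i + \color{darkred}n_i \le 1\). Then \(|\color{darkred}x_i| = \color{darkred}p_i + \color{darkred}n_i\), the objective becomes \(\sum_i |\color{darkblue}a_i| (\color{darkred}p_i + \color{darkred}n_i)\), and the max condition becomes \(\sum_i (\color{darkred}p_i + \color{darkred}n_i) \ge 1\). The brute-force check below compares this with the original model by enumeration. It is a sanity check on a tiny instance, not a MIP implementation; the data `a` is made up and the names `p` and `n` are assumptions of this sketch.

```python
from itertools import product

a = [3.0, -1.5, 2.0, 4.0]   # hypothetical data


def best_original(a):
    """Enumerate x in {-1,0,1}^n with max_i |x_i| = 1."""
    best = None
    for x in product((-1, 0, 1), repeat=len(a)):
        if max(abs(xi) for xi in x) != 1:
            continue  # all-zero vector violates max_i |x_i| = 1
        obj = sum(abs(ai * xi) for ai, xi in zip(a, x))
        if best is None or obj < best:
            best = obj
    return best


def best_binary(a):
    """Enumerate binaries p, n with p_i + n_i <= 1 and sum(p + n) >= 1."""
    best = None
    m = len(a)
    for bits in product((0, 1), repeat=2 * m):
        p, n = bits[:m], bits[m:]
        if any(pi + ni > 1 for pi, ni in zip(p, n)):
            continue  # x_i cannot be +1 and -1 at the same time
        if sum(p) + sum(n) < 1:
            continue  # at least one x_i must be nonzero
        obj = sum(abs(ai) * (pi + ni) for ai, pi, ni in zip(a, p, n))
        if best is None or obj < best:
            best = obj
    return best


obj_original = best_original(a)
obj_binary = best_binary(a)
```

On this instance both enumerations give the same optimal objective, the smallest \(|\color{darkblue}a_i|\).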

Thursday, March 28, 2024

Water



  • Fascinating map with annual water throughput. 
  • This is related to water availability for irrigation. An important topic.
  • The Rio Grande is not so grand here.
  • It cannot have been completely trivial to produce this map.
  • See: 
    Peter Gleick and Matthew Heberger, American Rivers: A Graphic, https://pacinst.org/american-rivers-a-graphic/


Saturday, February 10, 2024

Math vs Programming

 A programmer writes about this blog:



(It is old, but I just came across this).

In my previous post, I just argued the other way around. To make sure: I don't hate programmers.

BTW, in quite a few programming languages for loops are very slow, and need to be replaced by something like sum(). Examples: Python, R, SQL. 
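
As a small Python illustration (the data is made up), the explicit loop and the built-in sum() below compute the same total. In CPython the second form runs the loop inside the built-in and is typically the faster of the two.

```python
values = list(range(1, 1001))   # made-up data: 1, 2, ..., 1000

# Explicit loop: every iteration is executed by the Python interpreter.
total_loop = 0
for v in values:
    total_loop += v

# Built-in aggregation: the loop runs inside sum() itself.
total_sum = sum(values)
```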

Thursday, February 8, 2024

Small non-convex MINLP: Pyomo vs GAMS

 In [1], the following Pyomo model (Python fragment) is presented:


model.x = Var(name="Number of batches", domain=NonNegativeIntegers, initialize=10)
model.a = Var(name="Batch Size", domain=NonNegativeIntegers, bounds=(5,20))

# Objective function
def total_production(model):
    return model.x * model.a
model.total_production = Objective(rule=total_production, sense=minimize)

# Constraints
# Minimum production of the two output products
def first_material_constraint_rule(model):
    return sum(0.2 * model.a * i for i in range(1, value(model.x)+1)) >= 70
model.first_material_constraint = Constraint(rule=first_material_constraint_rule)

def second_material_constraint_rule(model):
    return sum(0.8 * model.a * i for i in range(1, value(model.x)+1)) >= 90
model.second_material_constraint = Constraint(rule=second_material_constraint_rule)

# At least one production run
def min_production_rule(model):
    return model.x >= 1
model.min_production = Constraint(rule=min_production_rule)
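
A note on the two material constraints: value(model.x) is evaluated once, when the constraint is constructed, so the range is based on the initial value 10 and not on the decision variable. If the intent is a summation that runs up to the variable \(\color{darkred}x\) (an assumption about the original poster's intent), the closed form of the arithmetic series removes the variable summation limit:

\[\begin{align} & \sum_{i=1}^{\color{darkred}x} 0.2\cdot \color{darkred}a \cdot i = 0.1\cdot \color{darkred}a\cdot \color{darkred}x\cdot(\color{darkred}x+1) \ge 70 \\ & \sum_{i=1}^{\color{darkred}x} 0.8\cdot \color{darkred}a \cdot i = 0.4\cdot \color{darkred}a\cdot \color{darkred}x\cdot(\color{darkred}x+1) \ge 90 \end{align}\]

Here \(\color{darkred}x\) corresponds to model.x and \(\color{darkred}a\) to model.a.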