Yet Another Math Programming Consultant

Thursday, April 28, 2022

Inverting a large, dense matrix

In this post I show some experiences with inverting a large, dense matrix. The variability in performance was a bit surprising to me.

Background

I have available in a GAMS GDX file a large matrix $A$. This is an input-output matrix. The goal is to find the so-called Leontief inverse \[(I-A)^{-1}\] and return this in a GAMS GDX for subsequent use [1]. It is well-known that instead of calculating the inverse it is better to form an LU factorization of $I-A$ and then use that for different rhs vectors $b$ (essentially a pair of easy triangular solves). [2] However, economists are very much used to having access to an explicit inverse.

Expanding Visual Studio in the Task Manager

This is scary. Note that is in an idle state (nothing compiling, running, or debugging at the moment). Soon we all need 128 GB of RAM.

Thursday, April 14, 2022

GAMS: Undocumented PUT formats

The GAMS documentation mentions f.nr=1 (standard notation) and f.nr=2 (scientific notation) for formatting numeric items using the PUT statement. There are however a few more. Here we print values using different formats:

       f.nr=1       f.nr=2       f.nr=3       f.nr=4
         1.20     1.20E+00 1.2000000000          1.2
         1.23     1.23E+00 1.2345000000       1.2345
 123450000.00     1.23E+08 123450000.00    123450000
         0.00     1.20E-07 0.0000001200       1.2E-7
         0.00     1.23E-07 0.0000001235    1.2345E-7

It looks like formats 3 and 4 are inspired by SAS PUT formats f12.10 and best12.10.

References

Dictionary of formats, https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/p0z62k899n6a7wn1r5in6q5253v1.htm

Appendix: GAMS code

set
   i   'values'   /i1*i5/
   fmt 'formats' /'f.nr=1'*'f.nr=4'/
;

parameter p(i) 'test values' /
   i1 1.2
   i2 1.2345
   i3 1.2345e8
   i4 1.2e-7
   i5 1.2345e-7
/;

*
* write all values in all formats
*

file f /put.txt/;
* right align text
f.lj = 1;
put f;
* write headers
loop(fmt,
   put fmt.tl:13
);
put /;
* write values
loop(i,
   loop(fmt,
      put ' ':0;
      f.nr=ord(fmt);
      put p(i);
   );
   put /;
);

Sunday, April 10, 2022

Rewriting old GAMS code

It is always a good idea to revisit existing GAMS code and see if we can improve it. Here is an example of an actual model.

The problem is that we want to set up a mapping set between two sets based on the first two characters. If they are the same, the pair should be added to the mapping. The old code looked like:

sets
   r 'regions' /
       AP   'Appalachia'
       CB   'Corn Belt'
       LA   'Lake States'
       /
   s 'subregions' /
       APN0501
       APN0502
       APN0504
       CBL0704
       CBM0404
       LAM0404
       LAM0406
       /
   rs(r,s) 'mapping regions<->subregions'
;

* old code:
PARAMETER value;
LOOP(R, value('1',R) = 100*ord( R.tl,1) + ord( R.tl,2); );
LOOP(S, value('2',S) = 100*ord( S.tl,1) + ord( S.tl,2); );
RS(R,S)$(value('1',R) = value('2',S)) = YES;

display value;
display rs;

To verify this indeed generates the correct result for set rs, we look at the output:

----     27 PARAMETER value  

           AP          CB          LA     APN0501     APN0502     APN0504     CBL0704     CBM0404     LAM0404

1    6580.000    6766.000    7665.000
2                                        6580.000    6580.000    6580.000    6766.000    6766.000    7665.000

+     LAM0406

2    7665.000


----     28 SET rs  mapping regions<->subregions

       APN0501     APN0502     APN0504     CBL0704     CBM0404     LAM0404     LAM0406

AP         YES         YES         YES
CB                                             YES         YES
LA                                                                     YES         YES

The set rs is obviously correctly populated. The auxiliary parameter value contains the ASCII values: e.g. AP has value 6580 as the decimal ASCII codes for A and P are 65 and 80.

Notes:

The function ord(string, pos) returns the ASCII value of the character at position pos of the string.
The suffix .TL is the text string of the set element (internally GAMS does not work with the strings but rather an integer id for each set element; here we make it explicit we want the string).
The loops are not really needed. We could have used:
value('1',R) = 100*ord( R.tl,1) + ord( R.tl,2);
value('2',S) = 100*ord( S.tl,1) + ord( S.tl,2);
The mapping set is a very powerful concept in GAMS. It is somewhat like a dict in Python, except it is not one way. It can map from r to s but also from s to r.

We can simplify the code to just:

* new code:
* compare first two characters in r and s
rs(r,s) = ord(r.tl,1)=ord(s.tl,1) and ord(r.tl,2)=ord(s.tl,2);
display rs;

This gives the same result:

----     22 SET rs  mapping regions<->subregions

       APN0501     APN0502     APN0504     CBL0704     CBM0404     LAM0404     LAM0406

AP         YES         YES         YES
CB                                             YES         YES
LA                                                                     YES         YES

Notes:

No need for the auxiliary parameter value.
No loops.
We could add a check that we have no unmapped subregions. E.g. by:
abort$(card(rs)<>card(s)) "There are unmapped subregions";
This will also catch cases where the casing does not match.
Critical note: GAMS has very poor support for strings. This should have been rectified a long time ago. There is no excuse for this deficiency.

Conclusion

It is always worthwhile to revisit existing models and clean them up. During development, especially when under time pressure, we often are happy once our code works. As the lifetime of models can be surprisingly long, and thus new generations of users and modelers start working with old models, it is a good idea to do a cleanup round to make the code as readable as possible.

Saturday, March 5, 2022

Bostock's command-line cartography tutorial under Windows

In [1], an excellent tutorial is presented about the process of making maps. It is a little bit dated, so here I develop some Windows-based scripts that make it possible to follow these tutorials. The goal is two-fold:

Make things work for Windows. I am comfortable with the Unix command line, but, by far, most of my colleagues are not. To allow for easier sharing, I used Windows CMD.
Update the commands so they all function again. Some URLs have changed a little bit, and we need to use a specific version of d3. If you want to use the original Unix commands, and you encounter some issues, you may want to check how I repaired them.

The output format of these scripts is svg. SVG files are plain text, represent vectors (instead of bitmaps), and can be read directly by a standard web browser.

Part 1

The first thing to do is to install node.js if you don't have this available already. Using [2] this process is quite straightforward. Our first batch file will follow the steps in [1].

::---------------------------------------------------------------------------
::
::   Mike Bostock
::   Command-Line Cartography, Part 1
::   A tour of d3-geo’s new command-line interface.
:: 
::   https://medium.com/@mbostock/command-line-cartography-part-1-897aa8f8ca2c
::
::   Needs node.js. Install from: https://nodejs.org/en/download/
::
::   we translate UNIX command line to Windows CMD
::---------------------------------------------------------------------------

:: download shape file from Census Bureau (region 06 = California)
::
:: curl 'http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_06_tract_500k.zip' -o cb_2014_06_tract_500k.zip
:: use https instead, curl is part of windows
curl -O https://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_06_tract_500k.zip 

:: unzip -o cb_2014_06_tract_500k.zip
:: use tar instead (part of windows)
tar -xf cb_2014_06_tract_500k.zip

:: install node.js package
:: and extract geojson
call npm install -g shapefile
call shp2json cb_2014_06_tract_500k.shp -o ca.json

:: perform projection 
call npm install -g d3-geo-projection
call geoproject "d3.geoConicEqualArea().parallels([34, 40.5]).rotate([120, 0]).fitSize([960, 960], d)" < ca.json > ca-albers.json

:: create svg
call geo2svg -w 960 -h 960 < ca-albers.json > ca-albers.svg

:: launch browser
start ca-albers.svg

Notes:

The curl command is available on newer versions of Windows.
Windows does not have unzip, but it has tar which can unzip files.
Use https instead of HTTP.
The use of quotes is different under Windows compared to Unix.

The result should be the following vector image displayed in your browser:

Wanted: better error messages

In [1] a user tries the following constraint:

mdl.addConstrs((t[i,k] * X[i,j,k] - te1[i] <= 5) >> (z1[i,k] == 1) for i,j,k in arcos if i != 0 and i != 23)

(t and X and variables). This is not accepted, and the following error message is issued:

AttributeError: 'gurobipy.QuadExpr' object has no attribute 'getVar'

This is a terrible error message. It does not describe the real, underlying problem, and it does not tell the user what to do next. Developers typically do not pay much attention to error messages. Often they seem more aimed to make life easy for the developer than providing assistance to an already confused user. In this case, I believe the modeling tool did not even issue an error message, but rather made assumptions without checking, and left it to the Python run-time system to complain.

This constraint has two problems:

Indicator constraints have a simple binary variable as condition, not a general (boolean) expression
Only linear constraints are supported in indicator constraints

Maybe a better message is:

Indicator constraints have the form:

binary variable == 0 ==> linear constraint

or

binary variable == 1 ==> linear constraint

Your constraint does not have this form. You need to reformulate it to fit this format, or use a different (big-M) formulation.

Well-formed and meaningful error messages are very important. The user is already confused, otherwise (s)he would formulate a correct constraint. Adding confusing error messages to this confusion is not a good idea. A good error message could have prevented this post. We can think of error messages as the UI (User Interface) when it really matters: when the user needs help. Developers should pay much more attention to this.

A question for another day is of course: why are only linear constraints supported?

References

AttributeError: 'gurobipy.QuadExpr' object has no attribute 'getVar', https://stackoverflow.com/questions/71153625/attributeerror-gurobipy-quadexpr-object-has-no-attribute-getvar/71158372

Wednesday, February 16, 2022

SLSQP original paper

This is somewhat hard to find (although I did), so I share it. This is the original paper[1] describing the SLSQP solver used in scipy[2]:

Visualization: Animating flow along an edge

In a large network, it is a bit of a pain to visualize the flow. Here is an attempt to show the shortest path in a random sparse directed graph. The shortest path is of course a min-cost flow of one unit. I used cytoscape.js [1] to generate the picture.

4-color maps: another model

In [1], I discussed coloring all US counties (3,221 of them) such that neighboring counties have a different color. The famous 4-color theorem says we can do this with just 4 colors. After some adventures with the data, I was indeed able to produce maps with 4 colors.

Model 1: US Counties colored with minimum number of colors

An interesting part is the color count table. That table shows the number of counties colored by each color. This is a bit unbalanced. So an obvious question is:

Can we find a 4-coloring that has a more equal use of colors?

Is a heuristic a solver?

Can we call a heuristic (especially a meta-heuristic) a solver? I am not so sure.

Warning: this is somewhat of an opinionated rant. Many of you may not agree.

The name "solver" seems to imply that the underlying algorithm knows when an optimization problem is solved. By that I mean: it can stop at some stage and report "solved". Meta-heuristics typically don't have a clue about optimality, and just keep on churning until hitting some form of time or iteration limit. (One could devise some stopping criteria basic on stochastic reasoning, but that is not often used in meta-heuristics).

So my proposal is:

Heuristics (including meta-heuristics) should not be called solvers. They don't know when a problem is solved.

Yet Another Math Programming Consultant

Thursday, April 28, 2022

Inverting a large, dense matrix

Background

Saturday, April 16, 2022

Expanding Visual Studio in the Task Manager

Thursday, April 14, 2022

GAMS: Undocumented PUT formats

References

Appendix: GAMS code

Sunday, April 10, 2022

Rewriting old GAMS code

Conclusion

Saturday, March 5, 2022

Bostock's command-line cartography tutorial under Windows

Part 1

Friday, February 18, 2022

Wanted: better error messages

References

Wednesday, February 16, 2022

SLSQP original paper

Tuesday, February 15, 2022

Visualization: Animating flow along an edge

Thursday, February 10, 2022

4-color maps: another model

Monday, January 31, 2022

Is a heuristic a solver?