Friday, December 2, 2022

Networks not always need an extra source and sink node

Convert standard assignment problem to a network problem


A standard assignment problem formulated as an LP/MIP looks like:

 
Assignment problem
\[\begin{align}\min&\sum_{i,j} \color{darkblue}c_{i,j}\cdot\color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j}=1 && \forall i \\ & \sum_i \color{darkred}x_{i,j}=1 && \forall j \\ & \color{darkred}x_{i,j} \in \{0,1\} \end{align}\]

Monday, November 28, 2022

Cuckoo search is a bit cuckoo

Excuses for the lame title.





This could be the start of a very extended series of papers.



References


  1. Christian L. Camacho-Villalón, Marco Dorigo, Thomas Stützle, An analysis of why cuckoo search does not bring any novel ideas to optimization, Computers & Operations Research, Volume 142, 2022, https://www.sciencedirect.com/science/article/pii/S0305054822000442.
  2. Aranha, C., Camacho Villalón, C.L., Campelo, F. et al. Metaphor-based metaheuristics, a call for action: the elephant in the room. Swarm Intell 16, 1–6 (2022)

Tuesday, November 22, 2022

A strange series

In [1] a question is posed about a somewhat strange series. Let \(i \in \{ 0,\dots,n\}\). Then we require:

  • \(\color{darkred}x_i \in \{0,1,2,\dots\}\),
  • \(\color{darkred}x_i\) is equal to the number of times the value \(i\) occurs in the series. In mathematical terms, one could say: \[\color{darkred}x_i = |\{j:\color{darkred}x_j=i\}|\] (here \(|S|\) is the cardinality of the set \(S\)).

An example for \(n=4\) is: 

index  0 1 2 3 4 
value  2 1 2 0 0

Tuesday, November 8, 2022

sum of K largest

Consider a vector of variables \(\color{darkred}x_i\). We want to impose the constraint: \[\text{sum of $K$ largest $\color{darkred}x_i$}  \le 0.5\sum_i \color{darkred}x_i\] I.e. the sum of the largest \(K\) values are less than half of the total. Sometimes this is expressed mathematically as \[\sum_{i=1}^K \color{darkred}x_{[i]} \le 0.5\sum_i \color{darkred}x_i\] where \(\color{darkred}x_{[i]}\) is an ordered version of \(\color{darkred}x_i\) with \(\color{darkred}x_{[1]}\ge\color{darkred}x_{[2]} \ge \color{darkred}x_{[3]} \dots \). There are several formulations for this constraint, all of them interesting, I think.

Monday, October 24, 2022

Brian Kernighan talks about modeling languages

 



Nov 17, 2015 Professor Brian Kernighan presents on 'How to succeed in language design without really trying.' Brian Kernighan is Professor of Computer Science at Princeton University and Honorary Professor in the School of Computer Science at The University of Nottingham.

Basically, Kernighan says:
  • GAMS is terrible (somewhat of a side note). This makes some sense: that opinion was probably the reason to develop AMPL in the first place (if he thought GAMS was terrific, why develop AMPL).
  • Original declarative AMPL i.e. the part he worked on: good.
  • Later procedural facilities in AMPL (he was not involved in that): bad. (IMHO practical modeling often needs some procedural support; this is one of the reasons why modeling tools embedded in Python are popular).

Thursday, October 20, 2022

Micro-econometrics: discrete choice models

In this post, I want to discuss some statistical models from [1]. I'll implement these models in GAMS. First of all to emphasize these are all (nonlinear) optimization problems. Instead of using canned routines using a statistical package, this can help to get a better understanding of what is really going on. At least for me, not using a black-box routine, forces me to understand the underlying optimization models. Another application can be to have this part of a larger GAMS model. Some mathematical programming models just need some estimation code before the real model can be attacked. If the rest of the model is in GAMS, it may be a little bit easier to also use GAMS in the estimation tasks, instead of using a statistical package. This actually happens quite a lot. Finally, it may be easier to add constraints using a GAMS formulation compared to a canned routine in a statistical package.

In this case, the first argument was the reason for using GAMS. Reproducing results is for me a good tool to help me understand a dense text.  

Monday, October 3, 2022

Select points

In [1], the following problem is posted:


I have multiple sets of data points. For example, set 1 contains 5 data points, set 2 contains 1 data point, set 3 contains 10, etc. I need to select one data point from each set so that distances between these selected points is minimal. Any Python based functions to be used will be very helpful


This can be stated as an optimization problem. Writing down the model is a useful exercise, not only to solve it and get solutions but also to define the problem a bit more precisely than a typical "word problem". 

A super simple non-convex MIQP model is:


Non-convex MIQP Model
\[\begin{align}\min&\sum_{i,j|\color{darkblue}{\mathit{ok}}_{i,j}} \color{darkblue}{\mathit{dist}}_{i,j}\cdot\color{darkred}x_i \cdot\color{darkred}x_j \\ & \sum_{i|\color{darkblue}{\mathit{group}}_{i,g}} \color{darkred}x_i = 1 && \forall g\\ & \color{darkred}x_i \in \{0,1\} \end{align}\]

Thursday, August 25, 2022

Some Matrix Balancing Experiments

This is about the matrix balancing problem.

We have three sets of data:
  • A matrix with with entries \(\color{darkblue}A^0_{i,j}\ge 0\). 
  • Row- and column-totals \(\color{darkblue}r_i\) and  \(\color{darkblue}c_j\).
The \(\color{darkblue}A^0\) matrix is collected from different sources than the row- and column-totals. So the matrix elements don't sum up to our totals. The problem is finding a nearby matrix \(\color{darkred}A\), so the row and column totals are obeyed. In addition, we want to preserve the sparsity pattern of  \(\color{darkblue}A^0\): zeros should stay zero. And also: we don't want to introduce negative numbers (preserve the signs). More formally:


Matrix Balancing Problem
\[\begin{align}\min\>&{\bf{dist}}(\color{darkred}A,\color{darkblue}A^0)\\ & \sum_i \color{darkred}A_{i,j} = \color{darkblue}c_j && \forall j\\ & \sum_j \color{darkred}A_{i,j} = \color{darkblue}r_i && \forall i \\&\color{darkred}A_{i,j}=0 &&\forall i,j|\color{darkblue}A^0_{i,j}=0\\ &\color{darkred}A_{i,j}\ge 0 \end{align} \]


Approximate the matrix subject to row- and column-sum constraints


Tuesday, August 23, 2022

Wordle: number of words from unique letters

In [1] an optimization model is proposed for the following problem:

Using a list of 5-letter words, try to find a set of 5 words such that each letter is used at most once. 


Data


The list of 5-letter words is from the wordle [2] game. Words with duplicate letters are removed. A partial list is:


----   1035 SET w  words: large list of 5-letter words

abdom,    abend,    abets,    abhor,    abide,    abies,    abyes,    abilo,    abime,    abysm,    abled,    abler,    ables
ablet,    ablow,    abmho,    abner,    abnet,    abode,    abody,    abohm,    aboil,    abord,    abort,    abote,    about
above,    abret,    abrim,    abrin,    abris,    abrus,    absey,    absit,    abstr,    abune,    abuse,    abush,    abuts
acedy,    acerb,    ached,    achen,    acher,    aches,    achor,    acidy,    acids,    acier,    acies,    acyls,    acing
ackey,    acker,    aclys,    acmes,    acned,    acnes,    acoin,    acold,    acone,    acorn,    acost,    acoup,    acred
acres,    acrid,    acryl,    acron,    acrux,    acted,    actin,    acton,    actor,    actos,    actus,    acute,    adcon


The total number of 5-letter words in this list is: 10,175.

Tuesday, August 16, 2022

Primal, Dual and Equilibrium format of the Transportation Model

The primal transportation model [1] can be stated as:


Primal formulation
\[\begin{align}\min& \sum_{i,j} \color{darkblue}{\mathit{cost}}_{i,j}\cdot\color{darkred}x_{i,j} \\ & \sum_j \color{darkred}x_{i,j}\le \color{darkblue}{\mathit{supply}}_{i}\perp \color{darkred}u_i \le 0 && \forall i \\ & \sum_i \color{darkred}x_{i,j}\ge \color{darkblue}{\mathit{demand}}_{j}\perp \color{darkred}v_j \ge 0 && \forall j \\ & \color{darkred}x_{i,j}\ge 0\end{align} \]

Here \(\perp\) indicates "with dual ...". The duals for this model are \(\color{darkred}u_i\) and \(\color{darkred}v_j\). In the model, we have added the (optimal) signs of the duals. It may come as a surprise that the optimality of a solution can be established by just looking at the signs of the marginals (duals and reduced cost). When we print the results of this model, we can see something like:

Saturday, August 13, 2022

k out of n with smallest bandwidth

The problem from [1], slightly generalized, is easy to describe:

Given an \(n\) vector \(\color{darkblue}v_i\) with data, select \(k\lt n\) elements that are closest to each other.


Of course, we can state the problem as a formal optimization model:  


Optimization Model
\[\begin{align}\min\>&\color{darkred}U-\color{darkred}L \\ & \color{darkred}\delta_i=1 \Rightarrow \color{darkblue}v_i \in [\color{darkred}L,\color{darkred}U] \\ & \sum_i \color{darkred}\delta_i=\color{darkblue}k \\ & \color{darkred}\delta_i \in \{0,1\} \end{align}\]

Wednesday, August 10, 2022

Large sparse transportation model with CVXPY,CVXR

In [1] I was trying out different formulations of a large, sparse (but very easy) transportation model using different modeling tools and solvers. The conclusion was:

  • Exploiting sparsity leads to the best performance
  • GAMS/Cplex is about 10 times as fast as Python/PuLP/CBC
  • GAMS/Cplex is about 1000 times as fast as R/OMPR/GLPK

Don't overinterpret these numbers: this is a single model, which may not be at all representable for your models. Let's see how CVXPY[2] and CVXR[3] are doing. To recap we want to solve a large (but easy) transportation model:

Dense Transportation Model
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\cdot\color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} \le \color{darkblue}a_i && \forall i \\& \sum_i \color{darkred}x_{i,j} \ge \color{darkblue}b_j && \forall j \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\]

The additional wrinkle is that only a small fraction of the links \(i \rightarrow j\) exist.

There are (at least) three ways to model this:

  1. Use a large cost coefficient for forbidden links.
  2. Fix \(\color{darkred}x_{i,j}=0\) for non-existing links.
  3. Exploit sparsity directly by only summing over existing links.
CVXPY/CVXR are matrix-based modeling tools. That gives some extra things to think about. A transportation model in matrix-notation can look like:


Transportation Model in Matrix Notation
\[\begin{align}\min\>&{\bf tr}(\color{darkblue}C^T\color{darkred}X)\\&\color{darkred}X\color{darkblue}e \le \color{darkblue}a \\ & \color{darkred}X^t\color{darkblue}e \ge \color{darkblue}b \\ & \color{darkred}X \ge 0 \end{align}\]

Monday, August 1, 2022

Curl.exe

Newer versions of Windows come with a very useful tool to download files: curl.exe.

The standard help is:

C:\Users\erwin>where curl

C:\Windows\System32\curl.exe

             

C:\Users\erwin>curl --help

Usage: curl [options...] <url>

 -d, --data <data>          HTTP POST data

 -f, --fail                 Fail fast with no output on HTTP errors

 -h, --help <category>      Get help for commands

 -i, --include              Include protocol response headers in the output

 -o, --output <file>        Write to file instead of stdout

 -O, --remote-name          Write output to a file named as the remote file

 -s, --silent               Silent mode

 -T, --upload-file <file>   Transfer local FILE to destination

 -u, --user <user:password> Server user and password

 -A, --user-agent <name>    Send User-Agent <name> to server

 -v, --verbose              Make the operation more talkative

 -V, --version              Show version number and quit

 

This is not the full help, this menu is stripped into categories.

Use "--help category" to get an overview of all categories.

For all options use the manual or "--help all".

 

C:\Users\erwin>


A model with semi-continuous variables

In [1], a selection problem is posted, that turns out to be an easy MIP. The problem is:

From a (large) collection of products select those products with the highest unit return (or value) subject to:
  • A product can only be ordered between a minimum and maximum amount
  • There is a budget w.r.t. the total amount of products
This can be modeled using so-called semi-continuous variables: variables that can assume zero or a value between the constants \(\color{darkblue}L\) and \(\color{darkblue}U\). So we can write:


MIP model with semi-continuous variables
\[\begin{align} \max & \sum_i \color{darkred}x_i \cdot \color{darkblue}{\mathit{rate}}_i \\ & \sum_i \color{darkred}x_i \le \color{darkblue}{\mathit{budget}} \\ & \color{darkred}x_i \in \{0\}\cup [\color{darkblue}L_i,\color{darkblue}U_i] \end{align} \]

Tuesday, July 19, 2022

The inverse of A(i,j) has signature B(j,i)

GAMS has strict domain checking (type checking). This has some interesting consequences. Basically all good: much better protection against errors compared to simply using integer indices \(1,\dots,n\). Consider the square matrix \(\color{darkblue}A_{i,j}\):


set
  i
/a,b,c,d/
  j
/1,2,3,4/
;
parameter A(i,j) 'square matrix';
A(i,j) = min(
ord(i),ord(j));
display A;

Tuesday, July 12, 2022

Linearize min(a,b)>min(x,y)

This question was posed [1]: How to linearize \[\min(\color{darkred}a,\color{darkred}b)\gt \min(\color{darkred}x,\color{darkred}y)\] where \(\color{darkred}a,\color{darkred}b,\color{darkred}x,\color{darkred}y\) are variables. The type of variables is not specified so let's assume they all are continuous variables.

Monday, July 11, 2022

Transportation model with some non-existing links

I am again looking at simple models based on the transportation model:

Dense Transportation Model
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\cdot\color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} \le \color{darkblue}a_i && \forall i \\& \sum_i \color{darkred}x_{i,j} \ge \color{darkblue}b_j && \forall j \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\]


Friday, July 8, 2022

Artificially creating a large LP model: why is Cplex so efficient

I am teaching some GAMS classes this summer. The first class is about the transportation model. This is model number 1 from the GAMS model library [1]. It is a tiny model, slightly modified from the original model in [2]. I believe the GAMS version has been made dual degenerate.



Wednesday, June 29, 2022

XNOR as linear inequalities

Boolean expressions are found in many MIP models. AND and OR are the most common. When we have an expression like \(x {\bf{\ and\ }} y\) or \(x {\bf{\ or\ }} y\), where all variables are binary, we typically reformulate this as a set of inequalities. XOR is a bit more exotic, but I have never seen a usage for XNOR. Until now [1].

As this is about boolean logic, in the discussion below all variables \(x\), \(y\), \(z\) are binary.   

AND  

Monday, June 27, 2022

Goal Programming

In [1] a goal programming [2] model was stated, although the poster did not use that term.

The model is fairly simple:

  • We have 6 different raw materials that need to be blended into a final product.
  • The raw materials have 4 different ingredients.
  • We want to blend one unit of final product that has some target values for some formulas. We have three of these goals. 

The data looks like:

Thursday, June 16, 2022

MAX-CUT

The MAX-CUT problem is quite famous [1]. It can be stated as follows:

Given an undirected graph \(G=(V,E)\), split the nodes into two sets. Maximize the number of edges that have one node in each set.

Here is an example using a random sparse graph:

Sunday, June 5, 2022

Maximizing Standard Deviation

In [1] a question was posed: how to maximize the standard deviation of a vector \(\color{darkred}x_i\), subject to side constraints using an LP/MIP formulation?

The standard deviation is usually defined as: \[\mathbf{sd}(x) = \sqrt{\frac{1}{n-1}\sum_i \left(x_i-\mu\right)^2}\] where \[\mu = \frac{\sum_i x_i}{n}\] and \(n\) is the number of elements in \(x\).

Thursday, June 2, 2022

A subset selection problem

Selecting \(k\) out of \(n\) can be a computational difficult task. Here we look at a special case and see how we can exploit the properties of the data. Suppose we have \(n\) data points. We want to select \(k\) points such that the average of these \(k\) points is as close to a target value \(\color{darkblue}t\) as possible [1]. I.e. we want \[\min_{\color{darkred}S} \left| \frac{\sum_{s \in \color{darkred}S}\color{darkblue}p_s}{\color{darkblue}k} - \color{darkblue}t \right| \] where \(|\color{darkred}S|=k\). The special property of the data is that it is composed of just a few different values. Let's say our data looks like:

Tuesday, May 24, 2022

Coding is to programming like typing is to writing

Quote by Leslie Lamport: 

Coding is to programming like typing is to writing



My generalization: IMHO, developers often start coding (i.e. typing) way too early.


Monday, May 16, 2022

Ranking using numpy.argsort

I needed to find a ranking of a large data set. Using Python, it makes sense to look at the numpy library for this. 

Numpy has the function argsort, which returns index positions [1]. One would think these are exactly the ranks we are after. Unfortunately, this is not the case.

>>> import numpy as np
>>> a = [3.0, 1.0, 5.0, 2.0]
>>> indx = np.argsort(a)
>>> indx
array([1, 3, 0, 2], dtype=int64)


Wednesday, May 4, 2022

Evil Sudoku

On sudoku.com[1] there is now an "evil" difficulty level. I found these very difficult to solve by hand (I usually give up).



Sunday, May 1, 2022

Getting rid of non-linearities

In [1], an attempt was made to implement the following non-linear constraint: \[\frac{\displaystyle\sum_t \max\left\{\sum_i CF_{i,t}\cdot q_i -L_t, 0 \right\}}{\displaystyle\sum_t L_t} \le 0.015 \] Obviously code like:

 
cfm = quicksum( max(quicksum(cf[i][t] * q[i] - L[t] for i in range(I)), 0) for t in range(T) /  quicksum(L[t] for t in range(T)) <= 0.015

is really not what we want to see. It is best, not to directly start up your editor and write Python code like this. Rather, get a piece of paper, and focus on the math.

There are two non-linearities: a division and a \(\max()\) function. Note that I am not sure which symbols are variables or parameters.  I'll try to explain some possible cases below.

Thursday, April 28, 2022

Inverting a large, dense matrix

In this post I show some experiences with inverting a large, dense matrix. The variability in performance was a bit surprising to me. 

Background


I have available in a GAMS GDX file a large matrix \(A\). This is an input-output matrix. The goal is to find the so-called Leontief inverse \[(I-A)^{-1}\] and return this in a GAMS GDX for subsequent use [1]. It is well-known that instead of calculating the inverse it is better to form an LU factorization of \(I-A\) and then use that for different rhs vectors \(b\) (essentially a pair of easy triangular solves). [2] However, economists are very much used to having access to an explicit inverse.

Saturday, April 16, 2022

Expanding Visual Studio in the Task Manager

 


This is scary. Note that is in an idle state (nothing compiling, running, or debugging at the moment). Soon we all need 128 GB of RAM.

Thursday, April 14, 2022

GAMS: Undocumented PUT formats

The GAMS documentation mentions f.nr=1 (standard notation) and f.nr=2 (scientific notation) for formatting numeric items using the PUT statement. There are however a few more. Here we print values using different formats:


       f.nr=1       f.nr=2       f.nr=3       f.nr=4
         1.20     1.20E+00 1.2000000000          1.2
         1.23     1.23E+00 1.2345000000       1.2345
 123450000.00     1.23E+08 123450000.00    123450000
         0.00     1.20E-07 0.0000001200       1.2E-7
         0.00     1.23E-07 0.0000001235    1.2345E-7


It looks like formats 3 and 4 are inspired by SAS PUT formats f12.10 and best12.10.

References

  1. Dictionary of formats, https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/p0z62k899n6a7wn1r5in6q5253v1.htm


Appendix: GAMS code


set
   i  
'values'   /i1*i5/
   fmt
'formats'  /'f.nr=1'*'f.nr=4'/
;

parameter p(i) 'test values' /
  
i1  1.2
  
i2  1.2345
  
i3  1.2345e8
  
i4  1.2e-7
  
i5  1.2345e-7
/;

*
* write all values in all formats
*

file f /put.txt/;
* right align text
f.lj = 1;
put f;
* write headers
loop(fmt,
  
put fmt.tl:13
);
put /;
* write values
loop(i,
  
loop(fmt,
     
put ' ':0;
      f.nr=
ord(fmt);
     
put p(i);
   );
  
put /;
);

Sunday, April 10, 2022

Rewriting old GAMS code

It is always a good idea to revisit existing GAMS code and see if we can improve it. Here is an example of an actual model.

The problem is that we want to set up a mapping set between two sets based on the first two characters. If they are the same, the pair should be added to the mapping. The old code looked like:


sets
   r
'regions' /
      
AP   'Appalachia'
      
CB   'Corn Belt'
      
LA   'Lake States'
      
/
   s
'subregions' /
      
APN0501
      
APN0502
      
APN0504
      
CBL0704
      
CBM0404
      
LAM0404
      
LAM0406
      
/
   rs(r,s)
'mapping regions<->subregions'
;


* old code:
PARAMETER value;
LOOP(R,  value('1',R) = 100*ord( R.tl,1) + ord( R.tl,2); );
LOOP(S,  value('2',S) = 100*ord( S.tl,1) + ord( S.tl,2); );
RS(R,S)$(value(
'1',R) = value('2',S)) = YES;


display value;
display rs;

  

To verify this indeed generates the correct result for set rs, we look at the output:


----     27 PARAMETER value  

           AP          CB          LA     APN0501     APN0502     APN0504     CBL0704     CBM0404     LAM0404

1    6580.000    6766.000    7665.000
2                                        6580.000    6580.000    6580.000    6766.000    6766.000    7665.000

+     LAM0406

2    7665.000


----     28 SET rs  mapping regions<->subregions

       APN0501     APN0502     APN0504     CBL0704     CBM0404     LAM0404     LAM0406

AP         YES         YES         YES
CB                                             YES         YES
LA                                                                     YES         YES


The set rs is obviously correctly populated. The auxiliary parameter value contains the ASCII values: e.g. AP has value 6580 as the decimal ASCII codes for A and P are 65 and 80.

Notes:
  • The function ord(string, pos) returns the ASCII value of the character at position pos of the string.
  • The suffix .TL is the text string of the set element (internally GAMS does not work with the strings but rather an integer id for each set element; here we make it explicit we want the string).
  • The loops are not really needed. We could have used:
    value('1',R) = 100*ord( R.tl,1) + ord( R.tl,2);
    value('2',S) = 100*ord( S.tl,1) + ord( S.tl,2);
  • The mapping set is a very powerful concept in GAMS. It is somewhat like a dict in Python, except it is not one way. It can map from r to s but also from s to r.

We can simplify the code to just:

* new code:
* compare first two characters in r and s
rs(r,s) =
ord(r.tl,1)=ord(s.tl,1) and ord(r.tl,2)=ord(s.tl,2);
display rs;


This gives the same result:

----     22 SET rs  mapping regions<->subregions

       APN0501     APN0502     APN0504     CBL0704     CBM0404     LAM0404     LAM0406

AP         YES         YES         YES
CB                                             YES         YES
LA                                                                     YES         YES


Notes:
  • No need for the auxiliary parameter value.
  • No loops.
  • We could add a check that we have no unmapped subregions. E.g. by:
    abort$(card(rs)<>card(s)) "There are unmapped subregions";
  • This will also catch cases where the casing does not match.
  • Critical note: GAMS has very poor support for strings. This should have been rectified a long time ago. There is no excuse for this deficiency. 

Conclusion


It is always worthwhile to revisit existing models and clean them up. During development, especially when under time pressure, we often are happy once our code works. As the lifetime of models can be surprisingly long, and thus new generations of users and modelers start working with old models, it is a good idea to do a cleanup round to make the code as readable as possible.  

Saturday, March 5, 2022

Bostock's command-line cartography tutorial under Windows

In [1], an excellent tutorial is presented about the process of making maps. It is a little bit dated, so here I develop some Windows-based scripts that make it possible to follow these tutorials. The goal is two-fold:

  1. Make things work for Windows. I am comfortable with the Unix command line, but, by far, most of my colleagues are not. To allow for easier sharing, I used Windows CMD.
  2. Update the commands so they all function again. Some URLs have changed a little bit, and we need to use a specific version of d3. If you want to use the original Unix commands, and you encounter some issues, you may want to check how I repaired them. 

The output format of these scripts is svg. SVG files are plain text, represent vectors (instead of bitmaps), and can be read directly by a standard web browser.

 

Part 1


The first thing to do is to install node.js if you don't have this available already. Using [2] this process is quite straightforward. Our first batch file will follow the steps in [1].


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
::---------------------------------------------------------------------------
::
::   Mike Bostock
::   Command-Line Cartography, Part 1
::   A tour of d3-geo’s new command-line interface.
:: 
::   https://medium.com/@mbostock/command-line-cartography-part-1-897aa8f8ca2c
::
::   Needs node.js. Install from: https://nodejs.org/en/download/
::
::   we translate UNIX command line to Windows CMD
::---------------------------------------------------------------------------

:: download shape file from Census Bureau (region 06 = California)
::
:: curl 'http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_06_tract_500k.zip' -o cb_2014_06_tract_500k.zip
:: use https instead, curl is part of windows
curl -O https://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_06_tract_500k.zip 

:: unzip -o cb_2014_06_tract_500k.zip
:: use tar instead (part of windows)
tar -xf cb_2014_06_tract_500k.zip

:: install node.js package
:: and extract geojson
call npm install -g shapefile
call shp2json cb_2014_06_tract_500k.shp -o ca.json

:: perform projection 
call npm install -g d3-geo-projection
call geoproject "d3.geoConicEqualArea().parallels([34, 40.5]).rotate([120, 0]).fitSize([960, 960], d)" < ca.json > ca-albers.json

:: create svg
call geo2svg -w 960 -h 960 < ca-albers.json > ca-albers.svg

:: launch browser
start ca-albers.svg


Notes:

  • The curl command is available on newer versions of Windows.
  • Windows does not have unzip, but it has tar which can unzip files.
  • Use https instead of HTTP.
  • The use of quotes is different under Windows compared to Unix. 

The result should be the following vector image displayed in your browser: