A famous result in graph theory is that we can color a planar graph with just four colors [1]. The MIP model for coloring a graph is really simple. Let's try this with a relatively large data set: the map of US counties. We'll see that there are some hurdles on the way.
Building block: MIP model for coloring a graph
The problem is as follows: we want to assign a color to each node such that nodes connected by an arc don't have the same color. The objective is to minimize the number of different colors. For the MIP model we use an assignment variable: \[\color{darkred}x_{n,c} = \begin{cases}1 & \text{if node $n$ gets color $c$} \\ 0 & \text{otherwise}\end{cases}\] To keep track of the colors we have: \[\color{darkred}u_c = \begin{cases}1 & \text{color $c$ is used for some nodes} \\ 0 & \text{otherwise}\end{cases}\]
MIP Model |
---|
\[\begin{align} \min & \sum_c \color{darkred}u_c \\ & \sum_c \color{darkred}x_{n,c}=1 &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le \color{darkred}u_c && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \color{darkred}u_c \le \color{darkred}u_{c-1} && \forall c\gt 1 \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}u_c \in \{0,1\} \end{align} \] |
The last constraint is optional. It is used to get nicer-looking solutions and to reduce symmetry.
With a random graph with 20 nodes and 57 arcs, our minimum number of colors is 4. Of course, other graphs may need more or fewer colors.
Building block: Map of US counties
To produce a map for all US counties, I used the GeoJSON file from [2]. After extracting the id fields from the JSON file we have a collection of counties encoded by their FIPS code [3]. These 5-digit codes have two parts: a 2-digit state id and a 3-digit county id. We have 3,221 counties in this file. This will become our set of nodes.
The data for the adjacency matrix comes from [4]. I had some issues with this file:
- This file had more counties than I have in my map data. All counties not in the map file are ignored. These were all related to faraway territories: Virgin Islands, American Samoa, Guam, Northern Mariana Islands.
- The matrix in [4] is larger than strictly needed. It says all counties are adjacent to itself. Also, I made the matrix strictly upper triangular. I.e. two counties \(i\) and \(j\) can only be adjacent if \(i\lt j\). This makes the data smaller without losing any information.
- There is a problem with county 27111. It has different names in the file.
- County 27165 has an empty name in some of the records.
- I suspect the fips codes are correct but the names are not. Because of the inconsistencies, I have the feeling that some of this data was entered manually instead of using automation. Copy-paste should be avoided like the plague!
- Data problems are fact of life. Even for data from reputable sources. Always check incoming data for errors, and add data checks to the model.
Debugging
MODEL STATISTICSBLOCKS OF EQUATIONS 3 SINGLE EQUATIONS 60,102BLOCKS OF VARIABLES 3 SINGLE VARIABLES 19,333NON ZERO ELEMENTS 189,973 DISCRETE VARIABLES 19,332
The size depends quite a bit on the number of colors in set \(c\). The solution is however not what we would expect:
**** SOLVER STATUS 1 Normal Completion
**** MODEL STATUS 1 Optimal
**** OBJECTIVE VALUE 5.0000
Why don't we get an objective value of 4? We need to do some debugging. Let's see if we can identify an offending node. I.e., when we remove that node (county), the remaining map can be properly colored with 4 colors. Our debugging tool becomes the following model:
MIP Debugging Model |
---|
\[\begin{align} \min\> & \color{darkred}z=\sum_c \color{darkred}u_c \\ &\color{darkred}z=4 \\ & \sum_c \color{darkred}x_{n,c}=\color{darkred}v_n &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le \color{darkred}u_c && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \sum_n \color{darkred}v_n = {\bf{card}}(n)-1 \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}u_c \in \{0,1\} \\ & \color{darkred}v_n \in \{0,1\} \end{align} \] |
The new variable \(\color{darkred}v_n\) indicates if node \(n\) is used. The node with \(\color{darkred}v_n=0\) in the solution is suspect. When we run this, Glades County in Florida is selected to be ignored. When we fix Glades county to be included, neighboring Okeechobee County is removed. So let's have a look at this part of Florida:
We see that Glades, Okeechobee, Martin, Palm Beach, and Hendry form a bit of a pie chart with one single point where they meet. Such a pie chart with a single meeting point can create problems:
There is no way we can color this with 4 colors. If we look at a description of the 4-color theorem, we actually see such a single point is removed from consideration:
The four-color theorem states that any map in a plane can be colored using four-colors in such a way that regions sharing a common boundary (other than a single point) do not share the same color. [5]
After removing arcs Glades-Martin and Glades-Palm Beach, our model finds a 4-coloring scheme for the US county map. The problem took about 3.5 minutes on my machine to solve to optimality. The problem is large but the underlying graph is very sparse.
The map
Refinements
---- 6243 SET isolated2 county without neighbors (incl. name) 15001.Hawaii County, HI 15003.Honolulu County, HI 15007.Kauai County, HI
Conclusions
- Data can never be trusted, even if it comes from the Census bureau.
- Models can help in debugging by changing the question. Here we asked: is there a single node (county) we can remove to make the map four-colorable? This model led us directly to the underlying problem.
- Touching a neighboring county by a single point does not count in the 4-color theorem. It looks like these cases can happen in the real world.
References
- Four-color theorem, https://en.wikipedia.org/wiki/Four_color_theorem
- https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json
- FIPS county codes, https://en.wikipedia.org/wiki/FIPS_county_code
- https://www2.census.gov/geo/docs/reference/county_adjacency.txt
- Four-color theorem, https://mathworld.wolfram.com/Four-ColorTheorem.html
- data.inc, https://raw.githubusercontent.com/aopt/OptimizationModels/main/GraphColoring/data.inc
Appendix: GAMS Model
$ontext |
love the debugging model.
ReplyDelete