In [1], I discussed coloring all US counties (3,221 of them) such that neighboring counties have a different color. The famous 4-color theorem says we can do this with just 4 colors. After some adventures with the data, I was indeed able to produce maps with 4 colors.
|
Model 1: US Counties colored with minimum number of colors |
An interesting part is the color count table. That table shows the number of counties colored by each color. This is a bit unbalanced. So an obvious question is:
Can we find a 4-coloring that has a more equal use of colors?
To recap, the original graph coloring model looked like:
Model 1: Minimize number of colors model |
\[\begin{align} \min & \sum_c \color{darkred}u_c \\ & \sum_c \color{darkred}x_{n,c}=1 &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le \color{darkred}u_c && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \color{darkred}u_c \le \color{darkred}u_{c-1} && \forall c\gt 1 \\ & \color{darkred}x_{n,\color{darkgreen}{\mathit{color1}}} = 1 && \text{for all isolated counties}\\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}u_c \in \{0,1\} \end{align} \] |
This model delivered 4 as the optimal solution. Now we want to minimize the spread in the color count. We can reuse the original model, and convert the objective into a constraint: \[\sum_c \color{darkred}u_c=4\] but another, probably simpler approach is just to restrict the set \(c\) of colors to just 4 elements. With this, a model can be:
Model 2: Minimize spread in color count |
---|
\[\begin{align} \min\> & \color{darkred}{\mathit{range}} = \color{darkred}{\mathit{maxCount}} -\color{darkred}{\mathit{minCount}}\\ &\color{darkred}{\mathit{maxCount}} \ge \color{darkred}{\mathit{count}}_c && \forall c\\ &\color{darkred}{\mathit{minCount}} \le \color{darkred}{\mathit{count}}_c && \forall c\\ &\color{darkred}{\mathit{count}}_c = \sum_n\color{darkred}x_{n,c} && \forall c\\ & \sum_c \color{darkred}x_{n,c}=1 &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le 1 && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & c = \{\color{darkgreen}{col1},\color{darkgreen}{col2},\color{darkgreen}{col3},\color{darkgreen}{col4}\} \end{align} \] |
Note that:
- The variable \(\color{darkred}u_c\) is dropped. We know it is 1 if set \(c\) has just 4 colors.
- Isolated counties now become interesting again: they play a role. We no longer fix them to have color 1.
Symmetry breaking
We can help things a little bit along by fixing one county. I chose the first county, Autauga County, AL (fips code: 01001), and fixed it to have color 1. Now we can look at a neighbor and fix it to color 2.
|
Alabama counties |
I chose Dallas County for that. Then Chilton has to have color 3 or 4. I fixed it to color 3. So altogether, I fixed 3 assignments.
When I run this model, I see:
|
Model 2: Minimize spread in color count |
Obviously, we cannot do better than this.
A variant of this model would be to use the total area occupied by each color, and make those as equal as possible. Data for this is readily available.
Note: I used d3 [2] for visualization here.
The worst spread
One would think that finding the worst spread: one color dominating and another assigned to very few countries, would be just a matter of replacing minimize by maximize. Unfortunately, this is not the case. Finding the max and the min count is now more difficult: we need extra binary variables. The idea is to use the following formulation: \[\color{darkred}y=\max_i \color{darkred}x_i \iff \begin{aligned} & \forall i: \color{darkred}y\ge \color{darkred}x_i\\ & \exists i: \color{darkred}y\le \color{darkred}x_i\end{aligned} \iff \begin{aligned} & \color{darkred}y\ge \color{darkred}x_i &&\forall i \\ & \color{darkred}y\le \color{darkred}x_i - \color{darkblue}M (1-\color{darkred}\delta_i) &&\forall i \\ & \sum_i \color{darkred}\delta_i=1 \\ & \color{darkred}\delta_i \in \{0,1\}\end{aligned} \]
Based on this is my first model:
Model 3a: Maximize spread in color count |
---|
\[\begin{align} \max\> & \color{darkred}{\mathit{range}} = \color{darkred}{\mathit{maxCount}} -\color{darkred}{\mathit{minCount}}\\ &\color{darkred}{\mathit{maxCount}} \ge \color{darkred}{\mathit{count}}_c && \forall c \\ &\color{darkred}{\mathit{maxCount}} \le \color{darkred}{\mathit{count}}_c + \color{dakblue}M \color{darkred}\delta^{\mathit{max}}_c&&\forall c \\ & \sum_c \color{darkred}\delta^{\mathit{max}}_c = {\bf{card}}(c)-1 \\ &\color{darkred}{\mathit{minCount}} \le \color{darkred}{\mathit{count}}_c && \forall c \\ &\color{darkred}{\mathit{minCount}} \ge \color{darkred}{\mathit{count}}_c - \color{dakblue}M \color{darkred}\delta^{\mathit{min}}_c&&\forall c \\ & \sum_c \color{darkred}\delta^{\mathit{min}}_c = {\bf{card}}(c)-1 \\ &\color{darkred}{\mathit{count}}_c = \sum_n\color{darkred}x_{n,c} && \forall c\\ & \sum_c \color{darkred}x_{n,c}=1 &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le 1 && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}\delta^{\mathit{min}}_c, \color{darkred}\delta^{\mathit{max}}_c \in \{0,1\} \\ & c = \{\color{darkgreen}{col1},\color{darkgreen}{col2},\color{darkgreen}{col3},\color{darkgreen}{col4}\} \end{align} \] |
Notes:
- We can set \(\color{darkblue}M := {\bf{card}}(c)\).
- We can use the solution of model 1 as incumbent using the mipstart option.
- This is a very difficult model to solve.
A much better approach is to introduce a symmetry-breaking ordering constraint \[\color{darkred}{\mathit{count}}_c \le \color{darkred}{\mathit{count}}_{c-1}\] and use this to form the objective: \[\max\>\> \color{darkred}{\mathit{range}} = \color{darkred}{\mathit{count}}_{\color{darkgreen}{\mathit{col1}}} - \color{darkred}{\mathit{count}}_{\color{darkgreen}{\mathit{col4}}}\] This leads to the simpler model:
Model 3b: Maximize spread in color count |
---|
\[\begin{align} \max\> & \color{darkred}{\mathit{range}} = \color{darkred}{\mathit{count}}_{\color{darkgreen}{\mathit{col1}}} - \color{darkred}{\mathit{count}}_{\color{darkgreen}{\mathit{col4}}}\\ &\color{darkred}{\mathit{count}}_c = \sum_n\color{darkred}x_{n,c} && \forall c\\ & \sum_c \color{darkred}x_{n,c}=1 &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le 1 && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ &\color{darkred}{\mathit{count}}_c \le \color{darkred}{\mathit{count}}_{c-1} && \forall c \gt 1 \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & c = \{\color{darkgreen}{col1},\color{darkgreen}{col2},\color{darkgreen}{col3},\color{darkgreen}{col4}\} \end{align} \] |
Although better than model 3a, this is still a very difficult model to solve. After 2 hours there is still a very large gap, so not even close to proving optimality.
|
Max spread model (time limit) |
Conclusion
A 4-colored map has enough degrees of freedom that we can meaningfully solve for different objectives. Here we minimized and maximized the spread in the number of counties covered by each color.
The max-spread model is both more challenging to model as well to solve.
References
- http://yetanothermathprogrammingconsultant.blogspot.com/2022/01/coloring-us-county-map.html
- Data-Driven Documents, https://d3js.org/.
Appendix: GAMS Model 2
$ontext
Minimize
spread in counties per color
Color
map with US counties
Files:
ColorCounties.gms
(this file)
data.inc
geojson-counties-fips.js
$offtext
*---------------------------------------------------------------
* data
*---------------------------------------------------------------
sets
n 'nodes: counties (fips code)'
a(n,n) 'arcs: adjacency --
upper-triangular part only'
c 'colors' /
col1
'#96CEB4'
col2
'#FFEEAD'
col3
'#D9534F'
col4
'#FFAD60'
/
county 'county names'
map(n,county<)
;
$include data.inc
* fix data wrt Glades County, FL
* set one of these to "no"
a('12043','12085') = no;
*a('12043','12099') = no;
scalar numCounties 'number of counties in map';
numCounties = card(n);
display numCounties;
alias(n,i,j);
* we have a few isolated counties (Hawaii)
set isolated(n) 'county without neighbors';
isolated(n) = sum(j$(a(n,j) or a(j,n)),1)=0;
display isolated;
*---------------------------------------------------------------
* data checks
*---------------------------------------------------------------
abort$(card(map)<>numCounties) "check map";
abort$sum(a(i,j)$(ord(i)>=ord(j)),1) "a(i,j) is not strictly upper-triangular";
*---------------------------------------------------------------
* optimization model: minimize number of colors needed
*---------------------------------------------------------------
binary variables
x(n,c) 'assign color to node'
;
variable
numColors 'number of colors needed'
range
maxCount
minCount
Count(c) 'number of counties colored with c'
;
equations
objective 'make number of times each color is
used as equal as possible'
assign(n) 'every node should have exactly one
color'
notSameColor(i,j,c) 'adjacent vertices cannot have the same color'
boundMax
boundMin
counting
;
objective.. range =e= maxCount-minCount;
boundMax(c).. maxCount =g= count(c);
boundMin(c).. minCount =l= count(c);
counting(c).. Count(c) =e= sum(n, x(n,c));
assign(n).. sum(c, x(n,c)) =e= 1;
notSameColor(a(i,j),c).. x(i,c)+x(j,c) =l= 1;
*---------------------------------------------------------------
* fix a few counties by looking at the map
* 01001 .
"Autauga County, AL" ->
color 1
* neigbor is: Dallas County:
* 01047 .
"Dallas County, AL" ->
color 2
* a neigbor to both is: Chilton
* 01021 .
"Chilton County, AL" ->
color 3
*---------------------------------------------------------------
x.fx('01001','col1') = 1;
x.fx('01047','col2') = 1;
x.fx('01021','col3') = 1;
model color /all/;
option optcr=0,
threads=16;
solve color
minimizing range using mip;
display x.l;
parameter colcount(c) 'number of counties with
given color';
colcount(c) = sum(n$(x.l(n,c)>0.5), 1);
display colcount;
|
No comments:
Post a Comment