In http://yetanothermathprogrammingconsultant.blogspot.com/2015/03/k-means-clustering-formulated-as-miqcp.html we were quite underwhelmed by the performance of MIQCPs (Mixed Integer Quadratically Constrained Problems).
A standard heuristic is easily formulated in GAMS:
The results show it is important to use different starting configurations (i.e. multiple trials):
---- 100 PARAMETER trace reporting
iter1 iter2 iter3 iter4 iter5 iter6
trial1 366.178 28.135 14.566
+ iter7 iter8 iter9 iter10
trial8 67.933 42.621 14.881 14.566
The consensus is that 14.566 is the optimal sum of the squared distances. Note that this was an extremely easy problem:
- We use an xor operator to find the difference between if the sets ik_prev and ik. If there is no difference card(ik_diff)=0 and thus notConverged=0. See below for a small example illustrating the xor.
- The assignment
ik(i,k) = yes$(dclose(i) = d(i,k));
is incorrect if point i is exactly between two clusters. In that case we assign the point to two clusters. We can fix this by a more complicated loop:
ik(i,k) = no;
loop((i,k)$(dclose(i) = d(i,k)),
ik(i,k)$(sum(ik(i,kk),1)=0) = yes;
- The construct
cluster(i) = uniformint(1,numk); ik(i,k) = cluster(i)=ord(k);
cannot be simplified to
ik(i,k) = uniformint(1,numk)=ord(k);.
The latter version is incorrect.
- The assignment
c(k,xy)$n(k) = sum(ik(i,k), p(i,xy))/n(k);
will keep a cluster with zero points at its current location.
The XOR operator
This fragment illustrates a few different ways to calculate the difference between two sets:
k(i) / b,c/
diff(i) 'difference between sets: card(diff)=0 if equal'
diff(i) = (j(i)-k(i)) + (k(i)-j(i));
diff(i) = j(i) xor k(i);
diff(i) = 1$j(i) <> 1$k(i);
---- 9 SET diff difference between sets: card(diff)=0 if equal
---- 12 SET diff difference between sets: card(diff)=0 if equal
---- 15 SET diff difference between sets: card(diff)=0 if equal
My preferred syntax diff(i)=j(i)<>k(i) is not allowed. In my opinion, notation should really try to help readability, and not show off mathematical prowess.