In http://yetanothermathprogrammingconsultant.blogspot.com/2015/03/k-means-clustering-formulated-as-miqcp.html we were quite underwhelmed by the performance of MIQCPs (Mixed Integer Quadratically Constrained Problems).
A standard heuristic is easily formulated in GAMS:
$ontext |
The results show it is important to use different starting configurations (i.e. multiple trials):
---- 100 PARAMETER trace reporting iter1 iter2 iter3 iter4 iter5 iter6 trial1 366.178 28.135 14.566 + iter7 iter8 iter9 iter10 trial8 67.933 42.621 14.881 14.566 |
The consensus is that 14.566 is the optimal sum of the squared distances. Note that this was an extremely easy problem:
Notes:
- We use an xor operator to find the difference between if the sets ik_prev and ik. If there is no difference card(ik_diff)=0 and thus notConverged=0. See below for a small example illustrating the xor.
- The assignment
ik(i,k) = yes$(dclose(i) = d(i,k));
is incorrect if point i is exactly between two clusters. In that case we assign the point to two clusters. We can fix this by a more complicated loop:
ik(i,k) = no;
loop((i,k)$(dclose(i) = d(i,k)),
ik(i,k)$(sum(ik(i,kk),1)=0) = yes;
);
- The construct
cluster(i) = uniformint(1,numk); ik(i,k) = cluster(i)=ord(k);
cannot be simplified to
ik(i,k) = uniformint(1,numk)=ord(k);.
The latter version is incorrect. - The assignment
c(k,xy)$n(k) = sum(ik(i,k), p(i,xy))/n(k);
will keep a cluster with zero points at its current location.
The XOR operator
This fragment illustrates a few different ways to calculate the difference between two sets:
set i /a*z/ j(i) /a,b/ k(i) / b,c/ diff(i) 'difference between sets: card(diff)=0 if equal' ; diff(i) = (j(i)-k(i)) + (k(i)-j(i)); display diff; diff(i) = j(i) xor k(i); display diff; diff(i) = 1$j(i) <> 1$k(i); display diff; |
---- 9 SET diff difference between sets: card(diff)=0 if equal a, c ---- 12 SET diff difference between sets: card(diff)=0 if equal a, c ---- 15 SET diff difference between sets: card(diff)=0 if equal a, c |
My preferred syntax diff(i)=j(i)<>k(i) is not allowed. In my opinion, notation should really try to help readability, and not show off mathematical prowess.
No comments:
Post a Comment