I was asked for help in implementing a slight variation on the data input module discussed in this post. The change was to filter the input records. My first suggestion was:
* only these records will be copied to b
set subseti(*) /
OAF_PT_NT
OAF_PT_QC
/;
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5) and subseti(s1)),
i(s1) = yes;
j(s5) = yes;
b(s1,s5) = a(s1,s2,s3,s4,s5);
);
This worked nicely on my small dataset but was totally disastrous on the large production data. The reason is that GAMS suddenly reverts to “dense” processing, where all possible combinations s1 × s2 × s3 × s4 × s5 are considered instead of only the ones for which parameter a() exists. If we use:
loop((s1,s2,s3,s4,s5)$a(s1,s2,s3,s4,s5),
GAMS is doing the right thing. The Cartesian product of the index sets is exceptionally bad here because they are all the universe set *. With
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5) and subseti(s1)),
GAMS should be able to do the same optimization but fails to do so. On a small data set this is not immediately visible, but on a large data set we will see very slow performance. A simple reformulation helps here:
loop((subseti,s2,s3,s4,s5)$a(subseti,s2,s3,s4,s5),
i(subseti) = yes;
j(s5) = yes;
b(subseti,s5) = a(subseti,s2,s3,s4,s5);
);
Now GAMS is fast again.
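For anyone who wants to try the reformulated loop in isolation, here is a minimal self-contained sketch. The declarations are my assumptions about the surrounding model (indices aliased to the universe, i and j as dynamic sets), and all labels other than OAF_PT_NT and OAF_PT_QC are invented test data.

```gams
* Assumption: s1..s5 are aliases of the universe set, as described in the text
alias (*, s1, s2, s3, s4, s5);

* Hypothetical miniature input; the real data is much larger
parameter a(*,*,*,*,*) 'raw input records' /
   OAF_PT_NT.x.y.z.t1  1.5
   OAF_PT_QC.x.y.z.t2  2.5
   OTHER.x.y.z.t1      9.9
/;

* only these records will be copied to b
set subseti(*) / OAF_PT_NT, OAF_PT_QC /;

set i(*) 'first keys found in the filtered records';
set j(*) 'last keys found in the filtered records';
parameter b(*,*) 'filtered data';

* Using subseti as the controlling index restricts the first position
* directly, so no extra condition on s1 is needed
loop((subseti,s2,s3,s4,s5)$a(subseti,s2,s3,s4,s5),
   i(subseti) = yes;
   j(s5) = yes;
   b(subseti,s5) = a(subseti,s2,s3,s4,s5);
);

display i, j, b;
```

The record starting with OTHER is not in subseti, so it should never be visited by the loop.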
Other variations of the loop are:
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5) * subseti(s1)),
i(s1) = yes;
j(s5) = yes;
b(s1,s5) = a(s1,s2,s3,s4,s5);
);
and
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5)$subseti(s1)),
i(s1) = yes;
j(s5) = yes;
b(s1,s5) = a(s1,s2,s3,s4,s5);
);
These are all fast. I managed to recommend the slowest variant!
On the real data the timings are:
loop | time |
---|---|
loop over all records | |
loop((s1,s2,s3,s4,s5)$a(s1,s2,s3,s4,s5), | 37.815 SECS |
loop over records in subset | |
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5) and subseti(s1)), | Infinity: stopped after 10 minutes. Client ran this for a whole night without results |
loop((subseti,s2,s3,s4,s5)$a(subseti,s2,s3,s4,s5), | 0.000 SECS |
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5)*subseti(s1)), | 0.296 SECS |
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5)$subseti(s1)), | 0.218 SECS |
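One way to collect timings like these yourself is to take the difference of timeElapsed before and after the loop. The sketch below does this for the nested-dollar variant. The data is again a hypothetical miniature set (only OAF_PT_NT and OAF_PT_QC come from the real model), and the scalars t0 and dt are names I made up, so the measured time will be essentially zero here.

```gams
* Assumption: s1..s5 are aliases of the universe set
alias (*, s1, s2, s3, s4, s5);

* Hypothetical miniature input, for illustration only
parameter a(*,*,*,*,*) /
   OAF_PT_NT.x.y.z.t1  1.5
   OAF_PT_QC.x.y.z.t2  2.5
   OTHER.x.y.z.t1      9.9
/;
set subseti(*) / OAF_PT_NT, OAF_PT_QC /;
set i(*), j(*);
parameter b(*,*);

scalar t0 'elapsed seconds before the loop';
scalar dt 'seconds spent in the loop';

* timeElapsed returns the seconds elapsed since the start of the job
t0 = timeElapsed;
loop((s1,s2,s3,s4,s5)$(a(s1,s2,s3,s4,s5)$subseti(s1)),
   i(s1) = yes;
   j(s5) = yes;
   b(s1,s5) = a(s1,s2,s3,s4,s5);
);
dt = timeElapsed - t0;

display dt;
```

Swapping in one of the other loop headers from the table and rerunning on the real data gives a comparable number for each variant.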
Update: this behavior is related to parameter A containing special values (it had some NA’s). Because of this, the operators AND and * behave differently (I am not exactly sure why).