## Friday, October 20, 2023

### GAMS: SMAX and sparsity

This is a discussion about the SMAX function in GAMS and how it behaves for sparse data.

The data structure we were facing was something like:

set
   i 'cases'     /case1*case100000/
   j 'attribute' /j1*j25/
   k 'attribute' /k1*k25/
   t 'type'      /typ1*typ2/
;

parameter p(i,j,k,t) 'positive numbers';
* note: for each i we have only one (j,k)

The data for parameter $$\color{darkblue}p_{i,j,k,t}$$ originates from a database. A small (simulated) data set can look like:

----     20 PARAMETER p  positive numbers

                      typ1        typ2

case1 .j5 .k22       0.550       0.301
case2 .j8 .k6        0.350       0.856
case3 .j2 .k13       0.998       0.579
case4 .j25.k20       0.131       0.640
case5 .j4 .k7        0.669       0.435
case6 .j9 .k9        0.131       0.150
case7 .j15.k21       0.231       0.666
case8 .j20.k8        0.110       0.502
case9 .j5 .k22       0.265       0.286
case10.j15.k19       0.628       0.464

We see that for each case $$i$$, we have exactly one pair $$(j,k)$$.
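For reproducibility, data like this can be simulated directly in GAMS. Below is a minimal sketch (my own; the generation code is not part of the post) that draws one random $$(j,k)$$ pair per case:

parameter jn(i) 'drawn j position', kn(i) 'drawn k position';
jn(i) = uniformint(1,card(j));
kn(i) = uniformint(1,card(k));
* assign random values only for the drawn (j,k) pair of each case
p(i,j,k,t)$(ord(j)=jn(i) and ord(k)=kn(i)) = uniform(0,1);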

### First slow SMAX implementation

We want to calculate:

parameter q(i,j,k) 'smax(t, p(i,j,k,t))';
q(i,j,k) = smax(t,p(i,j,k,t));

This obviously gives:

----     51 PARAMETER q  smax(t, p(i,j,k,t))

case1 .j5 .k22       0.550
case2 .j8 .k6        0.856
case3 .j2 .k13       0.998
case4 .j25.k20       0.640
case5 .j4 .k7        0.669
case6 .j9 .k9        0.150
case7 .j15.k21       0.666
case8 .j20.k8        0.502
case9 .j5 .k22       0.286
case10.j15.k19       0.628

Although this calculation is correct, it is painfully slow for large data sets. The reason is that parameter $$\color{darkblue}p$$ is very sparse: for every $$i$$ we have just one $$(j,k)$$ pair. Note that GAMS stores data sparsely: "does not exist" and "being zero" are the same thing. The sum function is very fast over sparse data, because zero values can safely be skipped: they never change the result. In the smax function, however, we can't safely skip all zeros, so the assignment evaluates the maximum over all possible combinations $$(i,j,k)$$, including ones we have no data for. With 100,000 cases this smax calculation takes 7.2 seconds.
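Conceptually, the assignment behaves like an explicit loop over the full Cartesian product (an illustration of my own, not a faster alternative):

* dense evaluation: every (i,j,k) combination is visited
loop((i,j,k),
   q(i,j,k) = smax(t, p(i,j,k,t));
);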

For $$|I|=10$$ cases, we have $$|I|\cdot|J|\cdot|K| = 10\cdot25\cdot25 = 6,250$$ combinations. For $$|I|=100,000$$ cases this becomes 62.5 million. These are large numbers!
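To see how extreme the sparsity is, we can compare the number of stored records with the size of the full Cartesian product (a small check of my own; card applied to a parameter returns its number of stored records):

scalar numrecs 'stored records in p'
       numcart 'size of the Cartesian product over (i,j,k,t)';
numrecs = card(p);
numcart = card(i)*card(j)*card(k)*card(t);
display numrecs, numcart;

For $$|I|=100,000$$ this should show 200,000 stored records against 125 million possible combinations.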

### Faster alternatives: exploit sparsity

A fast version must avoid running over the full Cartesian product $$I\times J\times K$$. A first version is:

parameter q2(i,j,k);
q2(i,j,k)$sum(t,p(i,j,k,t)) = smax(t,p(i,j,k,t));

This assumes $$\color{darkblue}p\ge 0$$. The $sum construct says: only do the assignment if one or more of the $$\color{darkblue}p_{i,j,k,t}$$ is nonzero. This version does not always work correctly if negative values of $$\color{darkblue}p$$ are allowed. For instance $$-2+2=0$$, so a sum of zero does not imply that each value is zero. In the case where we allow negative values, we can do something like:

parameter q3(i,j,k);
q3(i,j,k)$sum(t$p(i,j,k,t),1) = smax(t,p(i,j,k,t));

This version explicitly counts the number of nonzero elements $$\color{darkblue}p_{i,j,k,t}$$ over $$t$$. The profiling information

--- Profile Summary (6 records processed)
     7.187   0.034GB        27 Assignment q  (100000)
     0.047   0.043GB        36 Assignment q2 (100000)
     0.031   0.051GB        42 Assignment q3 (100000)

indicates that our fast versions are more than 100 times as fast (the first column contains the times in seconds). The difference in timings between versions q2 and q3 is just noise.
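For completeness: this output comes from GAMS's built-in profiler, which can be switched on in the source or from the command line:

* switch on profiling inside the model source
option profile = 1;
* alternatively, use the command line parameter: gams model.gms profile=1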

My preferred approach is to store the sparsity pattern of the $$(i,j,k)$$ tuples explicitly in a separate set. We can do:

set ijk(i,j,k);
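* populate ijk by projecting the records of p onto (i,j,k)
* (continuation sketch of my own: the projection via option is standard
*  GAMS, and the parameter name q4 is mine)
option ijk < p;

parameter q4(i,j,k) 'smax over existing tuples only';
q4(ijk) = smax(t,p(ijk,t));

With the sparsity pattern stored in ijk, the assignment visits only tuples that actually exist, so it should be as fast as q2 and q3, while ijk remains available for reuse elsewhere in the model.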