Monday, March 9, 2009

Sparse data in MSF OML

The modeling language of the Microsoft Solver Foundation does not support sparse data. I.e. if you have a parameter a[i,j] you need to have values for all a[i,j]’s and you cannot skip the zero’s. In some cases this is a little bit of problem. Adding all zero records may be a lot of work and may take a large amount of space in your spreadsheet. Sometimes it is possible to work around this problem by adding extra sets and columns. Probably the easiest way to explain this is with a small example. A network optimization problem is a good example where a sparse data structure can help. In this case we have a max flow model with a small but sparse network. In GAMS we can write:


max flow network example

Data from example in
Mitsuo Gen, Runwei Cheng, Lin Lin
Network Models and Optimization: Multiobjective Genetic Algorithm Approach
Springer, 2008

Erwin Kalvelagen, Amsterdam Optimization, May 2008


i 'nodes' /node1*node11/
source(i) /node1/
sink(i) /node11/


parameter capacity(i,j) /
node1.node2 60
node1.node3 60
node1.node4 60
node2.node3 30
node2.node5 40
node2.node6 30
node3.node4 30
node3.node6 50
node3.node7 30
node4.node7 40
node5.node8 60
node6.node5 20
node6.node8 30
node6.node9 40
node6.node10 30
node7.node6 20
node7.node10 40
node8.node9 30
node8.node11 60
node9.node10 30
node9.node11 50
node10.node11 50

set arcs(i,j);
arcs(i,j)$capacity(i,j) = yes;
display arcs;

parameter rhs(i);
rhs(source) = -1;
rhs(sink) = 1;

x(i,j) 'flow along arcs'
f 'total flow'

positive variables x;
x.up(i,j) = capacity(i,j);

flowbal(i) 'flow balance'

flowbal(i).. sum(arcs(j,i), x(j,i)) - sum(arcs(i,j), x(i,j)) =e= f*rhs(i);

model m/flowbal/;

solve m maximizing f using lp;

The parameter capacity defines the network: where there is no capacity, we assume there no arc. We can calculate two-dimensional set arcs(i,j) to indicate which arcs exist. This can be used to simplify the flow-balance equations. Note that in GAMS only the variables that appear in the equations are actually part of the model. I.e. although we declare x(i,j), in the equations we only have x(arcs). This means we only have 22 variables x(i,j).

In OML we cannot use a sparse table for the capacities: we would need a complete table with 11x11=121 records. That would also mean that we have 121 variables. The trick is to have an explicit set arcs in the data table. As OML does not have 2-dimensional sets, I used parameters From[.], To[.] to describe the network. The complete data is below:

The OML model can now look like:

The node balance equation becomes a little bit unwieldy and less readable with this approach. This is a small, artificial example but note that quite a few models have some network component. It illustrates that leaving out sparse data handling and multidimensional sets to simplify a modeling language, although not a show stopper, comes with a cost in the form of additional complexity for the modeler. Note that in this model we still have a possibly large node table to maintain (the GAMS model handles this sparse).