Monday, September 4, 2023

Critiquing a GAMS Model

It is always interesting to read GAMS models written by someone else. There are probably three things one can observe:

  • A nice formulation or concept that is useful to learn about.
  • A bad implementation: something that really should not be done that way.
  • A piece of code that is correct and defensible, but I would write it differently. This includes things like style, layout, formatting, etc.
My way of reading GAMS code is often to start editing and make it "my code". It is a bit slower process, but that comes with its advantages: better understanding of what is going on, and often cleaner code.

Here, I am looking at the model sambal.gms in the GAMS model library [1]. It is a very small model, but I have many thoughts about it. The complete model is reproduced in Appendix 1. Let's walk through it.

The matrix balancing problem is to find a nearby matrix such that row- and column sums are obeyed. A relative quadratic objective is used to minimize the sum of the squared deviations between the original data (the priors) and the final matrix. Zeros in the matrix need to be maintained: they can't become nonzero. This is sometimes called sparsity preservation. Often, sign-preservation is another condition. That is not part of this model. Note that, in this model, not only the matrix is updated but also the row and column totals. 

Data


The data entry and data prep step in sambal.gms is: 

Set i 'accounts' / lab, h1, h2, p1, p2 /;

 

Alias (i,j);

 

Table xb(i,j'original estimate'

         lab  h1   h2   p1  p2

   lab        15    3  130  80

   h1     na

   h2     na

   p1         15  130       20

   p2         25   40   55    ;

 

Parameter

   tb(i)   'original totals'   / lab 220, (h1,h2) na, p1 190, p2 105 /

   tw(i)   'wights for totals'

   xw(i,j'weights for cells';

 

tw(i)   = 1;

xw(i,j) = 1$xb(i,j);

tw(i)$(mapval(tb(i))     = mapval(na)) = 0;

xw(i,j)$(mapval(xb(i,j)) = mapval(na)) = 0;

 

display twxw;

 

The first thing to note is that the table has some special values: NA. I usually use capitals for this to make it stand out more. There are also some blank cells that are zero. In GAMS, sparse storage is used throughout. So the important rule to remember is "zero is the same as 'does not exist'". A cell with NA does exist (i.e., it is nonzero), but it has a special value. I assume xb means "x before".

The zero values in the matrix have a special meaning. They need to be preserved, i.e., the solution values for these should also be zero.

The table also has a rim with totals. In this model, they are specified separately. I would combine them into one. I.e.:

Table xb(*,*) 'original estimate'
lab h1 h2 p1 p2 total
lab 15 3 130 80 220
h1 na na
h2 na na
p1 15 130 20 190
p2 25 40 55 105
total 220 na na 190 105
;

This conveys better the underlying problem: our matrix needs to obey some row- and column-sum constraints. It is always better to use domain checking (which is turned off by using *), so I would suggest:

Set
iext 'extended:accounts+totals' / lab, h1, h2, p1, p2, total /
i(iext) 'accounts' / lab, h1, h2, p1, p2 /
;
Table xb(iext,iext) 'original estimate'
lab h1 h2 p1 p2 total
lab 15 3 130 80 220
h1 na na
h2 na na
p1 15 130 20 190
p2 25 40 55 105
total 220 na na 190 105
;

The blanks in the table (zeros) have a special meaning: zeros have to be preserved. We can fix the corresponding variables to zero, or, as is done in this model, we can skip these empty cells. Opposed to this, the NA values are to be included in the add-up constraints but skipped in the objective as we can't calculate a deviation from this. 

The typo in the explanatory text indicates we really need an IDE with spell-checking. My GAMS code often contains many comments, as I need to pass on code to other people who are not necessarily GAMS experts or even very familiar with the underlying models. So, I tend to be more verbose. Spell-checking can help there. 

I don't agree with how the assignments to tw and xw are written. Let me explain why.

The mapval(x) function is an exotic function that gives back a numerical value corresponding to the type of x:

0: is not a special value
4: is UNDF (undefined)
5: is NA (not available)
6: is INF ( )
7: is -INF ( )
8: is EPS
>8: is an acronym

I don't think I have ever used this function. There is almost never a good reason to use it in this way, as we can say:

tw(i)$(tb(i)=na) = 0;

xw(i,j)$(xb(i,j)=na) = 0;


which is both shorter and clearer. 

There are two audiences when you write code: one is the machine (in this case the GAMS compiler) and the other, more importantly, are human beings. It is very important to write code that is as readable as possible. In the fragment above this is obviously violated by obscuring the meaning of these assignments.  

The weights here are 0 or 1. They are modeled by a parameter. For boolean values, an arguably better representation is a set. A set element can only have two values (YES or NO), while a parameter can have any value. So I would probably write something like: 

set
ii(i) 'indicates if t(i) is included in the obj'
ij(i,j) 'indicates if x(i,j) is included in the obj'
;

ii(i) = tb(i)<>na;
ij(i,j) = xb(i,j)<>0 and xb(i,j)<>na;
display ii, ij;

Nerdy detail: the second assignment cannot be written as:

ij(i,j) = xb(i,j) and xb(i,j)<>na;

as the rhs resolves for na values to na and 0 which is na.

Model equations


Variable

   x(i,j)  'estimated cells'

   t(i)    'estimated totals'

   dev     'deviations';

 

Equation

   rbal(i'row balance'

   cbal(j) 'column balance'

   devsqr  'definition of square deviations';

 

rbal(i).. t(i) =e= sum(j$xb(i,j), x(i,j));

 

cbal(j).. t(j) =e= sum(i$xb(i,j), x(i,j));

 

devsqr..  dev  =e= sum((i,j)$xw(i,j), xw(i,j)*sqr(xb(i,j) - x(i,j))/xb(i,j))

                +  sum(i$tw(i), tw(i)*sqr(tb(i) - t(i))/tb(i));



The variables are (by default) free variables. That means x and t can become negative. Many matrix-balancing exercises want positive only results or at least sign preservation w.r.t to the priors.  For SAMs (Social Accounting Matrices) we don't want to introduce negative values, so I would make x and t positive variables.

The constraints rbal and cbal can be marginally simplified by using an extra set:

set nz(i,j) '(i,j) is nonzero: include in constraints';
nz(i,j) = xb(i,j);

rbal(i).. t(i) =e= sum(nz(i,j), x(i,j));
cbal(j).. t(j) =e= sum(nz(i,j), x(i,j));

I am a big fan of making equations simpler by precalculating sets. Arguably, the gain is here too small to be worthwhile.

The equation devsqr is much more problematic. The summation sum((i,j)$xw(i,j)xw(i,j)*sqr(.)/.)  has the weights xw duplicated: both as filter on the summation and also as a factor in the summand. The dollar condition says: if the weight exists (i.e. is nonzero) add the summand. Similarly for the second sum: sum(i$tw(i)tw(i)*sqr(.)/.). This duplication is not needed. We can simplify things by dropping either the dollar conditions or the factors. Remember that the weights are 0 or 1. So we can write:

devsqr.. dev =e= sum((i,j), xw(i,j)*sqr(xb(i,j) - x(i,j))/xb(i,j))
+ sum(i, tw(i)*sqr(tb(i) - t(i))/tb(i));

We can also do it the other way around. Using our new sets ii and ij and dropping the weight parameters, we can write:

devsqr.. dev =e= sum(ij, sqr(xb(ij) - x(ij))/xb(ij))
+ sum(ii, sqr(tb(ii) - t(ii))/tb(ii));

Note that this functional form assumes that the priors xb and tb are positive. We would be suddenly maximizing if they are negative. This can be repaired by replacing \[\min \sum_{i,j|\color{darkblue}{\mathit{xb}}_{i,j}>0} \frac{\left(\color{darkred}x_{i,j}-\color{darkblue}{\mathit{xb}}_{i,j}\right)^2}{\color{darkblue}{\mathit{xb}}_{i,j}}\] by [2]: \[\min \sum_{i,j|\color{darkblue}{\mathit{xb}}_{i,j}\ne 0} \frac{\left(\color{darkred}x_{i,j}-\color{darkblue}{\mathit{xb}}_{i,j}\right)^2}{| \color{darkblue}{\mathit{xb}}_{i,j} |}\] 
I often use a slightly different objective:\[\min \sum_{i,j|\color{darkblue}{\mathit{xb}}\ne 0} \left(\frac{\color{darkred}x_{i,j}-\color{darkblue}{\mathit{xb}}_{i,j}}{\color{darkblue}{\mathit{xb}}_{i,j}}\right)^2=\sum_{i,j|xb_{i,j}\ne 0} \left(\frac{\color{darkred}x_{i,j}}{\color{darkblue}{\mathit{xb}}_{i,j}}-1\right)^2\] This works correctly if there are negative entries. Note that SAMs (Social Accounting Matrices) usually have non-negative entries.

 

Model bal / all /;

 

x.l(i,j) = xb(i,j)$xw(i,j);

t.l(i)   = tb(i)$tw(i);

 

solve bal using nlp minimizing dev;

     
I can write the initial value assignments a bit more succinctly:

* initial values
x.l(ij) = xb(ij);
t.l(ii) = tb(ii);

Finally, it may be better to use a quadratic model type QCP (Quadratically Constrained Programming) instead of a general-purpose NLP. That will allow us to quadratic solvers like Cplex. The initial values are only useful for general-purpose NLP solvers (they will use them). Quadratic solvers typically don't benefit that much from initial values.

solve bal using qcp minimizing dev;

This is the end of the model. I have talked a lot about this model. Here, you see that models from the GAMS model library do not always follow best practices, and editing an existing model is a good way to walk your way through. Obviously, many edits can be a matter of taste, but here, I also made more substantial changes, and provided some rationale for that.

Conclusions


sambal.gms is a small model in the GAMS model library. Still, there are lots of things we can say about it. Obviously, some are more about taste and personal preferences, but some other concerns can be backed up with solid arguments.

References

  1. sambal.gms : Social Accounting Matrix Balancing Problem, https://www.gams.com/latest/gamslib_ml/libhtml/gamslib_sambal.html
  2. Zenios, S.A., Drud, A. and Mulvey, J.M. (1989), Balancing large social accounting matrices with nonlinear network programming. Networks, 19: 569-585. 

Appendix A: original sambal.gms

$title Social Accounting Matrix Balancing Problem (SAMBAL,SEQ=77)

 

$onText

A Social Accounting Matrix captures all the circular flows in an

economy and is called balanced if the row total equal the column

totals. A sample problem illustrates the use of Nonlinear Programming

to balance such matrices.

 

 

Zenios, S A, Drud, A S, and Mulvey, J, Balancing some large Social

Accounting Matrices with Nonlinear Programming. Tech. rep.,

Department of Civil Engineering, Princeton University, 1986.

 

Keywords: nonlinear programming, social accounting matrix, statistics

$offText

 

Set i 'accounts' / lab, h1, h2, p1, p2 /;

 

Alias (i,j);

 

Table xb(i,j'original estimate'

         lab  h1   h2   p1  p2

   lab        15    3  130  80

   h1     na

   h2     na

   p1         15  130       20

   p2         25   40   55    ;

 

Parameter

   tb(i)   'original totals'   / lab 220, (h1,h2) na, p1 190, p2 105 /

   tw(i)   'wights for totals'

   xw(i,j'weights for cells';

 

tw(i)   = 1;

xw(i,j) = 1$xb(i,j);

tw(i)$(mapval(tb(i))     = mapval(na)) = 0;

xw(i,j)$(mapval(xb(i,j)) = mapval(na)) = 0;

 

display twxw;

 

Variable

   x(i,j)  'estimated cells'

   t(i)    'estimated totals'

   dev     'deviations';

 

Equation

   rbal(i'row balance'

   cbal(j) 'column balance'

   devsqr  'definition of square deviations';

 

rbal(i).. t(i) =e= sum(j$xb(i,j), x(i,j));

 

cbal(j).. t(j) =e= sum(i$xb(i,j), x(i,j));

 

devsqr..  dev  =e= sum((i,j)$xw(i,j), xw(i,j)*sqr(xb(i,j) - x(i,j))/xb(i,j))

                +  sum(i$tw(i), tw(i)*sqr(tb(i) - t(i))/tb(i));

 

Model bal / all /;

 

x.l(i,j) = xb(i,j)$xw(i,j);

t.l(i)   = tb(i)$tw(i);

 

solve bal using nlp minimizing dev;


 

Appendix B: proposed model



$onText

 

   Matrix Balancing Problem

  

   Find a nearby matrix such that row and column totals are correct.

  

   Additional condition: preserve zeros, positive values.

  

   Slightly different relative quadratic objective

 

$offText

 

 

*---------------------------------------------------------------------

data entry

*---------------------------------------------------------------------

 

 

Sets

   iext  'accounts+totals' / lab, h1, h2, p1, p2, total /

   i(iext'accounts' / lab, h1, h2, p1, p2 / 

;

 

Alias (iext,jext),(i,j);

 

Table xb(iext,jext'original estimate'

         lab  h1   h2   p1  p2  total

   lab        15    3  130  80   220

   h1     na                      na

   h2     na                      na

   p1         15  130       20   190

   p2         25   40   55       105

   total 220  na   na  190 105    

;

 

totals should be symmetric

abort$sum(i$(xb(i,"total")<>xb("total",i)),1) "totals in XB not symmetric";

 

*---------------------------------------------------------------------

derived data

*---------------------------------------------------------------------

 

Parameter tb(i'totals';

tb(i) = xb(i,'total');

 

Sets

   ii(i)     'indicates if i is included in obj'

   ij(i,j)   'indicates if (i,j) is included in obj'

   nz(i,j)   '(i,j) is nonzero: include in constraints'

;

 

ii(i) = tb(i)<>na;

ij(i,j) = xb(i,j)<>0 and xb(i,j)<>na;

nz(i,j) = xb(i,j);

display ii, ijnz;

 

assumption: non-negative values

abort$sum(ij$(xb(ij)<0),1) "negative values in XB";

abort$sum(ii$(tb(ii)<0),1) "negative values in TB";

 

 

*---------------------------------------------------------------------

model

*---------------------------------------------------------------------

 

Positive variable

   x(i,j)  'estimated cells'

   t(i)    'estimated totals';

Variable

   dev     'deviations';

 

Equation

   rbal(i'row balance'

   cbal(j) 'column balance'

   devsqr  'definition of squared deviations';

 

rbal(i).. t(i) =e= sum(nz(i,j), x(i,j));

 

cbal(j).. t(j) =e= sum(nz(i,j), x(i,j));

 

this is a little bit different from the original model

* I tend to use this form

devsqr..  dev  =e= sum(ijsqr(x(ij)/xb(ij)-1))

                   + sum(ii, sqr(t(ii)/tb(ii)-1));

 

Model sambal /all/;

 

 

*---------------------------------------------------------------------

solve

*---------------------------------------------------------------------

 

initial values

x.l(ij) = xb(ij);

t.l(ii) = tb(ii);

 

solve sambal using qcp minimizing dev;

 

*---------------------------------------------------------------------

reporting

*---------------------------------------------------------------------

 

parameter sol(iext,jext'final estimate';

sol(i,j) = x.l(i,j);

sol(i,'total') = t.l(i);

sol('total',i) = t.l(i);

display xb,sol;

 

No comments:

Post a Comment