Saturday, June 1, 2013

Simple Data Set to show equivalence of RAS algorithm and Entropy model

For the matrix balancing problem it is well known that the RAS algorithm and an Entropy model give the same results. Here is a small example with the data taken from http://www.lexjansen.com/mwsug/1992/MWSUG92013.pdf.

In the models below the row and column totals are known while the inner part of the matrix is estimated. The goal is to find a nearby matrix such that the row and column sums are equal to the given row and column totals. For many economic problems it is important we preserve the zeros.

The RAS algorithm implementation can easily be improved by iterating until convergence is observed instead of the fixed number of iterations. One big advantage of using an Entropy model instead of an algorithm is that it is easy at add side-constraints. This can often be very important in practical situations.

RAS Algorithm

$ontext

  
Example from

  
Using PROC IML to do Matrix Balancing
  
Carol Alderman, University of Kansas
  
Institute for Public Policy and Business Research
  
MidWest SAS Users Group MWSUG 1992

$offtext


sets
   p
'products' /pA*pI/
   s
'salesmen' /s1*s10/
;

table A0(*,*) 'estimated matrix, known totals'

           
s1   s2   s3   s4   s5   s6   s7   s8   s9  s10   rowTotal
     
pA   230  375  375  100    0  685  215    0   50    0       2029
     
pB   330  405  419  175   90  504  515    0  240  105       2798
     
pC   268  225  242    0   30  790  301   44  100    0       1998
     
pD   595  380  638  275   30  685  605   88  100  160       3566
     
pE   340  360  440  200   30  755  475   44  150    0       2794
     
pF   132  190  200    0    0  432  130    0    0    0       1071
     
pG   309  330  350  125    0  612  474    0   50   50       2305
     
pH   365  400  330  150   50  575  600   44  150  110       2747
     
pI   210  250  308  125    0  720  256    0  100   50       2015

colTotal  2772 2910 3300 1150  240 5760 3526  220  950  495
;


A0(p,
'sum') = sum(s,A0(p,s));
A0(
'sum',s) = sum
(p,A0(p,s));
A0(p,
'diff') = A0(p,'sum') - A0(p,'rowTotal'
);
A0(
'diff',s) = A0('sum',s) - A0('colTotal'
,s);
option
A0:0;
display
A0;

alias
(p,i);
alias
(s,j);


parameters

   A(*,*) 
'new RASsed matrix'
   u(i)   
'row total'
   v(j)   
'column total'
   rho(i)  
'used inside RAS algorithm'
   sigma(j)
'used inside RAS algorithm'
;

A(i,j) = A0(i,j);
u(i) = A0(i,
'rowTotal');
v(j) = A0(
'colTotal'
,j);

abort$(smin((i,j),A(i,j))<0) "Entries A(i,j) should be nonnegative"
,A;
abort$(smin(i,u(i))<0) "Entries u(i) should be nonnegative"
,u;
abort$(smin(j,v(j))<0) "Entries v(j) should be nonnegative"
,v;


set iter 'iterations of RAS algorithm' /iter1*iter10/
;

loop
(iter,

*

* step 1 row scaling
*
   rho(i) = u(i)/
sum(j,A(i,j));
   A(i,j) = rho(i)*A(i,j);


*

* step 2 column scaling
*
   sigma(j) = v(j)/
sum(i,A(i,j));
   A(i,j) =  A(i,j)*sigma(j);
);

A(i,
'rowTotal'
) = u(i);
A(
'colTotal'
,j) = v(j);

A(p,
'sum') = sum
(s,A(p,s));
A(
'sum',s) = sum
(p,A(p,s));
A(p,
'diff') = A(p,'sum') - A(p,'rowTotal'
);
A(
'diff',s) = A('sum',s) - A('colTotal'
,s);
display "-------------------------------------------------------------------"
,
       
"Results"
,
       
"-------------------------------------------------------------------"
,
        A;


Entropy Model

$ontext

  
Example from

  
Using PROC IML to do Matrix Balancing
  
Carol Alderman, University of Kansas
  
Institute for Public Policy and Business Research
  
MidWest SAS Users Group MWSUG 1992

$offtext


sets
   p
'products' /pA*pI/
   s
'salesmen' /s1*s10/
;

table A0(*,*) 'estimated matrix, known totals'

           
s1   s2   s3   s4   s5   s6   s7   s8   s9  s10   rowTotal
     
pA   230  375  375  100    0  685  215    0   50    0       2029
     
pB   330  405  419  175   90  504  515    0  240  105       2798
     
pC   268  225  242    0   30  790  301   44  100    0       1998
     
pD   595  380  638  275   30  685  605   88  100  160       3566
     
pE   340  360  440  200   30  755  475   44  150    0       2794
     
pF   132  190  200    0    0  432  130    0    0    0       1071
     
pG   309  330  350  125    0  612  474    0   50   50       2305
     
pH   365  400  330  150   50  575  600   44  150  110       2747
     
pI   210  250  308  125    0  720  256    0  100   50       2015

colTotal  2772 2910 3300 1150  240 5760 3526  220  950  495
;


A0(p,
'sum') = sum(s,A0(p,s));
A0(
'sum',s) = sum
(p,A0(p,s));
A0(p,
'diff') = A0(p,'sum') - A0(p,'rowTotal'
);
A0(
'diff',s) = A0('sum',s) - A0('colTotal'
,s);
option
A0:0;
display
A0;

alias
(p,i);
alias
(s,j);

variables
A(i,j),z;

equations

   objective
   rowsum(i)
   colsum(j)
;

objective.. z =e=
sum((i,j)$A0(i,j), A(i,j)*log(A(i,j)/A0(i,j)));
rowsum(i)..
sum(j$A0(i,j), A(i,j)) =e= A0(i,'rowTotal'
);
colsum(j)..
sum(i$A0(i,j), A(i,j)) =e= A0('colTotal'
,j);

A.L(i,j) = A0(i,j);
A.lo(i,j)$A0(i,j) = 0.0001;


model m /all/
;
solve
m minimizing z using nlp;

parameter
A1(*,*);
A1(i,j) = A.L(i,j);

A1(i,
'rowTotal') = A0(i,'rowTotal'
);
A1(
'colTotal',j) = A0('colTotal'
,j);

A1(p,
'sum') = sum
(s,A1(p,s));
A1(
'sum',s) = sum
(p,A1(p,s));
A1(p,
'diff') = A1(p,'sum') - A1(p,'rowTotal'
);
A1(
'diff',s) = A1('sum',s) - A1('colTotal'
,s);
display "-------------------------------------------------------------------"
,
       
"Results"
,
       
"-------------------------------------------------------------------"
,
        A1;


Results


----     73 -------------------------------------------------------------------
            Results
            -------------------------------------------------------------------

----     73 PARAMETER A1 

                  s1          s2          s3          s4          s5          s6          s7

PA           229.652     374.869     375.099     100.028                 686.238     212.542
PB           330.587     406.192     420.492     175.626      94.300     506.575     510.790
PC           267.313     224.685     241.809                  31.297     790.595     297.246
PD           595.262     380.610     639.416     275.614      31.391     687.580     599.252
PE           339.659     360.058     440.341     200.158      31.346     756.750     469.809
PF           130.330     187.815     197.821                             427.953     127.080
PG           309.478     330.895     351.165     125.418                 614.984     470.016
PH           360.594     395.631     326.596     148.455      51.665     569.947     586.867
PI           209.123     249.246     307.260     124.701                 719.378     252.398
colTotal    2772.000    2910.000    3300.000    1150.000     240.000    5760.000    3526.000
sum         2772.000    2910.000    3300.000    1150.000     240.000    5760.000    3526.000
diff     -9.0949E-13             -4.5475E-13 -2.2737E-13 -2.8422E-14 -9.0949E-13 -9.0949E-13

       +          s8          s9         s10    rowTotal         sum        diff

PA                        50.572                2029.000    2029.000 -2.2737E-13
PB                       243.545     109.894    2798.000    2798.000
PC            44.017     101.037                1998.000    1998.000
PD            88.299     101.341     167.233    3566.000    3566.000 -1.3642E-12
PE            44.086     151.793                2794.000    2794.000
PF                                              1071.000    1071.000
PG                        50.727      52.318    2305.000    2305.000 4.54747E-13
PH            43.598     150.111     113.535    2747.000    2747.000
PI                       100.874      52.019    2015.000    2015.000 -4.5475E-13
colTotal     220.000     950.000     495.000
sum          220.000     950.000     495.000
diff     5.68434E-14 1.13687E-13 -5.6843E-14