Yet Another Math Programming Consultant: July 2008

Thursday, July 31, 2008

GAMS optimizing assignments

The following fragment shows an unexpected performance problem. I thought all these assignments should be fast (i.e. only over the nonzero elements).

   1  set i /i1*i300/;
   2  alias (i,j,k);
   3   
   4  parameter p(i,j,k);
   5  p(i,i,i) = 3.14;
   6   
   7  parameter p1(i,j,k),p2(i,j,k);
   8   
   9  p1(i,j,k) = -p(i,j,k);
  10  p2(i,j,k)$p(i,j,k) = -p(i,j,k);
  
----      1 ExecInit                 0.000     0.000 SECS      3 Mb 
----      5 Assignment p             0.000     0.000 SECS      4 Mb    300
----      9 Assignment p1            2.683     2.683 SECS      4 Mb    300
----     10 Assignment p2            0.000     2.683 SECS      4 Mb    300

We see here the assignment to p1 is slow (GAMS runs through the carthesian product i x j x k) while the assignment to p2 is fast. I would expect the assignment to p1 to be fast as well.

An assignment of the form
p1(i,j,k) = (-1)*p(i,j,k)
would be optimized by GAMS while
p1(i,j,k) = -1*p(i,j,k)
would be slow again.

The profiler can help you detect these situations and optimize the assignments manually like I did for p2.

Hessian via PathNLP

Same model as Hessian via Convert but now using PathNLP.

$ontext
 
 
    Variance estimator for an MLE
 
    Erwin Kalvelagen July 2008
 
    Reference:
       William H. Greene, "Econometric Analysis", 5th edition, 2003
 
$offtext
 
set i 'cases' /i1*i20/;
 
table data(i,*)
 
                Y         E
     i1         20.5      12
     i2         31.5      16
     i3         47.7      18
     i4         26.2      16
     i5         44.0      12
     i6         8.28      12
     i7         30.8      16
     i8         17.2      12
     i9         19.9      10
    i10         9.96      12
    i11         55.8      16
    i12         25.2      20
    i13         29.0      12
    i14         85.5      16
    i15         15.1      10
    i16         28.5      18
    i17         21.4      16
    i18         17.7      20
    i19         6.42      12
    i20         84.9      16
;
 
parameter y(i),x(i);
y(i) = data(i,'y');
x(i) = data(i,'E');
 
scalar Var2Greene 'Variance estimate number two from Greene' / 46.16337 /;
 
variable b,L;
equation loglik;
 
loglik.. L =e= sum(i,log(b+x(i))) + sum(i,y(i)/(b+x(i)));
 
 
*
* estimation via pathnlp
*
 
$onecho > pathnlp.opt
hessian_gdx 1
$offecho
 
option nlp=pathnlp;
model estimation /loglik/;
estimation.optfile=1;
solve estimation minimizing L using nlp;
display "estimate",b.l;
 
 
*
* get hessian
*
scalar d2l__b_b 'hessian';
execute_load "hesData.gdx",d2l__b_b;
display d2l__b_b;
 
*
* inverting a 1x1 matrix is not very difficult
*
scalar v 'variance';
v = 1/d2l__b_b;
display v,var2greene;
 
abort$(abs(v-var2greene)>1.0e-5) "Accuracy check failed";

PathNLP prints all kind of debug output. This seems not completely polished yet.

Hessian via Convert

GAMS has new facilities to retrieve the Hessian matrix (matrix of second derivatives). Here we use Convert to produce a gdx file with the Hessian which we use to estimate the Variance of a Maximum Likelihood Estimation result.

$ontext
 
 
    Variance estimator for an MLE
 
    Erwin Kalvelagen July 2008
 
    Reference:
       William H. Greene, "Econometric Analysis", 5th edition, 2003
  
$offtext
  
set i 'cases' /i1*i20/;
  
table data(i,*)
 
                Y         E
     i1         20.5      12
     i2         31.5      16
     i3         47.7      18
     i4         26.2      16
     i5         44.0      12
     i6         8.28      12
     i7         30.8      16
     i8         17.2      12
     i9         19.9      10
    i10         9.96      12
    i11         55.8      16
    i12         25.2      20
    i13         29.0      12
    i14         85.5      16
    i15         15.1      10
    i16         28.5      18
    i17         21.4      16
    i18         17.7      20
    i19         6.42      12
    i20         84.9      16
;
 
parameter y(i),x(i);
y(i) = data(i,'y');
x(i) = data(i,'E');
 
scalar Var2Greene 'Variance estimate number two from Greene' / 46.16337 /;
 
variable b,L;
equation loglik;
 
loglik.. L =e= sum(i,log(b+x(i))) + sum(i,y(i)/(b+x(i)));
 
 
*
* estimation
*
model estimation /loglik/;
solve estimation minimizing L using nlp;
display "estimate",b.l;
 
 
*
* get hessian
*
option nlp=convert;
$onecho > convert.opt;
hessian
$offecho
estimation.optfile=1;
solve estimation minimizing L using nlp;
 
*
* gams cannot add elements at runtime so we declare the necessary elements here
*
set dummy /e1,x1/;
 
parameter h(*,*,*) '-hessian';
execute_load "hessian.gdx",h;
display h;
 
*
* inverting a 1x1 matrix is not very difficult
*
scalar v 'variance';
v = -1/h('e1','x1','x1');
display v,var2greene;
 
abort$(abs(v-var2greene)>1.0e-5) "Accuracy check failed";

Note: if you get a message

*** Msg=Error loading library c:\progra~1\gams\g2d: The specified module could not be found.

then copy gd.dll from the GAMS system directory to c:\progra~1\gams. This bug is already fixed.

Monday, July 28, 2008

GDX question

> I would like to make a gdx dump of all the sets/paramters/scalars of
> my model, but I do not want any equations or variables in there at
> all.

$version 227
 
  Sets
       i   canning plants   / seattle, san-diego /
       j   markets          / new-york, chicago, topeka / ;
 
  Parameters
 
       a(i)  capacity of plant i in cases
         /    seattle     350
              san-diego   600  /
 
       b(j)  demand at market j in cases
         /    new-york    325
              chicago     300
              topeka      275  / ;
 
  Table d(i,j)  distance in thousands of miles
                    new-york       chicago      topeka
      seattle          2.5           1.7          1.8
      san-diego        2.5           1.8          1.4  ;
 
  Scalar f  freight in dollars per case per thousand miles  /90/ ;
 
  Parameter c(i,j)  transport cost in thousands of dollars per case ;
 
            c(i,j) = f * d(i,j) / 1000 ;
 
  Variables
       x(i,j)  shipment quantities in cases
       z       total transportation costs in thousands of dollars ;
 
  Positive Variable x ;
 
  Equations
       cost        define objective function
       supply(i)   observe supply limit at plant i
       demand(j)   satisfy demand at market j ;
 
  cost ..        z  =e=  sum((i,j), c(i,j)*x(i,j)) ;
 
  supply(i) ..   sum(j, x(i,j))  =l=  a(i) ;
 
  demand(j) ..   sum(i, x(i,j))  =g=  b(j) ;
 
  Model transport /all/ ;
 
  Solve transport using lp minimizing z ;
 
  Display x.l, x.m ;
 
 
* generate include file execute_unload statement
* that write all symbols except variables and
* equations
$onecho > f.awk
BEGIN {
print "execute_unload 'mygdx'" > "temp1.inc"
}
/^ *[0-9]+ / {
  if ($4=="Equ") next;
  if ($4=="Var") next;
  print "," $2 > "temp1.inc"
}
END {
print ";" > "temp1.inc"
}
$offecho
$gdxout temp1
$unload
$gdxout
$call 'gdxdump temp1 symbols > temp1.txt'
$call 'awk -f f.awk temp1.txt'
$include temp1.inc
 
* mygdx.gdx will not contain any variables or equations
execute 'gdxdump mygdx symbols';

Basically at compile time we dump a gdx file temp1.gdx that has all symbols. Then we use an AWK scripts (still at GAMS compile time) to generate an include file temp1.inc that contains an execute_unload statement with all non var/equ symbols to mygdx.gdx. Then we $include this file. Finally for checking we call gdxdump (at execution time) to show all symbols in mygdx.gdx.

Thursday, July 24, 2008

gdx2xls does not work with 22.7

> Erwin. When tried to create xls file from gdx, I got an error message (see log file).
>*** Error at line 7379: Execute has nonzero return code RC=2

The GDX API has been changed between 22.5 and 22.6. You hare using 22.7 and your version of gdx2xls for <= 22.5. A work around is:

scalar s /3.14/;
execute_unload "s.gdx",s;
execute '=gdxcopy -v6c s.gdx oldgdx';
execute '=gdx2xls oldgdx\s.gdx';
execute '=shellexecute oldgdx\s.xls';.

This should show:

--- Job Untitled_39.gms Start 07/24/08 18:17:48
GAMS Rev 227  Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G070509/0001CE-WIN
          GAMS Development Corporation                               DC4572
--- Starting compilation
--- Untitled_39.gms(9) 2 Mb
--- Starting execution: elapsed 0:00:00.003
--- Untitled_39.gms(7) 3 Mb
Writing files to: .\.\oldgdx
Converting file: .\s.gdx
--- Untitled_39.gms(8) 3 Mb
GDX2XLS Version 3.0
Reading GDX file oldgdx\s.gdx
Reading 1 symbols, sorting: 0.00 seconds
Reading 0 UELs: 0.00 seconds
Opening C:\projects\test\oldgdx\s.XLS with Excel: 0.31 seconds
           s. To scalar sheet
Total elapsed time: 0.72 seconds
--- Untitled_39.gms(9) 3 Mb
Shell Execute    May  1, 2008 22.7.2 WIN 3727.4799 VIS x86/MS Windows         
--- Untitled_39.gms(9) 3 Mb
*** Status: Normal completion
--- Job Untitled_39.gms Stop 07/24/08 18:19:49 elapsed 0:02:00.857

Update: unfortunately this worked for me, but not for the user. I don't understand why. Contact support@gams.com and ask for a 22.7 version of gdx2xls.exe.

gdxcopy puzzle

This does not look right:

c:\projects\test>gdxcopy -v6c save\s.gdx oldgdx
Writing files to: save\save\oldgdx
Converting file: save\s.gdx

Instead of writing to .\oldgdx or save\oldgdx it goes to save\save\oldgdx.

Update: this is fixed in 22.8 beta. Minor issue: the log messages have disappeared in the 22.8 version.

c:\projects\test>gdxcopy -v6c save\s.gdx oldgdx
 
c:\projects\test>

I would prefer to see what gdxcopy is doing like in the previous version.

Modeling question

>Among the constraints in the formulation my team have worked on,
>some of them have special properties like:
>x1>=a1
>x1+x2>=a1+a2
>x1+x2+x3>=a1+a2+a3
>...
>...
>x1+x2+x3+...+x39>=a1+a2+a3+...+a39
>
>Here a[i] are nonnegative parameters, so RHSs are non-decreasing.
>Besides these constraints, we deal with tons of constraints, so we are not looking
>for any algorithm or method, but looking for smart conversion of these constraints.
>My question is "Are they any ways to formulate this more intellingently or effectively
>to reduce computational effort?

parameter b(i);
alias (i,j);
b(i) = sum(j$(ord(j)<=ord(i)), a(j));

variable y(i);

ydef(i).. y(i) =e= y(i-1) + x(i);
y.lo(i) = b(i);

This requires more variables but many fewer nonzeroes, so likely more efficient (sparser).

Excel GUI for GIS and GAMS

This is work in progress: Excel as front-end for a GAMS Model, including a GIS component. Click to enlarge.

Tuesday, July 22, 2008

Modeling question

>I need to change the following non linear constraint to a linear one .
>the constraints are:
>
>for all k = 0 ,1, 2,3 ,4,...... 10
>
>k+1 = sum over j ( a[k][j] * d[j] )
>
>j = 1,2,3,4
>
>for all k, j a[k][j] is an integer <= 2 and d[j] is an integer <= 60. >
>both a[k][j] and d[j] are variables , making it a non linear
>constraint. Is there a method i can change it to a linear expression.
>Cplex gives a error with this constraint.

These question can only be answered more definitely by looking at the complete model, but let me give it a shot anyway.

The a(k,j) variables can only assume the values 0,1,2. So we can write this as a sum of 2 binary variables (and relax a to be continuous):

0 <= a(k,j) <= 2 continuous
b1(k,j), b2(k,j) binary
a(k,j) = b1(k,j)+2*b2(k,j);

Now we can express z(k,j) = a(k,j)*d(j) as

M1=20
z1(k,j) ≤ M1 · b1(k,j)
z1(k,j) ≤ d(j)
z1(k,j) ≥ d(j) − M1 · (1 − b1(k,j))
M2=40
z2(k,j) ≤ M2 · b2(k,j)
z2(k,j) ≤ 2*d(j)
z2(k,j) ≥ 2*d(j) − M2 · (1 − b2(k,j))
k + 1 = sum(j, z1(k,j)+z2(k,j));

See also: http://yetanothermathprogrammingconsultant.blogspot.com/2008/05/multiplication-of-continuous-and-binary.html.

Invert.exe raw mode gdx

Using raw mode to write the inverse (instead of string mode) yields:

02.416 seconds (n=1000)

The GAMS version should get similar performance (may be even faster as I did not use any compiler flags to do better optimization).

Saturday, July 19, 2008

More on Invert performance

The INVERT.EXE as compiled by GAMS and included in the 22.8 beta is somewhat slower than my own version. Here are some statistics:

                                n=1000
  2.28 32bit                 0:00:06.973
  2.28 64bit                 0:00:06.705
  my version (32 bit)        0:00:03.878    (gams part is 2.27 64 bit)

The margin is larger than I expected (I expected similar performance). Note: the above timings include GAMS times (total elapsed time of the whole run), so they understate the difference in INVERT.EXE performance. When timing invert.exe itself we see a speed difference of a factor two. I would not blink an eye when I see a 10% difference, but 100% difference indicates there must be something going on here. These timings are based on 22.8 beta, so we still have some time (until the final is out) to fix this.

Invert.exe performance

This model shows that my external solver INVERT.EXE is much faster than Cplex in inverting dense matrices. Obviously this is a case where INVERT is at its best, and Cplex is at a disadvantage, so this is not a poor performance of Cplex. These models are just not suited for a sparse LP solver.

Here is the model:

$ontext
 
   Invert the Pei test matrix
 
   Erwin Kalvelagen, july 2008
 
   Command line arguments:
      --n=integer   order of the matrix
      --alpha=real  scalar defining the Pei matrix
      --method=1    invert using equations A AINV = I
      --method=2    invert using equations A AINV = I using advanced basis
      --method=3    invert using external solver
 
 
   Reference:
 
    Erwin Kalvelagen
    SOLVING SYSTEMS OF LINEAR EQUATIONS WITH GAMS
    http://www.amsterdamoptimization.com/pdf/lineq.pdf
  
    ML Pei,
    A test matrix for inversion procedures,
    Communications of the ACM,
    Volume 5, 1962, page 508.
 
$offtext
 
*
* defaults
*
$if not set n $set n 100
$if not set method $set method 1
$if not set alpha $set alpha 1
 
scalar alpha /%alpha%/;
 
set i /i1*i%n%/;
alias (i,j,k);
 
parameter a(i,j);
a(i,j)=1;
a(i,i)=1+alpha;
 
parameter ident(i,j);
ident(i,i)=1;
 
variables
   ainv(i,j)
   dummy
;
 
equations
   e(i,j)
   edummy
;
 
edummy.. dummy =e= 0;
e(i,j)..  sum(k, a(i,k)*ainv(k,j)) =e= ident(i,j);
 
model m /all/;
 
option iterlim=1000000;
 
 
 
$if "%method%"==1 solve m minimizing dummy using lp;
 
$ifthen "%method%"==2
 
edummy.m=1;
e.m(i,j)=1;
dummy.m=0;
ainv.m(i,j)=0;
solve m minimizing dummy using lp;
 
$endif
 
$ifthen "%method%"==3
 
execute_unload 'a.gdx',i,a;
execute '=invert.exe a.gdx i a b.gdx pinva';
parameter pinva(i,j);
execute_load 'b.gdx',pinva;
 
$endif

We have implemented three methods of calculating the inverse.

Method=1 uses the LP solver to solve A * invA = I.
Method=2 is as method=1 but we use an advanced basis to speed up the solver.
Method=3 is calling the external solver INVERT.EXE

For more information see: http://www.amsterdamoptimization.com/pdf/lineq.pdf.

Some timings are listed below (all times in seconds):

             method=1   method=2   method=3
   n=50       0.637       0.433      0.027
   n=100      8.267       4.036      0.055
   n=200    313.236      53.395      0.118

The timings are taken from the last line in the log file. E.g.:

--- Job Untitled_36.gms Start 07/19/08 15:46:36
GAMS Rev 227  Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G070509/0001CE-WIN
          GAMS Development Corporation                               DC4572
--- Starting compilation
--- Untitled_36.gms(81) 3 Mb
--- Starting execution: elapsed 0:00:00.004
--- Untitled_36.gms(69) 5 Mb
 ----------------------------------------
 INVERT: matrix inversion
 Erwin Kalvelagen, Amsterdam Optimization
 ----------------------------------------
 DLL:GDX Library      May  1, 2008 22.7.2 WIN 4701.4799 VIS x86/MS Windows
 GDXin:a.gdx
 Input set:i
 Input parameter:a
 LAPACK DGESV, n=         200
 GDXout:b.gdx
 Output parameter:pinva
 Done
--- Untitled_36.gms(71) 5 Mb
--- GDXin=C:\projects\test\b.gdx
--- Untitled_36.gms(71) 6 Mb
*** Status: Normal completion
--- Job Untitled_36.gms Stop 07/19/08 15:46:36 elapsed 0:00:00.118

Friday, July 18, 2008

LS Solver

The log of the LS solver as distributed in 22.8 beta is slightly different from the original version I compiled myself.

Gams version based on my donated sources:


 =============================================================
   Least Squares Solver                                       
 =============================================================
 
         Parameter      Estimate    Std. Error       t value      Pr(>|t|)
         b( const)  -3.48226E+06   8.90420E+05  -3.91080E+00   3.56040E-03 **
       b( gnpdefl)   1.50619E+01   8.49149E+01   1.77376E-01   8.63140E-01
           b( gnp)  -3.58192E-02   3.34910E-02  -1.06952E+00   3.12681E-01
        b( unempl)  -2.02023E+00   4.88400E-01  -4.13643E+00   2.53509E-03 **
          b( army)  -1.03323E+00   2.14274E-01  -4.82199E+00   9.44367E-04 ***
           b( pop)  -5.11041E-02   2.26073E-01  -2.26051E-01   8.26212E-01
          b( year)   1.82915E+03   4.55478E+02   4.01589E+00   3.03680E-03 **
      ---
      Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
   
 Estimation statistics:
 Cases: 16  Parameters: 7  Residual sum of squares:   8.36424E+05
 Residual standard error:   3.04854E+02   on  9 degrees of freedom
 Multiple R-squared:   9.95479E-01  Adjusted R-squared:   9.92465E-01
 F statistic:   3.30285E+02 on 6 and 9 DF, p-value:   4.98403E-10
   
 GDX library version: GDX Library      BETA 14Jul08 22.8.0 WIN 5737.5744 VIS x86
 /MS Windows
 GDX file: ls.gdx

My original version:


 =======================================================================
   Least Square Solver V2                                               
   Erwin Kalvelagen, Amsterdam Optimization Modeling Group              
   www.amsterdamoptimization.com                                        
 =======================================================================
 
         Parameter      Estimate    Std. Error       t value      Pr(>|t|)
        b('const')  -3.48226E+06   8.90420E+05  -3.91080E+00   3.56040E-03 **
      b('gnpdefl')   1.50619E+01   8.49149E+01   1.77376E-01   8.63141E-01
          b('gnp')  -3.58192E-02   3.34910E-02  -1.06952E+00   3.12681E-01
       b('unempl')  -2.02023E+00   4.88400E-01  -4.13643E+00   2.53509E-03 **
         b('army')  -1.03323E+00   2.14274E-01  -4.82199E+00   9.44367E-04 ***
          b('pop')  -5.11041E-02   2.26073E-01  -2.26051E-01   8.26212E-01
         b('year')   1.82915E+03   4.55478E+02   4.01589E+00   3.03680E-03 **
      ---
      Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
  
 Estimation statistics:
 Cases: 16  Parameters: 7  Residual sum of squares:   8.36424E+05
 Residual standard error:   3.04854E+02   on  9 degrees of freedom
 Multiple R-squared:   9.95479E-01  Adjusted R-squared:   9.92465E-01
 F statistic:   3.30285E+02 on 6 and 9 DF, p-value:   4.98403E-10
  
 DLL version: _GAMS_GDX_237_2007-01-09
 GDX file: ls.gdx

The labels seem to have lost their quotation marks. I hoped my code was easily portable, but this may indicate a weakness (although I don't see immediately what I did wrong here). Most importantly the numerical results are the same, so that is good. Actually my banner has "least square" which should read "least squares" (surely in most cases we have multiple squared errors).

Lags and day/hour time representation

> I have variables with two indices, e.g. x(d,h), where d and h are sets
> of days and hours.
>
> My problem is to make a predecessor/sucessor relation between those x
> variables. E.g. I have to model the difference between two neighboring
> x variables like:
>
> delta(d,h) =e= x(d,h) - x(pred(d,h)) ;
>
> delta(d,g) =l= constant ;
>
> Are there any ideas how to make this in an easy and elegant way?

In general it is much easier to work with one time index. I.e. use a single index t instead of two indices (d,h). This will make your model much simpler and easier to maintain.

If you want to print in the final reports in terms of (d,h) then use:

sets
  d / d1*d10 /
  h / h1*h24 /
  t / t1 * t240 /
;
 
 
variable delta(t),x(t);
equation e(t);
e(t).. delta(t)$(ord(t)>1) =e= x(t) - x(t-1) ;
 
 
set mapper(d,h,t);
mapper(d,h,t)$([ord(d)-1]*card(h)+ord(h) = ord(t)) = yes;
display mapper;
 
x.l(t) = uniform(0,1);
parameter xreport(d,h);
xreport(d,h) = sum(mapper(d,h,t),x.l(t));
display xreport;

Wednesday, July 16, 2008

GAMS log cosmetics

C:\projects\test>touch r.gms
 
C:\projects\test>type r.gms
 
C:\projects\test>gamslib trnsport
Model trnsport.gms retrieved
 
C:\projects\test>gams trnsport a=c s=t1
--- Job trnsport Start 07/16/08 20:27:27
GAMS Rev 227  Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G070509/0001CE-WIN
          GAMS Development Corporation                               DC4572
--- Starting compilation
--- trnsport.gms(69) 3 Mb
*** Status: Normal completion
--- Job trnsport.gms Stop 07/16/08 20:27:27 elapsed 0:00:00.006
 
C:\projects\test>gams r r=t1
--- Job r Start 07/16/08 20:27:43
GAMS Rev 227  Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G070509/0001CE-WIN
          GAMS Development Corporation                               DC4572
--- Starting continued compilation
--- Starting execution: elapsed 0:00:00.004
--- r.gms(45) 3 Mb
--- Generating LP model transport
--- r.gms(66) 3 Mb
---   6 rows  7 columns  19 non-zeroes
--- r.gms(66) 3 Mb
--- Executing CPLEX: elapsed 0:00:00.010
 
ILOG CPLEX       May  1, 2008 22.7.2 WEX 4792.4799 WEI x86_64/MS Windows
Cplex 11.0.1, GAMS Link 34
 
Reading data...
Starting Cplex...
Tried aggregator 1 time.
LP Presolve eliminated 1 rows and 1 columns.
Reduced LP has 5 rows, 6 columns, and 12 nonzeros.
Presolve time =    0.00 sec.
 
Iteration      Dual Objective            In Variable           Out Variable
     1              73.125000    x(seattle.new-york) demand(new-york) slack
     2             119.025000     x(seattle.chicago)  demand(chicago) slack
     3             153.675000    x(san-diego.topeka)   demand(topeka) slack
     4             153.675000  x(san-diego.new-york)  supply(seattle) slack
 
Optimal solution found.
Objective :         153.675000
 
--- Restarting execution
--- r.gms(66) 2 Mb
--- Reading solution for model transport
--- r.gms(68) 3 Mb
*** Status: Normal completion
--- Job r.gms Stop 07/16/08 20:27:43 elapsed 0:00:00.079
 
C:\projects\test>

In this example we do the following:

create an empty gams file r.gms
compile trnsport.gms and save as t1
restart empty file from t1

In this case the execution time line numbering refers to the wrong file. I.e. r.gms(45) refers to line 45 of the file r.gms, but that file has zero lines. I sent this to GAMS and they tell me execution errors have the correct file/line printed.

GAMS/IDE wish: search with regexp in .log/.lst

I would like to be able to search .lst and .log files using regular expressions. Currently the file viewer (the software that shows .lst and .log files in read-only mode in an IDE window) only allow normal search.

This is a somewhat unusual case. There are many solves in this model (thousands) and locate the correct solve I want to search for the second loop of "region1". In normal search region10,region11,.. and region100,region101,... also match. A quick way to search would be to search for "region1$" where $ means end-of-line. This requires regular expression (regexp for nerds) search. This is not (yet) available in the file viewers only in the editor. So the work-around suggested is to open the file in the editor (FileOpen in editor).

Suggestion: either implement regexp search for the file viewers or disable this choice for the file viewers (currently it is always enabled).

This is somewhat exotic: for "normal" models the log file is small.

Saturday, July 12, 2008

GAMS 22.8 issues

The new 22.8 beta is out (see http://www.gams.com/). Here are some initial experiences:

I made LS (regression solver) and INVERT (matrix inversion program) available for distribution. It is included but has a few minor issues.
I was hoping that the Cplex interface would allow lazy constraints to be expressed in GAMS models.
Mosek does not crash anymore on some of MIQPs that it does not like. The message is not very user-friendly: Return code - 1500 [MSK_RES_ERR_INV_PROBLEM]. May be in a future release more readable texts will be added.
Mosek still has problems with the model in http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/erratic-miqp-behavior.html. It gives the obscure message: Return code - 1050 [MSK_RES_ERR_UNKNOWN]. This will probably be fixed before the final release. (Update: bug still present in final 22.8).
Some of the IDE issues have not been fixed: http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/gamside-bug_06.html, http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/gamside-bug.html Admittedly they are not very important. (Although the Ctrl-T issue hits me quite often).
This has been fixed: http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/gams-bug-interrupt-does-not-work.html. It is now possible to interrupt a running GAMS model. This will help me a lot.
Qmaker and Cplex QCP interface is still very slow. I passed on a test case. (Update: the cplex QCP interface will be fixed for 22.8).
MIP interrupt in Mosek is largely fixed but the results are not always convincing:
**** SOLVER STATUS 8 USER INTERRUPT
**** MODEL STATUS 1 OPTIMAL
That combination looks suspicious. (This happens only occasionally, probably when hitting Ctrl-C near the end. In most cases it seems to work just fine.).
Tools like gdxcopy, gdx2xls have lost their log. They used write progress messages to stdout with useful info what is happening. These messages are removed.

Friday, July 11, 2008

GAMS 64 bit installer issue

> Update: this has been fixed in 22.9.

The 64 bit install of GAMS installs by default to C:\Program Files (x86)\gams22.8. This directory is for 32 bit software. 64 bit apps should be installed in C:\Program Files\gams22.8.

I also get with 32 bit and 64 bit install a message like this:

Looks dangerous, but if I say "This program installed correctly" it seems to be ok.

Thursday, July 10, 2008

Performance improvement

Click to enlarge. See what performance improvements in hardware and software can do for a practical LP model. You can try it yourself by downloading and running http://www.gams.com/modlib/libhtml/indus89.htm.

Wednesday, July 9, 2008

Find the bug

Q: locate the error in this GAMS code:

  19   
  20   
  21    Sets
  22         i   canning, plants   / seattle, san-diego /
  23         j   markets          / new-york, chicago, topeka / ;
  24   
  25    Parameters
  26   
  27         a(i)  capacity of plant i in cases
  28           /    seattle     350
****                      $361
**** 361  Values for domain 1 are unknown - no checking possible
  29                san-diego   600  /
  30   
  31         b(j)  demand at market j in cases
  32           /    new-york    325
  33                chicago     300
  34                topeka      275  / ;

Ordering of set elements in GAMS

> I don't understand the ordering of set elements
> in my display of a set.

If you look at:

set
i /a,c/
j /b,c/
;
display j;

then this will display:

---- 5 SET j
c, b

To understand this you'll need a little bit of knowledge how GAMS stores and orders set elements. Set elements are unique and stored in single large pool. They are ordered as they come. To display the pool (a.k.a universe) you can use:

alias (pool,*);
display pool;

This will show:

---- 8 SET pool Aliased with *
a, c, b

The ordering of elements (e.g. in display statements) is determined how they are ordered in the pool. It is also important in determining if a set is ordered (in that case the set follows the same ordering as the universe).

This can lead to surprises:

set
t1 /2001*2005/
t2 /2000*2006/;
display t2;

shows:

---- 5 SET t2
2001, 2002, 2003, 2004, 2005, 2000, 2006

A quick fix is to start with a dummy set to get the ordering correct:

set
dummy /2000*2010/
t1 /2001*2005/
t2 /2000*2006/;
display t2;

Monday, July 7, 2008

Bug in GAMS/Cplex link

GAMS/Cplex destroys here a perfectly good integer solution in the final LP. In that final LP the integers are fixed and the resulting LP is solved to get some dual information. In this case a resource limit is hit during this final LP and all is thrown away. Of course this is not what we want to see. The code in the GAMS/Cplex link related to this final LP solve need some careful inspection as we have seen problems with this part before.

Repeating presolve.
Tried aggregator 2 times.
MIP Presolve eliminated 1679 rows and 1752 columns.
MIP Presolve modified 4 coefficients.
Aggregator did 23 substitutions.
Reduced MIP has 184571 rows, 180577 columns, and 758788 nonzeros.
Represolve time =    1.63 sec.
Clique table members: 37.
MIP emphasis: optimality.
MIP search method: dynamic search.
Parallel mode: opportunistic, using up to 4 threads.
Root relaxation solution time =  137.66 sec.
 
        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap

      0     0    14462.2442   195    14527.9285    14462.2442   178843    0.45%
Heuristic still looking.
Heuristic still looking.
*     0+    0                        14525.7851    14462.2442   178843    0.44%
      0     2    14462.2442   195    14525.7851    14462.2442   178843    0.44%
Elapsed real time = 4637.89 sec. (tree size =  0.00 MB, solutions = 8)
    100    99    14479.0518   197    14525.7851    14463.9712   232491    0.43%
                                                     Cuts: 15               
    200   197    14490.7443   197    14525.7851    14463.9712   277649    0.43%
*   295+  286                        14525.5834    14463.9712   344217    0.42%
    300   295    14505.3123   147    14525.5834    14463.9712   336888    0.42%
    400   395    14514.1868   145    14525.5834    14463.9712   368746    0.42%
*   433+  408                        14525.5412    14463.9712   378611    0.42%
    500   476    14494.6548   134    14525.5412    14463.9712   420167    0.42%
                                                  Impl Bds: 1               
*   544+  497                        14515.8473    14463.9712   437352    0.36%
*   545+  498                        14515.3114    14465.4987   441214    0.34%
Fixing integer variables, and solving final LP...
Tried aggregator 1 time.
LP Presolve eliminated 188052 rows and 205219 columns.
Aggregator did 6758 substitutions.
Reduced LP has 157128 rows, 152946 columns, and 633350 nonzeros.
Presolve time =    1.13 sec.
Initializing dual steep norms . . .

Iteration log . . .
Iteration:     1   Dual objective     =          6057.310401
Perturbation started.
Iteration:   102   Dual objective     =          6057.310401
Iteration:   878   Dual objective     =          7666.791470
Iteration:  1500   Dual objective     =          9296.038149
Iteration:  2211   Dual objective     =          9824.349977
Iteration:  2889   Dual objective     =         10466.849833
Iteration:  3542   Dual objective     =         10641.384471
Iteration:  4170   Dual objective     =         10892.599730
Iteration:  4795   Dual objective     =         10929.911432
Iteration:  5399   Dual objective     =         11011.407094
Iteration:  6008   Dual objective     =         11026.152235
Iteration:  6591   Dual objective     =         11028.613181
Iteration:  7181   Dual objective     =         11037.046345
Iteration:  7736   Dual objective     =         11039.881723
Iteration:  8318   Dual objective     =         11042.152553
Iteration:  8957   Dual objective     =         11044.445128
Iteration:  9519   Dual objective     =         11046.434903
Iteration: 10119   Dual objective     =         11048.475834
Iteration: 10685   Dual objective     =         11054.973343
Iteration: 11233   Dual objective     =         11057.071503
Iteration: 11774   Dual objective     =         11059.407174
Iteration: 12324   Dual objective     =         11062.135289
Iteration: 12869   Dual objective     =         11064.387325
Iteration: 13434   Dual objective     =         11066.637098
Iteration: 14031   Dual objective     =         11074.488101
Iteration: 14579   Dual objective     =         11076.981695
Iteration: 15089   Dual objective     =         11079.444173
Iteration: 15621   Dual objective     =         11082.365429
Iteration: 16145   Dual objective     =         11084.623184
Iteration: 16667   Dual objective     =         11087.253554
Iteration: 17199   Dual objective     =         11091.205312
Iteration: 17715   Dual objective     =         11098.091964
Iteration: 18262   Dual objective     =         11102.884193
Iteration: 18788   Dual objective     =         11106.328123
Iteration: 19295   Dual objective     =         11109.503749
Iteration: 19815   Dual objective     =         11114.531394
Iteration: 20324   Dual objective     =         11117.863392
Iteration: 20829   Dual objective     =         11120.704519
Iteration: 21333   Dual objective     =         11123.060983
Iteration: 21830   Dual objective     =         11128.551740
Iteration: 22365   Dual objective     =         11134.127949
Iteration: 22899   Dual objective     =         11137.954843
Iteration: 23365   Dual objective     =         11141.672390
Iteration: 23912   Dual objective     =         11149.342966
Iteration: 24419   Dual objective     =         11158.316182
Iteration: 24957   Dual objective     =         11179.374543
Iteration: 25473   Dual objective     =         11188.140937
Iteration: 26019   Dual objective     =         11193.656639
Iteration: 26520   Dual objective     =         11198.854075
Iteration: 27113   Dual objective     =         11209.155858
Iteration: 27664   Dual objective     =         11220.384286
Iteration: 28172   Dual objective     =         11228.944391
Iteration: 28665   Dual objective     =         11240.074822
Iteration: 29163   Dual objective     =         11246.180510
Iteration: 29682   Dual objective     =         11257.264795
Iteration: 30218   Dual objective     =         11275.780600
Iteration: 30745   Dual objective     =         11286.175702
Iteration: 31223   Dual objective     =         11293.826527
Iteration: 31706   Dual objective     =         11322.912597
Iteration: 32242   Dual objective     =         11345.656014
Iteration: 32792   Dual objective     =         11355.546099
Elapsed time =   10.16 sec. (33000 iterations).
Iteration: 33340   Dual objective     =         11365.028247
Iteration: 33859   Dual objective     =         11378.128137
Iteration: 34425   Dual objective     =         11384.719065
Iteration: 35106   Dual objective     =         11400.809173
Iteration: 35714   Dual objective     =         11410.512205
Iteration: 36277   Dual objective     =         11418.067340
Iteration: 36899   Dual objective     =         11440.861112
Iteration: 37511   Dual objective     =         11449.609068
Iteration: 38161   Dual objective     =         11461.996251
Iteration: 38848   Dual objective     =         11466.467047
Iteration: 39434   Dual objective     =         11479.910077
Iteration: 40002   Dual objective     =         11493.030687
Iteration: 40642   Dual objective     =         11500.724057
Iteration: 41217   Dual objective     =         11521.687737
Iteration: 41825   Dual objective     =         11532.637059
Iteration: 42526   Dual objective     =         11536.771010
Iteration: 43171   Dual objective     =         11541.448080
Iteration: 43797   Dual objective     =         11552.971322
Iteration: 44466   Dual objective     =         11565.311371
Iteration: 45131   Dual objective     =         11569.962544
Iteration: 45787   Dual objective     =         11574.606053
Iteration: 46461   Dual objective     =         11578.764831
Iteration: 47118   Dual objective     =         11588.017836
Iteration: 47736   Dual objective     =         11618.652276
Iteration: 48420   Dual objective     =         11627.910620
Iteration: 49100   Dual objective     =         11639.618577
Iteration: 49705   Dual objective     =         11709.154815
Iteration: 50257   Dual objective     =         11724.330084
Iteration: 50890   Dual objective     =         11731.428433
Iteration: 51600   Dual objective     =         11736.597227
Iteration: 52294   Dual objective     =         11744.978937
Iteration: 52895   Dual objective     =         11753.709220
Iteration: 53467   Dual objective     =         11765.418835
Iteration: 54163   Dual objective     =         11773.818329
Iteration: 54869   Dual objective     =         11780.943803
Iteration: 55550   Dual objective     =         11786.399499
Iteration: 56219   Dual objective     =         11792.327244
Iteration: 56894   Dual objective     =         11796.992322
Iteration: 57548   Dual objective     =         11800.527954
Iteration: 58177   Dual objective     =         11825.388844
Iteration: 58846   Dual objective     =         11838.119491
Iteration: 59544   Dual objective     =         11853.442610
Iteration: 60147   Dual objective     =         11864.848136
Iteration: 60781   Dual objective     =         11891.687076
Iteration: 61425   Dual objective     =         11906.740461
Iteration: 62087   Dual objective     =         11940.561093
Iteration: 62737   Dual objective     =         11975.219161
Elapsed time =   20.33 sec. (63000 iterations).
Iteration: 63454   Dual objective     =         12002.296415
Iteration: 64108   Dual objective     =         12015.440378
Iteration: 64818   Dual objective     =         12022.395479
Iteration: 65509   Dual objective     =         12029.058176
Iteration: 66219   Dual objective     =         12037.336718
Iteration: 66913   Dual objective     =         12048.634801
Iteration: 67550   Dual objective     =         12098.951656
Iteration: 68233   Dual objective     =         12148.667809
Iteration: 68861   Dual objective     =         12204.554100
Iteration: 69489   Dual objective     =         12260.068947
Iteration: 70151   Dual objective     =         12271.769079
Iteration: 71067   Dual objective     =         12340.904160
Iteration: 71916   Dual objective     =         12469.081691
Iteration: 72832   Dual objective     =         12538.397179
Iteration: 73689   Dual objective     =         12573.657042
Iteration: 74580   Dual objective     =         12633.804960
Iteration: 75466   Dual objective     =         12671.034083
Iteration: 76282   Dual objective     =         12709.759093
Iteration: 77157   Dual objective     =         12731.565162
Iteration: 78006   Dual objective     =         12791.001813
Iteration: 78861   Dual objective     =         12826.806885
Iteration: 79716   Dual objective     =         12862.410390
Iteration: 80579   Dual objective     =         12894.168594
Iteration: 81454   Dual objective     =         12987.178433
Iteration: 82296   Dual objective     =         13103.926493
Iteration: 83211   Dual objective     =         13321.544934
Iteration: 84091   Dual objective     =         13443.441166
Iteration: 84973   Dual objective     =         13562.343061
Iteration: 85890   Dual objective     =         13644.284360
Iteration: 86698   Dual objective     =         13720.220758
Iteration: 87401   Dual objective     =         13816.562168
Iteration: 88221   Dual objective     =         13881.829495
Iteration: 88849   Dual objective     =         13929.142875
Iteration: 89610   Dual objective     =         13984.379103
Iteration: 90422   Dual objective     =         14047.173914
Iteration: 91270   Dual objective     =         14115.140590
Iteration: 92125   Dual objective     =         14171.749778
Iteration: 93089   Dual objective     =         14217.148596
Iteration: 93982   Dual objective     =         14263.230690
Elapsed time =   30.39 sec. (94000 iterations).
Iteration: 94826   Dual objective     =         14320.598940
Iteration: 95556   Dual objective     =         14345.861439
Iteration: 96804   Dual objective     =         14379.439770
Iteration: 97540   Dual objective     =         14399.637101
Iteration: 98353   Dual objective     =         14415.843344
Iteration: 99135   Dual objective     =         14486.973327
Iteration: 100230   Dual objective     =         14507.696311
Iteration: 101753   Dual objective     =         14524.245965
Iteration: 102889   Dual objective     =         14536.589082
Iteration: 104245   Dual objective     =         14546.083262
Iteration: 105151   Dual objective     =         14553.547999
Iteration: 106121   Dual objective     =         14558.473735
Iteration: 107058   Dual objective     =         14562.528434
Iteration: 107846   Dual objective     =         14564.898252
Iteration: 108562   Dual objective     =         14568.707843
Iteration: 109171   Dual objective     =         14569.741540
Iteration: 110267   Dual objective     =         14570.230927
Iteration: 111442   Dual objective     =         14570.636373
Iteration: 113264   Dual objective     =         14570.833100
Iteration: 115164   Dual objective     =         14571.039823
Removing perturbation.
Iteration: 115366   Scaled dual infeas =             0.068957
Iteration: 118957   Scaled dual infeas =             0.000000
Iteration: 119095   Dual objective     =         14519.183719
Iteration: 119885   Dual objective     =         14526.787115
Elapsed time =   40.78 sec. (120000 iterations).
Iteration: 121024   Dual objective     =         14527.291310
Iteration: 121967   Dual objective     =         14527.804039
Iteration: 122614   Dual objective     =         14527.921795
Removing shift (76).
Iteration: 122721   Scaled dual infeas =             0.000158
Iteration: 122814   Dual objective     =         14527.911155
Removing shift (3).

Resource limit exceeded.

MIP Solution:        14527.928528    (0 iterations, 0 nodes)
Final Solve:         14527.928596    (122973 iterations)

Best possible:               -inf
Absolute gap:                  na
Relative gap:                  na

A workaround is to increase reslim (so we have enough time to finish the final solve) or to decrease reslim (so we stop on time and allocate extra time to do the final solve). Note: as the duals are being used, it was no option to skip the final solve using option solvefinal.

Update: the GAMS people say the reported solution (primal+dual) should be ok.
I only have the log file, so I don't have evidence otherwise.

Parallel processing of large models

> It might be desirable to invoke parallel processing of my optimization model
> on a network of PCs.

The simplest way to use parallel processing is to use a parallel solver such as Cplex, Xpress or Mosek and tell it to use multiple threads. E.g. with Cplex say threads 4 in an option file and it will exploit your quad core machine. Be aware that parallel processing is not always faster and that you need extra memory. This option only works on SMP machines (basically computers with multiple CPUs/cores).

If I remember correctly Xpress had once a version that worked on a network of machines using PVM. I don't think this was ever available under GAMS. GAMS/Xpress can use multiple threads, but again as SMP only.

GAMS has some grid facilities and the GAMS people have been pushing the Sun grid facility. In my experience this is all somewhat premature. The GAMS grid stuff is fragile and the Sun grid facility looks primitive. It would require quite some work to make this more user friendly. In addition, you need to make sure your model is suited for this type of architecture. The problem must be decomposable in a number of independent jobs, otherwise this approach won't work. Once you get this to work, in many cases you will find out that performance is often disappointing compared to a serial implementation on a fast PC.

Note that GAMS itself is single-threaded. I am not aware of a parallel modeling language although this would make sense for some classes of models. The block structure could be exploited for coarse-grained parallellism. The GAMS internals are very messy so I don't think this will be an option that will be added quickly.

How to call GAMS from Access?

Short Answer: invoke gams.exe using the CreateProcess API call. In a practical application much more is needed:

Create a button so user can launch GAMS
Find the location where GAMS.EXE is located
Create unique directory where scratch files can be read and written
Provide a mechanism to write problem data to a GAMS file
Call GAMS, capture log output and show in window, if needed add an interrupt button to stop the solution process
Provide a mechanism to read solution data from the GAMS run
Presentation of solution

Below is an example: it solves the Traveling Salesman Problem or Minimum Spanning Tree Problem against a 42 US city data set. The TSP is solved using a cutting place algorithm coded in GAMS. All the steps described above are coded in VBA. Click to enlarge.

You'll see four windows inside Access. The first window contains some buttons that allow the user to drive the application. The log window shows any progress by GAMS. It also has some buttons to allow the user to interrupt the solution process. The graph is a simple way to display the solution. The right-lower window shows a data table with city related information (name, coordinates etc.)

Saturday, July 5, 2008

Bootstrap code for forming confidence intervals in Max Entropy estimation

Maximum entropy estimation has become an important tool when few observations are available. Some references are:

Amos Golan, George G. Judge, and Douglas Miller, Maximum Entropy Econometrics: Robust Estimation with Limited Data Wiley, 1996
Channing Arndt, Sherman Robinson, and Finn Tarp, Parameter Estimation for a Computable General Equilibrium Model: A Maximum Entropy Approach http://www.ifpri.org/divs/tmd/dp/papers/tmdp40.pdf
Erwin Kalvelagen, Least Squares Calculations with GAMS http://amsterdamoptimization.com/pdf/ols.pdf section 6
Randall C. Campbell and R. Carter Hill, Maximum Entropy Estimation in Economic Models with Linear Inequality Restrictions, Tech. Report 2001-11, Louisiana State University, Department of Economics, 2001. http://ideas.repec.org/p/lsu/lsuwpp/2001-11.html

Confidence intervals can be formed using the bootstrap or resampling approach. The percentile method is used to estimate the confidence intervals. The model below illustrates how this can be implemented in GAMS.

$ontext

   Bootstrap GME model
 
   Copyright 2007 Erwin Kalvelagen, Amsterdam Optimization
 
   References:
     [1] Maximum entropy estimation in economic models with linear
         inequality restrictions, Randall C. Campbell, R. Carter Hill
         Department of Economics, Louisiana State University,
         Baton Rouge, LA 70803,USA, 2001
 
     [2] Ramanathan, R., 2002. Introductory econometrics with applications
         (Harcourt College Publishers, Fort Worth).
 
     [3] Bardley Efron, Robert J. Tibshirani, "An Introduction to the Bootstrap",
         Chapman & Hall, 1993
 
   Files:
     bootstrap-gme.gms    -- gams file
     poverty.inc          -- data file
     tc.inc (optional)    -- file with quantiles for Student t distribution
 
 
$offtext
 
 
set i /case1*case116/
    k0 /
        constant 'constant term in regression'
        povrate  'poverty level'
        urb      'not used'
        famsize  'average household size'
        unemp    'percentage unemployment rate'
        highschl 'percentage of population age 25 and over with high school degree only'
        college  'percentage of population age 25 and over that completed 4 or more years of college'
        medinc   'median household income'
        D90      'dummy: one for the 1990 Census and zero for the 1980 Census'
      /
   k1(k0) 'data used' /povrate,famsize,unemp,highschl,college,medinc,D90/
;
 
*------------------------------------------------------------------------------
* input data from Ramanthan (2002, Data 7-6, p. 653)
*------------------------------------------------------------------------------
 
table data(i,k0) '-- input data from Ramanthan [2] (2002, Data 7-6, p. 653)'
$include poverty.inc
;
display data;
 
 
*------------------------------------------------------------------------------
* critical values t distribution
* this table is available from amsterdamoptimization.com
* if not available you can use a normal approximation
*------------------------------------------------------------------------------
 
$if exist tc.inc $include tc.inc
$if defined tc $goto tcok
 
 
* we have no access to tc.inc so we use a normal approximation
 
set prob /p1,p2,p3,p4,p5,p6/;
parameter probval(prob) /
  p1 0.10,  p2 0.05,  p3 0.025,  p4 0.01,  p5 0.005,  p6 0.001
/;
 
parameter qnorm(prob) /
  p1 1.281552, p2 1.644854, p3 1.959964, p4 2.326348, p5 2.575829, p6 3.090232
/;
 
$label tcok
 
 
*------------------------------------------------------------------------------
* create some summary statistics
*------------------------------------------------------------------------------
 
parameter summary(k1,*) '-- summary statistics (table 1 in [1])';
summary(k1,'mean') = sum(i,data(i,k1))/card(i);
summary(k1,'min') = smin(i,data(i,k1));
summary(k1,'max') = smax(i,data(i,k1));
summary(k1,'stdev') = sqrt[sum(i,sqr[data(i,k1)-summary(k1,'mean')])/(card(i)-1)];
summary(k1,'coeffvar') = summary(k1,'stdev')/summary(k1,'mean');
display summary;
 
*------------------------------------------------------------------------------
* linear regression using OLS
*------------------------------------------------------------------------------
 
set k(k0) 'regression coefficients'
    /constant,famsize,unemp,highschl,college,medinc,D90/;
data(i,'constant') = 1;
 
variables
  beta(k)  'regression coefficients'
  sse      'sum of squared errors'
;
equations
  dummyobj  'dummy objective'
  fit(i)    'fit linear equations'
;
 
dummyobj..   sse =n= 0;
fit(i)..     data(i,'povrate') =e=  sum(k, beta(k)*data(i,k));
 
model ols /dummyobj,fit/;
 
option lp=ls;
ols.limrow = 0;
ols.limcol = 0;
ols.solprint = 0;
solve ols using lp minimizing sse;
 
* Note: compare OLS results to page 8 in [1]
 
parameter estim(k,*,*) 'estimates';
estim(k,'OLS','-') = beta.l(k);
 
*------------------------------------------------------------------------------
* Calculate OLS confidence intervals
*------------------------------------------------------------------------------
 
parameter ols_se(k) 'standard errors';
ols_se(k) = beta.m(k);
display ols_se;
 
set ival 'confidence interval' /lo,up/;
scalar ndf 'degrees of freedom';
ndf = card(i) - card(k);
scalar alpha 'significance level' /0.025/;
scalar qt 'critical value';
$if defined tc        qt = sum((df,prob)$(dfval(df)=ndf and probval(prob)=0.025), tc(df,prob));
$if not defined tc    abort$(ndf<30) "Normal approximation not valid";
$if not defined tc    qt = sum(prob$(probval(prob)=0.025), qnorm(prob));
display ndf,alpha,qt;
 
parameter ols_conf_ival(k,ival);
ols_conf_ival(k,'lo') = beta.l(k) - qt*ols_se(k);
ols_conf_ival(k,'up') = beta.l(k) + qt*ols_se(k);
display ols_conf_ival;
 
* Note: these can also be imported from the GDX file LS.GDX
* See http://amsterdamoptimization.com/pdf/nlregression.pdf
 
*------------------------------------------------------------------------------
* GME model
*------------------------------------------------------------------------------
 
set j 'support points' /j1*j5/;
 
variables p(k,j), w(i,j);
p.lo(k,j) = 0.0001;
w.lo(i,j) = 0.0001;
 
variable
  entrpy      'entropy'
  e(i)        'residuals'
;
equations
  parmsupp(k) 'parameter support'
  errsupp(i)  'error support'
  normp(k)    'normalize p'
  normw(i)    'normalize w'
  obj         'maximize entropy'
  linear(i)   'linear model'
;
 
parameter
  z(k,j)      'parameter support matrix'
  v(i,j)      'error support matrix'
;
 
obj..          entrpy =e= -sum((k,j), p(k,j)*log(p(k,j)))-sum((i,j), w(i,j)*log(w(i,j)));
linear(i)..    data(i,'povrate') =e= sum(k, beta(k)*data(i,k)) + e(i);
parmsupp(k)..  beta(k) =e= sum(j, z(k,j)*p(k,j));
errsupp(i)..   e(i) =e= sum(j, v(i,j)*w(i,j));
normp(k)..     sum(j, p(k,j)) =e= 1;
normw(i)..     sum(j, w(i,j)) =e= 1;
 
model gme /obj,parmsupp,errsupp,linear,normp,normw/;
 
 
*------------------------------------------------------------------------------
* GME model support data
*------------------------------------------------------------------------------
set s 'error support scenarios' / s3, s4 /;
set gmex 'parameter support scenarios' /gme1,gme2,gme3/;
 
table z_all(gmex,k,j) 'parameter support for GME model'
                      j1   j2   j3   j4   j5
   gme1.constant     -50  -25    0   25   50
   gme1.famsize      -20  -10    0   10   20
   gme1.unemp        -20  -10    0   10   20
   gme1.highschl     -20  -10    0   10   20
   gme1.college      -20  -10    0   10   20
   gme1.medinc       -20  -10    0   10   20
   gme1.d90          -20  -10    0   10   20
 
   gme2.constant     -50  -25    0   25   50
   gme2.famsize      -10   -5    0    5   10
   gme2.unemp         -2   -1    0    1    2
   gme2.highschl      -2   -1    0    1    2
   gme2.college       -2   -1    0    1    2
   gme2.medinc       -10   -5    0    5   10
   gme2.d90          -20  -10    0   10   20
 
   gme3.constant     -50  -25    0   25   50
   gme3.famsize       -5    0    5   10   15
   gme3.unemp         -1    0    1    2    3
   gme3.highschl      -3   -2   -1    0    1
   gme3.college       -3   -2   -1    0    1
   gme3.medinc       -15  -10   -5    0    5
   gme3.d90          -20  -10    0   10   20
 
;
 
table esup_all(s,j) 'error support'
        j1    j2  j3    j4  j5
   s3  -10  -5.0   0   5.0  10
   s4  -13  -6.5   0   6.5  13
;
 
display "GME support data",z_all,esup_all;
 
*------------------------------------------------------------------------------
* run GME scenarios
*------------------------------------------------------------------------------
 
gme.limrow=0;
gme.limcol=0;
gme.solprint=2;
gme.solvelink=2;
loop((s,gmex),
  z(k,j) = z_all(gmex,k,j);
  v(i,j) = esup_all(s,j);
  solve gme maximizing entrpy using nlp;
  estim(k,s,gmex)=beta.l(k);
);
 
*------------------------------------------------------------------------------
* report results
*------------------------------------------------------------------------------
 
option estim:3:1:2;
display "---------------------------------------------------",
        "OLS and GME estimates (table 5 in [1])",
        "---------------------------------------------------",
         estim;
 
*------------------------------------------------------------------------------
* create bootstrap sample
*------------------------------------------------------------------------------
 
 
set nbs 'bootstrap samples' /sample1*sample400/;
 
parameter random(nbs,i) 'random draws with replacement';
random(nbs,i) = uniformint(1,card(i));
display random;
 
alias (i,ii);
set srandom(nbs,i,ii) 'as random but now expressed as a set';
srandom(nbs,i,ii)$(ord(ii)=random(nbs,i)) = yes;
display srandom;
 
 
*------------------------------------------------------------------------------
* run bootstrap method
*------------------------------------------------------------------------------
 
 
parameter newdata(i,k0);
 
equation linear2(i) 'linear model';
 
linear2(i)..    newdata(i,'povrate') =e= sum(k, beta(k)*newdata(i,k)) + e(i);
 
model gme2 /obj,parmsupp,errsupp,linear2,normp,normw/;
 
gme2.limrow=0;
gme2.limcol=0;
gme2.solprint=2;
gme2.solvelink=2;
 
parameter estim2(nbs,k,s,gmex) 'estimates';
 
loop(nbs,
    newdata(i,k0) = 0;
    newdata(i,k0) =  sum(srandom(nbs,i,ii), data(ii,k0));
 
    loop((s,gmex),
       z(k,j) = z_all(gmex,k,j);
       v(i,j) = esup_all(s,j);
       solve gme2 maximizing entrpy using nlp;
       estim2(nbs,k,s,gmex)=beta.l(k);
    );
);
 
display estim2;
 
*------------------------------------------------------------------------------
* calculate bootstrap confidence intervals
* percentile method
*------------------------------------------------------------------------------
 
 
* call rank first not from within a loop so declarations can happen
set idummy/a/;
parameters
   xdummy(idummy) /a 1/
   rdummy(idummy)
;
$libinclude rank xdummy idummy rdummy
 
 
parameter bconfival2(k,*,*,ival) 'confidence interval percentile method -- table 13 in [1]';
bconfival2(k,'OLS','-',ival) = ols_conf_ival(k,ival);
 
parameter
   bb(nbs)    'values to be sorted'
   pct(ival)  'percentiles'
   r(nbs)     'rank'
;
loop((k,s,gmex),
   bb(nbs) = estim2(nbs,k,s,gmex);
   pct('LO') = 2.5;
   pct('UP') = 97.5;
$libinclude rank bb nbs r pct
   bconfival2(k,s,gmex,ival) = pct(ival);
);
 
option bconfival2:3:1:3;
display bconfival2;

References:

Thomas Rutherford, Paul van der Eijk, A GAMS Utility for Ranking One-Dimensional Numeric Data, http://www.mpsge.org/gdxrank/
table with critical values: http://amsterdamoptimization.com/models/bootstrap/tc.inc
example data set: http://amsterdamoptimization.com/models/bootstrap/poverty.inc
Model file: http://amsterdamoptimization.com/models/bootstrap/bootstrap-gme.gms

Friday, July 4, 2008

ps2pdf

This online converter is handy if one needs to convert a postscript file to PDF:
http://www.ps2pdf.com/convert.htm.

Question about presolve

>Both AMPL and AIMMS have a built-in presolver that reduces
>the size of a model before it is passed on to the solver.
>Does GAMS have a presolver?

No. GAMS does not have a built-in presolver. It only has an option to remove fixed variables (model.holdfixed=1;). Solvers like GAMS/Cplex, GAMS/CONOPT have a built-in presolver inside the solver, but solvers like MINOS, SNOPT, SBB do not have this and they will need to work on the full size of the model. There are a number of advantages of having a presolver built into the modeling system: all solvers benefit even those without their own presolver, the modeling system can often provide better error messages and other feedback, and a smaller model to pass on to the solver may be cheaper (less I/O and other overhead).

Critical values for the Student's t distribution

Confidence intervals for OLS regression models require to look up values for the Student's t distribution. The following GAMS include file will implement such a table in GAMS readable format.

table with critical values: http://amsterdamoptimization.com/php_uploads/Download/tc.inc
example data set: http://amsterdamoptimization.com/php_uploads/Download/poverty.inc

Example:

set i /case1*case116/
    k0 /
        constant 'constant term in regression'
        povrate  'poverty level'
        urb      'not used'
        famsize  'average household size'
        unemp    'percentage unemployment rate'
        highschl 'percentage of population age 25 and over with high school degree only'
        college  'percentage of population age 25 and over that completed 4 or more years of college'
        medinc   'median household income'
        D90      'dummy: one for the 1990 Census and zero for the 1980 Census'
      /
   k1(k0) 'data used' /povrate,famsize,unemp,highschl,college,medinc,D90/
;
 
*------------------------------------------------------------------------------
* input data from Ramanthan (2002, Data 7-6, p. 653)
*------------------------------------------------------------------------------
 
table data(i,k0) '-- input data from Ramanthan (2002, Data 7-6, p. 653)'
$include poverty.inc
;
display data;
 
 
*------------------------------------------------------------------------------
* critical values t distribution
*------------------------------------------------------------------------------
  
$include tc.inc
  
*------------------------------------------------------------------------------
* create some summary statistics
*------------------------------------------------------------------------------
  
parameter summary(k1,*) '-- summary statistics';
summary(k1,'mean') = sum(i,data(i,k1))/card(i);
summary(k1,'min') = smin(i,data(i,k1));
summary(k1,'max') = smax(i,data(i,k1));
summary(k1,'stdev') = sqrt[sum(i,sqr[data(i,k1)-summary(k1,'mean')])/(card(i)-1)];
summary(k1,'coeffvar') = summary(k1,'stdev')/summary(k1,'mean');
display summary;
 
*------------------------------------------------------------------------------
* linear regression using OLS
*------------------------------------------------------------------------------
   
set k(k0) 'regression coefficients'
    /constant,famsize,unemp,highschl,college,medinc,D90/;
data(i,'constant') = 1;
 
variables
  beta(k)  'regression coefficients'
  sse      'sum of squared errors'
;
equations
  dummyobj  'dummy objective'
  fit(i)    'fit linear equations'
;
 
dummyobj..   sse =n= 0;
fit(i)..     data(i,'povrate') =e=  sum(k, beta(k)*data(i,k));
 
model ols /dummyobj,fit/;
 
option lp=ls;
solve ols using lp minimizing sse;
 
parameter ols_se(k) 'standard errors';
ols_se(k) = beta.m(k);
 
set ival 'confidence interval' /lo,up/;
scalar ndf 'degrees of freedom';
ndf = card(i) - card(k);
scalar alpha 'significance level' /0.025/;
scalar qt 'critical value';
qt = sum((df,prob)$(dfval(df)=ndf and probval(prob)=0.025), tc(df,prob));
display ndf,alpha,qt;
 
parameter ols_conf_ival(k,ival);
ols_conf_ival(k,'lo') = beta.l(k) - qt*ols_se(k);
ols_conf_ival(k,'up') = beta.l(k) + qt*ols_se(k);
display ols_se,ols_conf_ival;

Example output:

 =======================================================================
   Least Square Solver                                                  
   Erwin Kalvelagen, November 2004, October 2006                        
 =======================================================================
 
         Parameter      Estimate    Std. Error       t value      Pr(>|t|)
  beta('constant')   0.21659E+02   0.55303E+01   0.39164E+01   0.15708E-03 ***
   beta('famsize')   0.18042E+01   0.11617E+01   0.15531E+01   0.12329E+00
     beta('unemp')   0.76467E-01   0.59044E-01   0.12951E+01   0.19803E+00
  beta('highschl')  -0.20135E+00   0.39135E-01  -0.51450E+01   0.11914E-05 ***
   beta('college')   0.21313E-01   0.45803E-01   0.46532E+00   0.64263E+00
    beta('medinc')  -0.41557E+00   0.46677E-01  -0.89030E+01   0.13484E-13 ***
       beta('D90')   0.85045E+01   0.10434E+01   0.81508E+01   0.66416E-12 ***
      ---
      Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
  
 Estimation statistics:
 Cases: 116  Parameters: 7  Residual sum of squares:   0.32146E+03
 Residual standard error:   0.17173E+01   on  109 degrees of freedom
 Multiple R-squared:   0.74583E+00  Adjusted R-squared:   0.73183E+00
 F statistic:   0.53307E+02 on 6 and 109 DF, p-value:   0.00000E+00
   
 DLL version: _GAMS_GDX_237_2007-01-09
 GDX file: ls.gdx
  
 
----   1276 PARAMETER ndf                  =      109.000  degrees of freedom
            PARAMETER alpha                =        0.025  significance level
            PARAMETER qt                   =        1.982  critical value
 
----   1281 PARAMETER ols_se  standard errors
 
constant 5.530,    famsize  1.162,    unemp    0.059,    highschl 0.039,    college  0.046,    medinc   0.047
D90      1.043
 
 
----   1281 PARAMETER ols_conf_ival  
 
                  lo          up
 
constant      10.698      32.619
famsize       -0.498       4.107
unemp         -0.041       0.193
highschl      -0.279      -0.124
college       -0.069       0.112
medinc        -0.508      -0.323
D90            6.436      10.572

Tuesday, July 1, 2008

Fortran question

I'm building and running LSSolver, but there are some problems interfacing the linking code in ls.f90 with the various Fortran sources (from TOMS and for the Student-t inverse, for example) on Linux.

Yes, for these older fortan codes you may need to add/real_size:64, -r8, /autodouble or something similar to the command line to set the default real size. I think all fortran compilers have a flag like that so you don't have to edit all declarations.

Here is a list for some compilers:

Visual Fortran: /real_size:64
Intel Fortran: -real-size 64 (linux) or /real-size:64 (windows)
Lahey/Fujitsu Fortran: -dbl
gfortran: -fdefault-real-8
g95: -r8
IBM XL Fortran: -qrealsize=8
SUN Fortran: -xtypemap=real:64

Converting MathProg (and AMPL) models to GAMS

The tool glpk2gams can convert models written in MathProg (subset of AMPL) so we can quickly assess how GAMS solvers perform on such a model.

If the model only has a .mod file, you can just use a single argument:

D:\glpk\glpk2gams\Debug>glpk2gams.exe diet.mod
Glpk Model filename: diet.mod
Gams filename: diet.gms
  
Reading model section from diet.mod...
Reading data section from diet.mod...
99 lines were read
Generating nb...
Generating cost...
Model has been successfully generated
  
Rows : 10  Columns : 20  Nonzero elements : 179
Writing diet.gms
  
D:\glpk\glpk2gams\Debug>

If the model also has a .dat file, you need to give all three command line arguments:

L:\glpk>glpk2gams egypt2.mod egypt2.gms egypt2.dat
Glpk Model filename: egypt2.mod
Gams filename: egypt2.gms
Data filename: egypt2.dat
 
Reading model section from egypt2.mod...
egypt2.mod:264: warning: unexpected end of file; missing end statement inserted
264 lines were read
Reading data section from egypt2.dat...
egypt2.dat:276: warning: unexpected end of file; missing end statement inserted
276 lines were read
Generating Psi...
Generating mbd...
Generating mbdb...
Generating mb...
Generating cc...
Generating ap...
Generating al...
Generating ai...
Model has been successfully generated
  
Rows : 285  Columns : 351  Nonzero elements : 1336
Writing egypt2.gms
 
L:\glpk>

It is generating scalar GAMS code, so this is not suited to convert models that have to be maintained in GAMS.