Thursday, July 31, 2008

GAMS optimizing assignments

The following fragment shows an unexpected performance problem. I thought all these assignments should be fast (i.e. only over the nonzero elements).

   1  set i /i1*i300/;
2 alias (i,j,k);
3
4 parameter p(i,j,k);
5 p(i,i,i) = 3.14;
6
7 parameter p1(i,j,k),p2(i,j,k);
8
9 p1(i,j,k) = -p(i,j,k);
10 p2(i,j,k)$p(i,j,k) = -p(i,j,k);

---- 1 ExecInit 0.000 0.000 SECS 3 Mb
---- 5 Assignment p 0.000 0.000 SECS 4 Mb 300
---- 9 Assignment p1 2.683 2.683 SECS 4 Mb 300
---- 10 Assignment p2 0.000 2.683 SECS 4 Mb 300


We see here the assignment to p1 is slow (GAMS runs through the carthesian product i x j x k) while the assignment to p2 is fast. I would expect the assignment to p1 to be fast as well.

An assignment of the form
p1(i,j,k) = (-1)*p(i,j,k)
would be optimized by GAMS while
p1(i,j,k) = -1*p(i,j,k)
would be slow again.

The profiler can help you detect these situations and optimize the assignments manually like I did for p2.

Hessian via PathNLP

Same model as Hessian via Convert but now using PathNLP.


$ontext


Variance estimator for an MLE

Erwin Kalvelagen July 2008

Reference:
William H. Greene, "Econometric Analysis", 5th edition, 2003

$offtext

set i 'cases' /i1*i20/;

table data(i,*)

Y E
i1 20.5 12
i2 31.5 16
i3 47.7 18
i4 26.2 16
i5 44.0 12
i6 8.28 12
i7 30.8 16
i8 17.2 12
i9 19.9 10
i10 9.96 12
i11 55.8 16
i12 25.2 20
i13 29.0 12
i14 85.5 16
i15 15.1 10
i16 28.5 18
i17 21.4 16
i18 17.7 20
i19 6.42 12
i20 84.9 16
;

parameter y(i),x(i);
y(i) = data(i,'y');
x(i) = data(i,'E');

scalar Var2Greene 'Variance estimate number two from Greene' / 46.16337 /;

variable b,L;
equation loglik;

loglik.. L =e= sum(i,log(b+x(i))) + sum(i,y(i)/(b+x(i)));


*
* estimation via pathnlp
*

$onecho > pathnlp.opt
hessian_gdx 1
$offecho

option nlp=pathnlp;
model estimation /loglik/;
estimation.optfile=1;
solve estimation minimizing L using nlp;
display "estimate",b.l;


*
* get hessian
*
scalar d2l__b_b 'hessian';
execute_load "hesData.gdx",d2l__b_b;
display d2l__b_b;

*
* inverting a 1x1 matrix is not very difficult
*
scalar v 'variance';
v = 1/d2l__b_b;
display v,var2greene;

abort$(abs(v-var2greene)>1.0e-5) "Accuracy check failed";


PathNLP prints all kind of debug output. This seems not completely polished yet.

Hessian via Convert

GAMS has new facilities to retrieve the Hessian matrix (matrix of second derivatives). Here we use Convert to produce a gdx file with the Hessian which we use to estimate the Variance of a Maximum Likelihood Estimation result.

$ontext


Variance estimator for an MLE

Erwin Kalvelagen July 2008

Reference:
William H. Greene, "Econometric Analysis", 5th edition, 2003

$offtext

set i 'cases' /i1*i20/;

table data(i,*)

Y E
i1 20.5 12
i2 31.5 16
i3 47.7 18
i4 26.2 16
i5 44.0 12
i6 8.28 12
i7 30.8 16
i8 17.2 12
i9 19.9 10
i10 9.96 12
i11 55.8 16
i12 25.2 20
i13 29.0 12
i14 85.5 16
i15 15.1 10
i16 28.5 18
i17 21.4 16
i18 17.7 20
i19 6.42 12
i20 84.9 16
;

parameter y(i),x(i);
y(i) = data(i,'y');
x(i) = data(i,'E');

scalar Var2Greene 'Variance estimate number two from Greene' / 46.16337 /;

variable b,L;
equation loglik;

loglik.. L =e= sum(i,log(b+x(i))) + sum(i,y(i)/(b+x(i)));


*
* estimation
*
model estimation /loglik/;
solve estimation minimizing L using nlp;
display "estimate",b.l;


*
* get hessian
*
option nlp=convert;
$onecho > convert.opt;
hessian
$offecho
estimation.optfile=1;
solve estimation minimizing L using nlp;

*
* gams cannot add elements at runtime so we declare the necessary elements here
*
set dummy /e1,x1/;

parameter h(*,*,*) '-hessian';
execute_load "hessian.gdx",h;
display h;

*
* inverting a 1x1 matrix is not very difficult
*
scalar v 'variance';
v = -1/h('e1','x1','x1');
display v,var2greene;

abort$(abs(v-var2greene)>1.0e-5) "Accuracy check failed";


Note: if you get a message
*** Msg=Error loading library c:\progra~1\gams\g2d: The specified module could not be found.
then copy gd.dll from the GAMS system directory to c:\progra~1\gams. This bug is already fixed.

Monday, July 28, 2008

GDX question

> I would like to make a gdx dump of all the sets/paramters/scalars of
> my model, but I do not want any equations or variables in there at
> all.


$version 227

Sets
i canning plants / seattle, san-diego /
j markets / new-york, chicago, topeka / ;

Parameters

a(i) capacity of plant i in cases
/ seattle 350
san-diego 600 /

b(j) demand at market j in cases
/ new-york 325
chicago 300
topeka 275 / ;

Table d(i,j) distance in thousands of miles
new-york chicago topeka
seattle 2.5 1.7 1.8
san-diego 2.5 1.8 1.4 ;

Scalar f freight in dollars per case per thousand miles /90/ ;

Parameter c(i,j) transport cost in thousands of dollars per case ;

c(i,j) = f * d(i,j) / 1000 ;

Variables
x(i,j) shipment quantities in cases
z total transportation costs in thousands of dollars ;

Positive Variable x ;

Equations
cost define objective function
supply(i) observe supply limit at plant i
demand(j) satisfy demand at market j ;

cost .. z =e= sum((i,j), c(i,j)*x(i,j)) ;

supply(i) .. sum(j, x(i,j)) =l= a(i) ;

demand(j) .. sum(i, x(i,j)) =g= b(j) ;

Model transport /all/ ;

Solve transport using lp minimizing z ;

Display x.l, x.m ;


* generate include file execute_unload statement
* that write all symbols except variables and
* equations
$onecho > f.awk
BEGIN {
print "execute_unload 'mygdx'" > "temp1.inc"
}
/^ *[0-9]+ / {
if ($4=="Equ") next;
if ($4=="Var") next;
print "," $2 > "temp1.inc"
}
END {
print ";" > "temp1.inc"
}
$offecho
$gdxout temp1
$unload
$gdxout
$call 'gdxdump temp1 symbols > temp1.txt'
$call 'awk -f f.awk temp1.txt'
$include temp1.inc

* mygdx.gdx will not contain any variables or equations
execute 'gdxdump mygdx symbols';


Basically at compile time we dump a gdx file temp1.gdx that has all symbols. Then we use an AWK scripts (still at GAMS compile time) to generate an include file temp1.inc that contains an execute_unload statement with all non var/equ symbols to mygdx.gdx. Then we $include this file. Finally for checking we call gdxdump (at execution time) to show all symbols in mygdx.gdx.

Thursday, July 24, 2008

gdx2xls does not work with 22.7

> Erwin. When tried to create xls file from gdx, I got an error message (see log file).
>*** Error at line 7379: Execute has nonzero return code RC=2

The GDX API has been changed between 22.5 and 22.6. You hare using 22.7 and your version of gdx2xls for <= 22.5. A work around is:


scalar s /3.14/;
execute_unload "s.gdx",s;
execute '=gdxcopy -v6c s.gdx oldgdx';
execute '=gdx2xls oldgdx\s.gdx';
execute '=shellexecute oldgdx\s.xls';.

This should show:
--- Job Untitled_39.gms Start 07/24/08 18:17:48
GAMS Rev 227 Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen G070509/0001CE-WIN
GAMS Development Corporation DC4572
--- Starting compilation
--- Untitled_39.gms(9) 2 Mb
--- Starting execution: elapsed 0:00:00.003
--- Untitled_39.gms(7) 3 Mb
Writing files to: .\.\oldgdx
Converting file: .\s.gdx
--- Untitled_39.gms(8) 3 Mb
GDX2XLS Version 3.0
Reading GDX file oldgdx\s.gdx
Reading 1 symbols, sorting: 0.00 seconds
Reading 0 UELs: 0.00 seconds
Opening C:\projects\test\oldgdx\s.XLS with Excel: 0.31 seconds
s. To scalar sheet
Total elapsed time: 0.72 seconds
--- Untitled_39.gms(9) 3 Mb
Shell Execute May 1, 2008 22.7.2 WIN 3727.4799 VIS x86/MS Windows
--- Untitled_39.gms(9) 3 Mb
*** Status: Normal completion
--- Job Untitled_39.gms Stop 07/24/08 18:19:49 elapsed 0:02:00.857


Update: unfortunately this worked for me, but not for the user. I don't understand why. Contact support@gams.com and ask for a 22.7 version of gdx2xls.exe.

gdxcopy puzzle

This does not look right:

c:\projects\test>gdxcopy -v6c save\s.gdx oldgdx
Writing files to: save\save\oldgdx
Converting file: save\s.gdx


Instead of writing to .\oldgdx or save\oldgdx it goes to save\save\oldgdx.

Update: this is fixed in 22.8 beta. Minor issue: the log messages have disappeared in the 22.8 version.


c:\projects\test>gdxcopy -v6c save\s.gdx oldgdx

c:\projects\test>


I would prefer to see what gdxcopy is doing like in the previous version.

Modeling question

>Among the constraints in the formulation my team have worked on,
>some of them have special properties like:
>x1>=a1
>x1+x2>=a1+a2
>x1+x2+x3>=a1+a2+a3
>...
>...
>x1+x2+x3+...+x39>=a1+a2+a3+...+a39
>
>Here a[i] are nonnegative parameters, so RHSs are non-decreasing.
>Besides these constraints, we deal with tons of constraints, so we are not looking
>for any algorithm or method, but looking for smart conversion of these constraints.
>My question is "Are they any ways to formulate this more intellingently or effectively
>to reduce computational effort?


parameter b(i);
alias (i,j);
b(i) = sum(j$(ord(j)<=ord(i)), a(j));

variable y(i);

ydef(i).. y(i) =e= y(i-1) + x(i);
y.lo(i) = b(i);

This requires more variables but many fewer nonzeroes, so likely more efficient (sparser).

Excel GUI for GIS and GAMS


This is work in progress: Excel as front-end for a GAMS Model, including a GIS component. Click to enlarge.

Tuesday, July 22, 2008

Modeling question

>I need to change the following non linear constraint to a linear one .
>the constraints are:
>
>for all k = 0 ,1, 2,3 ,4,...... 10
>
>k+1 = sum over j ( a[k][j] * d[j] )
>
>j = 1,2,3,4
>
>for all k, j a[k][j] is an integer <= 2 and d[j] is an integer <= 60. >
>both a[k][j] and d[j] are variables , making it a non linear
>constraint. Is there a method i can change it to a linear expression.
>Cplex gives a error with this constraint.


These question can only be answered more definitely by looking at the complete model, but let me give it a shot anyway.

The a(k,j) variables can only assume the values 0,1,2. So we can write this as a sum of 2 binary variables (and relax a to be continuous):

0 <= a(k,j) <= 2 continuous
b1(k,j), b2(k,j) binary
a(k,j) = b1(k,j)+2*b2(k,j);

Now we can express z(k,j) = a(k,j)*d(j) as

M1=20
z1(k,j) ≤ M1 · b1(k,j)
z1(k,j) ≤ d(j)
z1(k,j) ≥ d(j) − M1 · (1 − b1(k,j))
M2=40
z2(k,j) ≤ M2 · b2(k,j)
z2(k,j) ≤ 2*d(j)
z2(k,j) ≥ 2*d(j) − M2 · (1 − b2(k,j))
k + 1 = sum(j, z1(k,j)+z2(k,j));

See also: http://yetanothermathprogrammingconsultant.blogspot.com/2008/05/multiplication-of-continuous-and-binary.html.

Invert.exe raw mode gdx

Using raw mode to write the inverse (instead of string mode) yields:

02.416 seconds (n=1000)

The GAMS version should get similar performance (may be even faster as I did not use any compiler flags to do better optimization).

Saturday, July 19, 2008

More on Invert performance

The INVERT.EXE as compiled by GAMS and included in the 22.8 beta is somewhat slower than my own version. Here are some statistics:


                                n=1000
2.28 32bit 0:00:06.973
2.28 64bit 0:00:06.705
my version (32 bit) 0:00:03.878 (gams part is 2.27 64 bit)



The margin is larger than I expected (I expected similar performance). Note: the above timings include GAMS times (total elapsed time of the whole run), so they understate the difference in INVERT.EXE performance. When timing invert.exe itself we see a speed difference of a factor two. I would not blink an eye when I see a 10% difference, but 100% difference indicates there must be something going on here. These timings are based on 22.8 beta, so we still have some time (until the final is out) to fix this.

Invert.exe performance

This model shows that my external solver INVERT.EXE is much faster than Cplex in inverting dense matrices. Obviously this is a case where INVERT is at its best, and Cplex is at a disadvantage, so this is not a poor performance of Cplex. These models are just not suited for a sparse LP solver.

Here is the model:

$ontext

Invert the Pei test matrix

Erwin Kalvelagen, july 2008

Command line arguments:
--n=integer order of the matrix
--alpha=real scalar defining the Pei matrix
--method=1 invert using equations A AINV = I
--method=2 invert using equations A AINV = I using advanced basis
--method=3 invert using external solver


Reference:

Erwin Kalvelagen
SOLVING SYSTEMS OF LINEAR EQUATIONS WITH GAMS
http://www.amsterdamoptimization.com/pdf/lineq.pdf

ML Pei,
A test matrix for inversion procedures,
Communications of the ACM,
Volume 5, 1962, page 508.

$offtext

*
* defaults
*
$if not set n $set n 100
$if not set method $set method 1
$if not set alpha $set alpha 1

scalar alpha /%alpha%/;

set i /i1*i%n%/;
alias (i,j,k);

parameter a(i,j);
a(i,j)=1;
a(i,i)=1+alpha;

parameter ident(i,j);
ident(i,i)=1;

variables
ainv(i,j)
dummy
;

equations
e(i,j)
edummy
;

edummy.. dummy =e= 0;
e(i,j).. sum(k, a(i,k)*ainv(k,j)) =e= ident(i,j);

model m /all/;

option iterlim=1000000;



$if "%method%"==1 solve m minimizing dummy using lp;

$ifthen "%method%"==2

edummy.m=1;
e.m(i,j)=1;
dummy.m=0;
ainv.m(i,j)=0;
solve m minimizing dummy using lp;

$endif

$ifthen "%method%"==3

execute_unload 'a.gdx',i,a;
execute '=invert.exe a.gdx i a b.gdx pinva';
parameter pinva(i,j);
execute_load 'b.gdx',pinva;

$endif


We have implemented three methods of calculating the inverse.

  1. Method=1 uses the LP solver to solve A * invA = I.
  2. Method=2 is as method=1 but we use an advanced basis to speed up the solver.
  3. Method=3 is calling the external solver INVERT.EXE

For more information see: http://www.amsterdamoptimization.com/pdf/lineq.pdf.

Some timings are listed below (all times in seconds):



             method=1   method=2   method=3
n=50 0.637 0.433 0.027
n=100 8.267 4.036 0.055
n=200 313.236 53.395 0.118


The timings are taken from the last line in the log file. E.g.:

--- Job Untitled_36.gms Start 07/19/08 15:46:36
GAMS Rev 227 Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen G070509/0001CE-WIN
GAMS Development Corporation DC4572
--- Starting compilation
--- Untitled_36.gms(81) 3 Mb
--- Starting execution: elapsed 0:00:00.004
--- Untitled_36.gms(69) 5 Mb
----------------------------------------
INVERT: matrix inversion
Erwin Kalvelagen, Amsterdam Optimization
----------------------------------------
DLL:GDX Library May 1, 2008 22.7.2 WIN 4701.4799 VIS x86/MS Windows
GDXin:a.gdx
Input set:i
Input parameter:a
LAPACK DGESV, n= 200
GDXout:b.gdx
Output parameter:pinva
Done
--- Untitled_36.gms(71) 5 Mb
--- GDXin=C:\projects\test\b.gdx
--- Untitled_36.gms(71) 6 Mb
*** Status: Normal completion
--- Job Untitled_36.gms Stop 07/19/08 15:46:36 elapsed 0:00:00.118

Friday, July 18, 2008

LS Solver

The log of the LS solver as distributed in 22.8 beta is slightly different from the original version I compiled myself.

Gams version based on my donated sources:

=============================================================
Least Squares Solver
=============================================================

Parameter Estimate Std. Error t value Pr(>|t|)
b( const) -3.48226E+06 8.90420E+05 -3.91080E+00 3.56040E-03 **
b( gnpdefl) 1.50619E+01 8.49149E+01 1.77376E-01 8.63140E-01
b( gnp) -3.58192E-02 3.34910E-02 -1.06952E+00 3.12681E-01
b( unempl) -2.02023E+00 4.88400E-01 -4.13643E+00 2.53509E-03 **
b( army) -1.03323E+00 2.14274E-01 -4.82199E+00 9.44367E-04 ***
b( pop) -5.11041E-02 2.26073E-01 -2.26051E-01 8.26212E-01
b( year) 1.82915E+03 4.55478E+02 4.01589E+00 3.03680E-03 **
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Estimation statistics:
Cases: 16 Parameters: 7 Residual sum of squares: 8.36424E+05
Residual standard error: 3.04854E+02 on 9 degrees of freedom
Multiple R-squared: 9.95479E-01 Adjusted R-squared: 9.92465E-01
F statistic: 3.30285E+02 on 6 and 9 DF, p-value: 4.98403E-10

GDX library version: GDX Library BETA 14Jul08 22.8.0 WIN 5737.5744 VIS x86
/MS Windows
GDX file: ls.gdx


My original version:

=======================================================================
Least Square Solver V2
Erwin Kalvelagen, Amsterdam Optimization Modeling Group
www.amsterdamoptimization.com
=======================================================================

Parameter Estimate Std. Error t value Pr(>|t|)
b('const') -3.48226E+06 8.90420E+05 -3.91080E+00 3.56040E-03 **
b('gnpdefl') 1.50619E+01 8.49149E+01 1.77376E-01 8.63141E-01
b('gnp') -3.58192E-02 3.34910E-02 -1.06952E+00 3.12681E-01
b('unempl') -2.02023E+00 4.88400E-01 -4.13643E+00 2.53509E-03 **
b('army') -1.03323E+00 2.14274E-01 -4.82199E+00 9.44367E-04 ***
b('pop') -5.11041E-02 2.26073E-01 -2.26051E-01 8.26212E-01
b('year') 1.82915E+03 4.55478E+02 4.01589E+00 3.03680E-03 **
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Estimation statistics:
Cases: 16 Parameters: 7 Residual sum of squares: 8.36424E+05
Residual standard error: 3.04854E+02 on 9 degrees of freedom
Multiple R-squared: 9.95479E-01 Adjusted R-squared: 9.92465E-01
F statistic: 3.30285E+02 on 6 and 9 DF, p-value: 4.98403E-10

DLL version: _GAMS_GDX_237_2007-01-09
GDX file: ls.gdx



The labels seem to have lost their quotation marks. I hoped my code was easily portable, but this may indicate a weakness (although I don't see immediately what I did wrong here). Most importantly the numerical results are the same, so that is good. Actually my banner has "least square" which should read "least squares" (surely in most cases we have multiple squared errors).

Lags and day/hour time representation

> I have variables with two indices, e.g. x(d,h), where d and h are sets
> of days and hours.
>
> My problem is to make a predecessor/sucessor relation between those x
> variables. E.g. I have to model the difference between two neighboring
> x variables like:
>
> delta(d,h) =e= x(d,h) - x(pred(d,h)) ;
>
> delta(d,g) =l= constant ;
>
> Are there any ideas how to make this in an easy and elegant way?


In general it is much easier to work with one time index. I.e. use a single index t instead of two indices (d,h). This will make your model much simpler and easier to maintain.

If you want to print in the final reports in terms of (d,h) then use:


sets
d / d1*d10 /
h / h1*h24 /
t / t1 * t240 /
;


variable delta(t),x(t);
equation e(t);
e(t).. delta(t)$(ord(t)>1) =e= x(t) - x(t-1) ;


set mapper(d,h,t);
mapper(d,h,t)$([ord(d)-1]*card(h)+ord(h) = ord(t)) = yes;
display mapper;

x.l(t) = uniform(0,1);
parameter xreport(d,h);
xreport(d,h) = sum(mapper(d,h,t),x.l(t));
display xreport;

Wednesday, July 16, 2008

GAMS log cosmetics

C:\projects\test>touch r.gms

C:\projects\test>type r.gms

C:\projects\test>gamslib trnsport
Model trnsport.gms retrieved

C:\projects\test>gams trnsport a=c s=t1
--- Job trnsport Start 07/16/08 20:27:27
GAMS Rev 227 Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen G070509/0001CE-WIN
GAMS Development Corporation DC4572
--- Starting compilation
--- trnsport.gms(69) 3 Mb
*** Status: Normal completion
--- Job trnsport.gms Stop 07/16/08 20:27:27 elapsed 0:00:00.006

C:\projects\test>gams r r=t1
--- Job r Start 07/16/08 20:27:43
GAMS Rev 227 Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen G070509/0001CE-WIN
GAMS Development Corporation DC4572
--- Starting continued compilation
--- Starting execution: elapsed 0:00:00.004
--- r.gms(45) 3 Mb
--- Generating LP model transport
--- r.gms(66) 3 Mb
--- 6 rows 7 columns 19 non-zeroes
--- r.gms(66) 3 Mb
--- Executing CPLEX: elapsed 0:00:00.010

ILOG CPLEX May 1, 2008 22.7.2 WEX 4792.4799 WEI x86_64/MS Windows
Cplex 11.0.1, GAMS Link 34

Reading data...
Starting Cplex...
Tried aggregator 1 time.
LP Presolve eliminated 1 rows and 1 columns.
Reduced LP has 5 rows, 6 columns, and 12 nonzeros.
Presolve time = 0.00 sec.

Iteration Dual Objective In Variable Out Variable
1 73.125000 x(seattle.new-york) demand(new-york) slack
2 119.025000 x(seattle.chicago) demand(chicago) slack
3 153.675000 x(san-diego.topeka) demand(topeka) slack
4 153.675000 x(san-diego.new-york) supply(seattle) slack

Optimal solution found.
Objective : 153.675000

--- Restarting execution
--- r.gms(66) 2 Mb
--- Reading solution for model transport
--- r.gms(68) 3 Mb
*** Status: Normal completion
--- Job r.gms Stop 07/16/08 20:27:43 elapsed 0:00:00.079

C:\projects\test>

In this example we do the following:

  1. create an empty gams file r.gms
  2. compile trnsport.gms and save as t1
  3. restart empty file from t1

In this case the execution time line numbering refers to the wrong file. I.e. r.gms(45) refers to line 45 of the file r.gms, but that file has zero lines. I sent this to GAMS and they tell me execution errors have the correct file/line printed.

GAMS/IDE wish: search with regexp in .log/.lst

I would like to be able to search .lst and .log files using regular expressions. Currently the file viewer (the software that shows .lst and .log files in read-only mode in an IDE window) only allow normal search.

This is a somewhat unusual case. There are many solves in this model (thousands) and locate the correct solve I want to search for the second loop of "region1". In normal search region10,region11,.. and region100,region101,... also match. A quick way to search would be to search for "region1$" where $ means end-of-line. This requires regular expression (regexp for nerds) search. This is not (yet) available in the file viewers only in the editor. So the work-around suggested is to open the file in the editor (FileOpen in editor).

Suggestion: either implement regexp search for the file viewers or disable this choice for the file viewers (currently it is always enabled).

This is somewhat exotic: for "normal" models the log file is small.

Saturday, July 12, 2008

GAMS 22.8 issues

The new 22.8 beta is out (see http://www.gams.com/). Here are some initial experiences:

  1. I made LS (regression solver) and INVERT (matrix inversion program) available for distribution. It is included but has a few minor issues.
  2. I was hoping that the Cplex interface would allow lazy constraints to be expressed in GAMS models.
  3. Mosek does not crash anymore on some of MIQPs that it does not like. The message is not very user-friendly: Return code - 1500 [MSK_RES_ERR_INV_PROBLEM]. May be in a future release more readable texts will be added.
  4. Mosek still has problems with the model in http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/erratic-miqp-behavior.html. It gives the obscure message: Return code - 1050 [MSK_RES_ERR_UNKNOWN]. This will probably be fixed before the final release. (Update: bug still present in final 22.8).
  5. Some of the IDE issues have not been fixed: http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/gamside-bug_06.html, http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/gamside-bug.html Admittedly they are not very important. (Although the Ctrl-T issue hits me quite often).
  6. This has been fixed: http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/gams-bug-interrupt-does-not-work.html. It is now possible to interrupt a running GAMS model. This will help me a lot.
  7. Qmaker and Cplex QCP interface is still very slow. I passed on a test case. (Update: the cplex QCP interface will be fixed for 22.8).
  8. MIP interrupt in Mosek is largely fixed but the results are not always convincing:
    **** SOLVER STATUS 8 USER INTERRUPT
    **** MODEL STATUS 1 OPTIMAL
    That combination looks suspicious. (This happens only occasionally, probably when hitting Ctrl-C near the end. In most cases it seems to work just fine.).
  9. Tools like gdxcopy, gdx2xls have lost their log. They used write progress messages to stdout with useful info what is happening. These messages are removed.

Friday, July 11, 2008

GAMS 64 bit installer issue


> Update: this has been fixed in 22.9.


The 64 bit install of GAMS installs by default to C:\Program Files (x86)\gams22.8. This directory is for 32 bit software. 64 bit apps should be installed in C:\Program Files\gams22.8.
I also get with 32 bit and 64 bit install a message like this:
Looks dangerous, but if I say "This program installed correctly" it seems to be ok.

Thursday, July 10, 2008

Performance improvement


Click to enlarge. See what performance improvements in hardware and software can do for a practical LP model. You can try it yourself by downloading and running http://www.gams.com/modlib/libhtml/indus89.htm.

Wednesday, July 9, 2008

Find the bug

Q: locate the error in this GAMS code:


  19   
20
21 Sets
22 i canning, plants / seattle, san-diego /
23 j markets / new-york, chicago, topeka / ;
24
25 Parameters
26
27 a(i) capacity of plant i in cases
28 / seattle 350
**** $361
**** 361 Values for domain 1 are unknown - no checking possible
29 san-diego 600 /
30
31 b(j) demand at market j in cases
32 / new-york 325
33 chicago 300
34 topeka 275 / ;

Ordering of set elements in GAMS

> I don't understand the ordering of set elements
> in my display of a set.


If you look at:

set
i /a,c/
j /b,c/
;
display j;

then this will display:

---- 5 SET j
c, b


To understand this you'll need a little bit of knowledge how GAMS stores and orders set elements. Set elements are unique and stored in single large pool. They are ordered as they come. To display the pool (a.k.a universe) you can use:

alias (pool,*);
display pool;

This will show:

---- 8 SET pool Aliased with *
a, c, b

The ordering of elements (e.g. in display statements) is determined how they are ordered in the pool. It is also important in determining if a set is ordered (in that case the set follows the same ordering as the universe).

This can lead to surprises:

set
t1 /2001*2005/
t2 /2000*2006/;
display t2;

shows:

---- 5 SET t2
2001, 2002, 2003, 2004, 2005, 2000, 2006


A quick fix is to start with a dummy set to get the ordering correct:

set
dummy /2000*2010/
t1 /2001*2005/
t2 /2000*2006/;
display t2;

Monday, July 7, 2008

Bug in GAMS/Cplex link

GAMS/Cplex destroys here a perfectly good integer solution in the final LP. In that final LP the integers are fixed and the resulting LP is solved to get some dual information. In this case a resource limit is hit during this final LP and all is thrown away. Of course this is not what we want to see. The code in the GAMS/Cplex link related to this final LP solve need some careful inspection as we have seen problems with this part before.

Repeating presolve.
Tried aggregator 2 times.
MIP Presolve eliminated 1679 rows and 1752 columns.
MIP Presolve modified 4 coefficients.
Aggregator did 23 substitutions.
Reduced MIP has 184571 rows, 180577 columns, and 758788 nonzeros.
Represolve time = 1.63 sec.
Clique table members: 37.
MIP emphasis: optimality.
MIP search method: dynamic search.
Parallel mode: opportunistic, using up to 4 threads.
Root relaxation solution time = 137.66 sec.

Nodes Cuts/
Node Left Objective IInf Best Integer Best Node ItCnt Gap

0 0 14462.2442 195 14527.9285 14462.2442 178843 0.45%
Heuristic still looking.
Heuristic still looking.
* 0+ 0 14525.7851 14462.2442 178843 0.44%
0 2 14462.2442 195 14525.7851 14462.2442 178843 0.44%
Elapsed real time = 4637.89 sec. (tree size = 0.00 MB, solutions = 8)
100 99 14479.0518 197 14525.7851 14463.9712 232491 0.43%
Cuts: 15
200 197 14490.7443 197 14525.7851 14463.9712 277649 0.43%
* 295+ 286 14525.5834 14463.9712 344217 0.42%
300 295 14505.3123 147 14525.5834 14463.9712 336888 0.42%
400 395 14514.1868 145 14525.5834 14463.9712 368746 0.42%
* 433+ 408 14525.5412 14463.9712 378611 0.42%
500 476 14494.6548 134 14525.5412 14463.9712 420167 0.42%
Impl Bds: 1
* 544+ 497 14515.8473 14463.9712 437352 0.36%
* 545+ 498 14515.3114 14465.4987 441214 0.34%
Fixing integer variables, and solving final LP...
Tried aggregator 1 time.
LP Presolve eliminated 188052 rows and 205219 columns.
Aggregator did 6758 substitutions.
Reduced LP has 157128 rows, 152946 columns, and 633350 nonzeros.
Presolve time = 1.13 sec.
Initializing dual steep norms . . .

Iteration log . . .
Iteration: 1 Dual objective = 6057.310401
Perturbation started.
Iteration: 102 Dual objective = 6057.310401
Iteration: 878 Dual objective = 7666.791470
Iteration: 1500 Dual objective = 9296.038149
Iteration: 2211 Dual objective = 9824.349977
Iteration: 2889 Dual objective = 10466.849833
Iteration: 3542 Dual objective = 10641.384471
Iteration: 4170 Dual objective = 10892.599730
Iteration: 4795 Dual objective = 10929.911432
Iteration: 5399 Dual objective = 11011.407094
Iteration: 6008 Dual objective = 11026.152235
Iteration: 6591 Dual objective = 11028.613181
Iteration: 7181 Dual objective = 11037.046345
Iteration: 7736 Dual objective = 11039.881723
Iteration: 8318 Dual objective = 11042.152553
Iteration: 8957 Dual objective = 11044.445128
Iteration: 9519 Dual objective = 11046.434903
Iteration: 10119 Dual objective = 11048.475834
Iteration: 10685 Dual objective = 11054.973343
Iteration: 11233 Dual objective = 11057.071503
Iteration: 11774 Dual objective = 11059.407174
Iteration: 12324 Dual objective = 11062.135289
Iteration: 12869 Dual objective = 11064.387325
Iteration: 13434 Dual objective = 11066.637098
Iteration: 14031 Dual objective = 11074.488101
Iteration: 14579 Dual objective = 11076.981695
Iteration: 15089 Dual objective = 11079.444173
Iteration: 15621 Dual objective = 11082.365429
Iteration: 16145 Dual objective = 11084.623184
Iteration: 16667 Dual objective = 11087.253554
Iteration: 17199 Dual objective = 11091.205312
Iteration: 17715 Dual objective = 11098.091964
Iteration: 18262 Dual objective = 11102.884193
Iteration: 18788 Dual objective = 11106.328123
Iteration: 19295 Dual objective = 11109.503749
Iteration: 19815 Dual objective = 11114.531394
Iteration: 20324 Dual objective = 11117.863392
Iteration: 20829 Dual objective = 11120.704519
Iteration: 21333 Dual objective = 11123.060983
Iteration: 21830 Dual objective = 11128.551740
Iteration: 22365 Dual objective = 11134.127949
Iteration: 22899 Dual objective = 11137.954843
Iteration: 23365 Dual objective = 11141.672390
Iteration: 23912 Dual objective = 11149.342966
Iteration: 24419 Dual objective = 11158.316182
Iteration: 24957 Dual objective = 11179.374543
Iteration: 25473 Dual objective = 11188.140937
Iteration: 26019 Dual objective = 11193.656639
Iteration: 26520 Dual objective = 11198.854075
Iteration: 27113 Dual objective = 11209.155858
Iteration: 27664 Dual objective = 11220.384286
Iteration: 28172 Dual objective = 11228.944391
Iteration: 28665 Dual objective = 11240.074822
Iteration: 29163 Dual objective = 11246.180510
Iteration: 29682 Dual objective = 11257.264795
Iteration: 30218 Dual objective = 11275.780600
Iteration: 30745 Dual objective = 11286.175702
Iteration: 31223 Dual objective = 11293.826527
Iteration: 31706 Dual objective = 11322.912597
Iteration: 32242 Dual objective = 11345.656014
Iteration: 32792 Dual objective = 11355.546099
Elapsed time = 10.16 sec. (33000 iterations).
Iteration: 33340 Dual objective = 11365.028247
Iteration: 33859 Dual objective = 11378.128137
Iteration: 34425 Dual objective = 11384.719065
Iteration: 35106 Dual objective = 11400.809173
Iteration: 35714 Dual objective = 11410.512205
Iteration: 36277 Dual objective = 11418.067340
Iteration: 36899 Dual objective = 11440.861112
Iteration: 37511 Dual objective = 11449.609068
Iteration: 38161 Dual objective = 11461.996251
Iteration: 38848 Dual objective = 11466.467047
Iteration: 39434 Dual objective = 11479.910077
Iteration: 40002 Dual objective = 11493.030687
Iteration: 40642 Dual objective = 11500.724057
Iteration: 41217 Dual objective = 11521.687737
Iteration: 41825 Dual objective = 11532.637059
Iteration: 42526 Dual objective = 11536.771010
Iteration: 43171 Dual objective = 11541.448080
Iteration: 43797 Dual objective = 11552.971322
Iteration: 44466 Dual objective = 11565.311371
Iteration: 45131 Dual objective = 11569.962544
Iteration: 45787 Dual objective = 11574.606053
Iteration: 46461 Dual objective = 11578.764831
Iteration: 47118 Dual objective = 11588.017836
Iteration: 47736 Dual objective = 11618.652276
Iteration: 48420 Dual objective = 11627.910620
Iteration: 49100 Dual objective = 11639.618577
Iteration: 49705 Dual objective = 11709.154815
Iteration: 50257 Dual objective = 11724.330084
Iteration: 50890 Dual objective = 11731.428433
Iteration: 51600 Dual objective = 11736.597227
Iteration: 52294 Dual objective = 11744.978937
Iteration: 52895 Dual objective = 11753.709220
Iteration: 53467 Dual objective = 11765.418835
Iteration: 54163 Dual objective = 11773.818329
Iteration: 54869 Dual objective = 11780.943803
Iteration: 55550 Dual objective = 11786.399499
Iteration: 56219 Dual objective = 11792.327244
Iteration: 56894 Dual objective = 11796.992322
Iteration: 57548 Dual objective = 11800.527954
Iteration: 58177 Dual objective = 11825.388844
Iteration: 58846 Dual objective = 11838.119491
Iteration: 59544 Dual objective = 11853.442610
Iteration: 60147 Dual objective = 11864.848136
Iteration: 60781 Dual objective = 11891.687076
Iteration: 61425 Dual objective = 11906.740461
Iteration: 62087 Dual objective = 11940.561093
Iteration: 62737 Dual objective = 11975.219161
Elapsed time = 20.33 sec. (63000 iterations).
Iteration: 63454 Dual objective = 12002.296415
Iteration: 64108 Dual objective = 12015.440378
Iteration: 64818 Dual objective = 12022.395479
Iteration: 65509 Dual objective = 12029.058176
Iteration: 66219 Dual objective = 12037.336718
Iteration: 66913 Dual objective = 12048.634801
Iteration: 67550 Dual objective = 12098.951656
Iteration: 68233 Dual objective = 12148.667809
Iteration: 68861 Dual objective = 12204.554100
Iteration: 69489 Dual objective = 12260.068947
Iteration: 70151 Dual objective = 12271.769079
Iteration: 71067 Dual objective = 12340.904160
Iteration: 71916 Dual objective = 12469.081691
Iteration: 72832 Dual objective = 12538.397179
Iteration: 73689 Dual objective = 12573.657042
Iteration: 74580 Dual objective = 12633.804960
Iteration: 75466 Dual objective = 12671.034083
Iteration: 76282 Dual objective = 12709.759093
Iteration: 77157 Dual objective = 12731.565162
Iteration: 78006 Dual objective = 12791.001813
Iteration: 78861 Dual objective = 12826.806885
Iteration: 79716 Dual objective = 12862.410390
Iteration: 80579 Dual objective = 12894.168594
Iteration: 81454 Dual objective = 12987.178433
Iteration: 82296 Dual objective = 13103.926493
Iteration: 83211 Dual objective = 13321.544934
Iteration: 84091 Dual objective = 13443.441166
Iteration: 84973 Dual objective = 13562.343061
Iteration: 85890 Dual objective = 13644.284360
Iteration: 86698 Dual objective = 13720.220758
Iteration: 87401 Dual objective = 13816.562168
Iteration: 88221 Dual objective = 13881.829495
Iteration: 88849 Dual objective = 13929.142875
Iteration: 89610 Dual objective = 13984.379103
Iteration: 90422 Dual objective = 14047.173914
Iteration: 91270 Dual objective = 14115.140590
Iteration: 92125 Dual objective = 14171.749778
Iteration: 93089 Dual objective = 14217.148596
Iteration: 93982 Dual objective = 14263.230690
Elapsed time = 30.39 sec. (94000 iterations).
Iteration: 94826 Dual objective = 14320.598940
Iteration: 95556 Dual objective = 14345.861439
Iteration: 96804 Dual objective = 14379.439770
Iteration: 97540 Dual objective = 14399.637101
Iteration: 98353 Dual objective = 14415.843344
Iteration: 99135 Dual objective = 14486.973327
Iteration: 100230 Dual objective = 14507.696311
Iteration: 101753 Dual objective = 14524.245965
Iteration: 102889 Dual objective = 14536.589082
Iteration: 104245 Dual objective = 14546.083262
Iteration: 105151 Dual objective = 14553.547999
Iteration: 106121 Dual objective = 14558.473735
Iteration: 107058 Dual objective = 14562.528434
Iteration: 107846 Dual objective = 14564.898252
Iteration: 108562 Dual objective = 14568.707843
Iteration: 109171 Dual objective = 14569.741540
Iteration: 110267 Dual objective = 14570.230927
Iteration: 111442 Dual objective = 14570.636373
Iteration: 113264 Dual objective = 14570.833100
Iteration: 115164 Dual objective = 14571.039823
Removing perturbation.
Iteration: 115366 Scaled dual infeas = 0.068957
Iteration: 118957 Scaled dual infeas = 0.000000
Iteration: 119095 Dual objective = 14519.183719
Iteration: 119885 Dual objective = 14526.787115
Elapsed time = 40.78 sec. (120000 iterations).
Iteration: 121024 Dual objective = 14527.291310
Iteration: 121967 Dual objective = 14527.804039
Iteration: 122614 Dual objective = 14527.921795
Removing shift (76).
Iteration: 122721 Scaled dual infeas = 0.000158
Iteration: 122814 Dual objective = 14527.911155
Removing shift (3).

Resource limit exceeded.

MIP Solution: 14527.928528 (0 iterations, 0 nodes)
Final Solve: 14527.928596 (122973 iterations)

Best possible: -inf
Absolute gap: na
Relative gap: na



A workaround is to increase reslim (so we have enough time to finish the final solve) or to decrease reslim (so we stop on time and allocate extra time to do the final solve). Note: as the duals are being used, it was no option to skip the final solve using option solvefinal.


Update: the GAMS people say the reported solution (primal+dual) should be ok.
I only have the log file, so I don't have evidence otherwise.

Parallel processing of large models

> It might be desirable to invoke parallel processing of my optimization model
> on a network of PCs.

The simplest way to use parallel processing is to use a parallel solver such as Cplex, Xpress or Mosek and tell it to use multiple threads. E.g. with Cplex say threads 4 in an option file and it will exploit your quad core machine. Be aware that parallel processing is not always faster and that you need extra memory. This option only works on SMP machines (basically computers with multiple CPUs/cores).

If I remember correctly Xpress had once a version that worked on a network of machines using PVM. I don't think this was ever available under GAMS. GAMS/Xpress can use multiple threads, but again as SMP only.

GAMS has some grid facilities and the GAMS people have been pushing the Sun grid facility. In my experience this is all somewhat premature. The GAMS grid stuff is fragile and the Sun grid facility looks primitive. It would require quite some work to make this more user friendly. In addition, you need to make sure your model is suited for this type of architecture. The problem must be decomposable in a number of independent jobs, otherwise this approach won't work. Once you get this to work, in many cases you will find out that performance is often disappointing compared to a serial implementation on a fast PC.

Note that GAMS itself is single-threaded. I am not aware of a parallel modeling language although this would make sense for some classes of models. The block structure could be exploited for coarse-grained parallellism. The GAMS internals are very messy so I don't think this will be an option that will be added quickly.

How to call GAMS from Access?

Short Answer: invoke gams.exe using the CreateProcess API call. In a practical application much more is needed:
  1. Create a button so user can launch GAMS
  2. Find the location where GAMS.EXE is located
  3. Create unique directory where scratch files can be read and written
  4. Provide a mechanism to write problem data to a GAMS file
  5. Call GAMS, capture log output and show in window, if needed add an interrupt button to stop the solution process
  6. Provide a mechanism to read solution data from the GAMS run
  7. Presentation of solution
Below is an example: it solves the Traveling Salesman Problem or Minimum Spanning Tree Problem against a 42 US city data set. The TSP is solved using a cutting place algorithm coded in GAMS. All the steps described above are coded in VBA. Click to enlarge.


You'll see four windows inside Access. The first window contains some buttons that allow the user to drive the application. The log window shows any progress by GAMS. It also has some buttons to allow the user to interrupt the solution process. The graph is a simple way to display the solution. The right-lower window shows a data table with city related information (name, coordinates etc.)

Saturday, July 5, 2008

Bootstrap code for forming confidence intervals in Max Entropy estimation

Maximum entropy estimation has become an important tool when few observations are available. Some references are:

Confidence intervals can be formed using the bootstrap or resampling approach. The percentile method is used to estimate the confidence intervals. The model below illustrates how this can be implemented in GAMS.


$ontext

   Bootstrap GME model
 
   Copyright 2007 Erwin Kalvelagen, Amsterdam Optimization
 
   References:
     [1] Maximum entropy estimation in economic models with linear
         inequality restrictions, Randall C. Campbell, R. Carter Hill
         Department of Economics, Louisiana State University,
         Baton Rouge, LA 70803,USA, 2001
 
     [2] Ramanathan, R., 2002. Introductory econometrics with applications
         (Harcourt College Publishers, Fort Worth).
 
     [3] Bardley Efron, Robert J. Tibshirani, "An Introduction to the Bootstrap",
         Chapman & Hall, 1993
 
   Files:
     bootstrap-gme.gms    -- gams file
     poverty.inc          -- data file
     tc.inc (optional)    -- file with quantiles for Student t distribution
 
 
$offtext
 
 
set i /case1*case116/
    k0 /
        constant 'constant term in regression'
        povrate  'poverty level'
        urb      'not used'
        famsize  'average household size'
        unemp    'percentage unemployment rate'
        highschl 'percentage of population age 25 and over with high school degree only'
        college  'percentage of population age 25 and over that completed 4 or more years of college'
        medinc   'median household income'
        D90      'dummy: one for the 1990 Census and zero for the 1980 Census'
      /
   k1(k0) 'data used' /povrate,famsize,unemp,highschl,college,medinc,D90/
;
 
*------------------------------------------------------------------------------
* input data from Ramanthan (2002, Data 7-6, p. 653)
*------------------------------------------------------------------------------
 
table data(i,k0) '-- input data from Ramanthan [2] (2002, Data 7-6, p. 653)'
$include poverty.inc
;
display data;
 
 
*------------------------------------------------------------------------------
* critical values t distribution
* this table is available from amsterdamoptimization.com
* if not available you can use a normal approximation
*------------------------------------------------------------------------------
 
$if exist tc.inc $include tc.inc
$if defined tc $goto tcok
 
 
* we have no access to tc.inc so we use a normal approximation
 
set prob /p1,p2,p3,p4,p5,p6/;
parameter probval(prob) /
  p1 0.10,  p2 0.05,  p3 0.025,  p4 0.01,  p5 0.005,  p6 0.001
/;
 
parameter qnorm(prob) /
  p1 1.281552, p2 1.644854, p3 1.959964, p4 2.326348, p5 2.575829, p6 3.090232
/;
 
$label tcok
 
 
*------------------------------------------------------------------------------
* create some summary statistics
*------------------------------------------------------------------------------
 
parameter summary(k1,*) '-- summary statistics (table 1 in [1])';
summary(k1,'mean') = sum(i,data(i,k1))/card(i);
summary(k1,'min') = smin(i,data(i,k1));
summary(k1,'max') = smax(i,data(i,k1));
summary(k1,'stdev') = sqrt[sum(i,sqr[data(i,k1)-summary(k1,'mean')])/(card(i)-1)];
summary(k1,'coeffvar') = summary(k1,'stdev')/summary(k1,'mean');
display summary;
 
*------------------------------------------------------------------------------
* linear regression using OLS
*------------------------------------------------------------------------------
 
set k(k0) 'regression coefficients'
    /constant,famsize,unemp,highschl,college,medinc,D90/;
data(i,'constant') = 1;
 
variables
  beta(k)  'regression coefficients'
  sse      'sum of squared errors'
;
equations
  dummyobj  'dummy objective'
  fit(i)    'fit linear equations'
;
 
dummyobj..   sse =n= 0;
fit(i)..     data(i,'povrate') =e=  sum(k, beta(k)*data(i,k));
 
model ols /dummyobj,fit/;
 
option lp=ls;
ols.limrow = 0;
ols.limcol = 0;
ols.solprint = 0;
solve ols using lp minimizing sse;
 
* Note: compare OLS results to page 8 in [1]
 
parameter estim(k,*,*) 'estimates';
estim(k,'OLS','-') = beta.l(k);
 
*------------------------------------------------------------------------------
* Calculate OLS confidence intervals
*------------------------------------------------------------------------------
 
parameter ols_se(k) 'standard errors';
ols_se(k) = beta.m(k);
display ols_se;
 
set ival 'confidence interval' /lo,up/;
scalar ndf 'degrees of freedom';
ndf = card(i) - card(k);
scalar alpha 'significance level' /0.025/;
scalar qt 'critical value';
$if defined tc        qt = sum((df,prob)$(dfval(df)=ndf and probval(prob)=0.025), tc(df,prob));
$if not defined tc    abort$(ndf<30) "Normal approximation not valid";
$if not defined tc    qt = sum(prob$(probval(prob)=0.025), qnorm(prob));
display ndf,alpha,qt;
 
parameter ols_conf_ival(k,ival);
ols_conf_ival(k,'lo') = beta.l(k) - qt*ols_se(k);
ols_conf_ival(k,'up') = beta.l(k) + qt*ols_se(k);
display ols_conf_ival;
 
* Note: these can also be imported from the GDX file LS.GDX
* See http://amsterdamoptimization.com/pdf/nlregression.pdf
 
*------------------------------------------------------------------------------
* GME model
*------------------------------------------------------------------------------
 
set j 'support points' /j1*j5/;
 
variables p(k,j), w(i,j);
p.lo(k,j) = 0.0001;
w.lo(i,j) = 0.0001;
 
variable
  entrpy      'entropy'
  e(i)        'residuals'
;
equations
  parmsupp(k) 'parameter support'
  errsupp(i)  'error support'
  normp(k)    'normalize p'
  normw(i)    'normalize w'
  obj         'maximize entropy'
  linear(i)   'linear model'
;
 
parameter
  z(k,j)      'parameter support matrix'
  v(i,j)      'error support matrix'
;
 
obj..          entrpy =e= -sum((k,j), p(k,j)*log(p(k,j)))-sum((i,j), w(i,j)*log(w(i,j)));
linear(i)..    data(i,'povrate') =e= sum(k, beta(k)*data(i,k)) + e(i);
parmsupp(k)..  beta(k) =e= sum(j, z(k,j)*p(k,j));
errsupp(i)..   e(i) =e= sum(j, v(i,j)*w(i,j));
normp(k)..     sum(j, p(k,j)) =e= 1;
normw(i)..     sum(j, w(i,j)) =e= 1;
 
model gme /obj,parmsupp,errsupp,linear,normp,normw/;
 
 
*------------------------------------------------------------------------------
* GME model support data
*------------------------------------------------------------------------------
set s 'error support scenarios' / s3, s4 /;
set gmex 'parameter support scenarios' /gme1,gme2,gme3/;
 
table z_all(gmex,k,j) 'parameter support for GME model'
                      j1   j2   j3   j4   j5
   gme1.constant     -50  -25    0   25   50
   gme1.famsize      -20  -10    0   10   20
   gme1.unemp        -20  -10    0   10   20
   gme1.highschl     -20  -10    0   10   20
   gme1.college      -20  -10    0   10   20
   gme1.medinc       -20  -10    0   10   20
   gme1.d90          -20  -10    0   10   20
 
   gme2.constant     -50  -25    0   25   50
   gme2.famsize      -10   -5    0    5   10
   gme2.unemp         -2   -1    0    1    2
   gme2.highschl      -2   -1    0    1    2
   gme2.college       -2   -1    0    1    2
   gme2.medinc       -10   -5    0    5   10
   gme2.d90          -20  -10    0   10   20
 
   gme3.constant     -50  -25    0   25   50
   gme3.famsize       -5    0    5   10   15
   gme3.unemp         -1    0    1    2    3
   gme3.highschl      -3   -2   -1    0    1
   gme3.college       -3   -2   -1    0    1
   gme3.medinc       -15  -10   -5    0    5
   gme3.d90          -20  -10    0   10   20
 
;
 
table esup_all(s,j) 'error support'
        j1    j2  j3    j4  j5
   s3  -10  -5.0   0   5.0  10
   s4  -13  -6.5   0   6.5  13
;
 
display "GME support data",z_all,esup_all;
 
*------------------------------------------------------------------------------
* run GME scenarios
*------------------------------------------------------------------------------
 
gme.limrow=0;
gme.limcol=0;
gme.solprint=2;
gme.solvelink=2;
loop((s,gmex),
  z(k,j) = z_all(gmex,k,j);
  v(i,j) = esup_all(s,j);
  solve gme maximizing entrpy using nlp;
  estim(k,s,gmex)=beta.l(k);
);
 
*------------------------------------------------------------------------------
* report results
*------------------------------------------------------------------------------
 
option estim:3:1:2;
display "---------------------------------------------------",
        "OLS and GME estimates (table 5 in [1])",
        "---------------------------------------------------",
         estim;
 
*------------------------------------------------------------------------------
* create bootstrap sample
*------------------------------------------------------------------------------
 
 
set nbs 'bootstrap samples' /sample1*sample400/;
 
parameter random(nbs,i) 'random draws with replacement';
random(nbs,i) = uniformint(1,card(i));
display random;
 
alias (i,ii);
set srandom(nbs,i,ii) 'as random but now expressed as a set';
srandom(nbs,i,ii)$(ord(ii)=random(nbs,i)) = yes;
display srandom;
 
 
*------------------------------------------------------------------------------
* run bootstrap method
*------------------------------------------------------------------------------
 
 
parameter newdata(i,k0);
 
equation linear2(i) 'linear model';
 
linear2(i)..    newdata(i,'povrate') =e= sum(k, beta(k)*newdata(i,k)) + e(i);
 
model gme2 /obj,parmsupp,errsupp,linear2,normp,normw/;
 
gme2.limrow=0;
gme2.limcol=0;
gme2.solprint=2;
gme2.solvelink=2;
 
parameter estim2(nbs,k,s,gmex) 'estimates';
 
loop(nbs,
    newdata(i,k0) = 0;
    newdata(i,k0) =  sum(srandom(nbs,i,ii), data(ii,k0));
 
    loop((s,gmex),
       z(k,j) = z_all(gmex,k,j);
       v(i,j) = esup_all(s,j);
       solve gme2 maximizing entrpy using nlp;
       estim2(nbs,k,s,gmex)=beta.l(k);
    );
);
 
display estim2;
 
*------------------------------------------------------------------------------
* calculate bootstrap confidence intervals
* percentile method
*------------------------------------------------------------------------------
 
 
* call rank first not from within a loop so declarations can happen
set idummy/a/;
parameters
   xdummy(idummy) /a 1/
   rdummy(idummy)
;
$libinclude rank xdummy idummy rdummy
 
 
parameter bconfival2(k,*,*,ival) 'confidence interval percentile method -- table 13 in [1]';
bconfival2(k,'OLS','-',ival) = ols_conf_ival(k,ival);
 
parameter
   bb(nbs)    'values to be sorted'
   pct(ival)  'percentiles'
   r(nbs)     'rank'
;
loop((k,s,gmex),
   bb(nbs) = estim2(nbs,k,s,gmex);
   pct('LO') = 2.5;
   pct('UP') = 97.5;
$libinclude rank bb nbs r pct
   bconfival2(k,s,gmex,ival) = pct(ival);
);
 
option bconfival2:3:1:3;
display bconfival2;

References:

Friday, July 4, 2008

ps2pdf

This online converter is handy if one needs to convert a postscript file to PDF:
http://www.ps2pdf.com/convert.htm.

Question about presolve

>Both AMPL and AIMMS have a built-in presolver that reduces
>the size of a model before it is passed on to the solver.
>Does GAMS have a presolver?


No. GAMS does not have a built-in presolver. It only has an option to remove fixed variables (model.holdfixed=1;). Solvers like GAMS/Cplex, GAMS/CONOPT have a built-in presolver inside the solver, but solvers like MINOS, SNOPT, SBB do not have this and they will need to work on the full size of the model. There are a number of advantages of having a presolver built into the modeling system: all solvers benefit even those without their own presolver, the modeling system can often provide better error messages and other feedback, and a smaller model to pass on to the solver may be cheaper (less I/O and other overhead).

Critical values for the Student's t distribution

Confidence intervals for OLS regression models require to look up values for the Student's t distribution. The following GAMS include file will implement such a table in GAMS readable format.



Example:

set i /case1*case116/
k0 /
constant 'constant term in regression'
povrate 'poverty level'
urb 'not used'
famsize 'average household size'
unemp 'percentage unemployment rate'
highschl 'percentage of population age 25 and over with high school degree only'
college 'percentage of population age 25 and over that completed 4 or more years of college'
medinc 'median household income'
D90 'dummy: one for the 1990 Census and zero for the 1980 Census'
/
k1(k0) 'data used' /povrate,famsize,unemp,highschl,college,medinc,D90/
;

*------------------------------------------------------------------------------
* input data from Ramanthan (2002, Data 7-6, p. 653)
*------------------------------------------------------------------------------

table data(i,k0) '-- input data from Ramanthan (2002, Data 7-6, p. 653)'
$include poverty.inc
;
display data;


*------------------------------------------------------------------------------
* critical values t distribution
*------------------------------------------------------------------------------

$include tc.inc

*------------------------------------------------------------------------------
* create some summary statistics
*------------------------------------------------------------------------------

parameter summary(k1,*) '-- summary statistics';
summary(k1,'mean') = sum(i,data(i,k1))/card(i);
summary(k1,'min') = smin(i,data(i,k1));
summary(k1,'max') = smax(i,data(i,k1));
summary(k1,'stdev') = sqrt[sum(i,sqr[data(i,k1)-summary(k1,'mean')])/(card(i)-1)];
summary(k1,'coeffvar') = summary(k1,'stdev')/summary(k1,'mean');
display summary;

*------------------------------------------------------------------------------
* linear regression using OLS
*------------------------------------------------------------------------------

set k(k0) 'regression coefficients'
/constant,famsize,unemp,highschl,college,medinc,D90/;
data(i,'constant') = 1;

variables
beta(k) 'regression coefficients'
sse 'sum of squared errors'
;
equations
dummyobj 'dummy objective'
fit(i) 'fit linear equations'
;

dummyobj.. sse =n= 0;
fit(i).. data(i,'povrate') =e= sum(k, beta(k)*data(i,k));

model ols /dummyobj,fit/;

option lp=ls;
solve ols using lp minimizing sse;

parameter ols_se(k) 'standard errors';
ols_se(k) = beta.m(k);

set ival 'confidence interval' /lo,up/;
scalar ndf 'degrees of freedom';
ndf = card(i) - card(k);
scalar alpha 'significance level' /0.025/;
scalar qt 'critical value';
qt = sum((df,prob)$(dfval(df)=ndf and probval(prob)=0.025), tc(df,prob));
display ndf,alpha,qt;

parameter ols_conf_ival(k,ival);
ols_conf_ival(k,'lo') = beta.l(k) - qt*ols_se(k);
ols_conf_ival(k,'up') = beta.l(k) + qt*ols_se(k);
display ols_se,ols_conf_ival;


Example output:

 =======================================================================
Least Square Solver
Erwin Kalvelagen, November 2004, October 2006
=======================================================================

Parameter Estimate Std. Error t value Pr(>|t|)
beta('constant') 0.21659E+02 0.55303E+01 0.39164E+01 0.15708E-03 ***
beta('famsize') 0.18042E+01 0.11617E+01 0.15531E+01 0.12329E+00
beta('unemp') 0.76467E-01 0.59044E-01 0.12951E+01 0.19803E+00
beta('highschl') -0.20135E+00 0.39135E-01 -0.51450E+01 0.11914E-05 ***
beta('college') 0.21313E-01 0.45803E-01 0.46532E+00 0.64263E+00
beta('medinc') -0.41557E+00 0.46677E-01 -0.89030E+01 0.13484E-13 ***
beta('D90') 0.85045E+01 0.10434E+01 0.81508E+01 0.66416E-12 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Estimation statistics:
Cases: 116 Parameters: 7 Residual sum of squares: 0.32146E+03
Residual standard error: 0.17173E+01 on 109 degrees of freedom
Multiple R-squared: 0.74583E+00 Adjusted R-squared: 0.73183E+00
F statistic: 0.53307E+02 on 6 and 109 DF, p-value: 0.00000E+00

DLL version: _GAMS_GDX_237_2007-01-09
GDX file: ls.gdx


---- 1276 PARAMETER ndf = 109.000 degrees of freedom
PARAMETER alpha = 0.025 significance level
PARAMETER qt = 1.982 critical value

---- 1281 PARAMETER ols_se standard errors

constant 5.530, famsize 1.162, unemp 0.059, highschl 0.039, college 0.046, medinc 0.047
D90 1.043


---- 1281 PARAMETER ols_conf_ival

lo up

constant 10.698 32.619
famsize -0.498 4.107
unemp -0.041 0.193
highschl -0.279 -0.124
college -0.069 0.112
medinc -0.508 -0.323
D90 6.436 10.572

Tuesday, July 1, 2008

Fortran question

I'm building and running LSSolver, but there are some problems interfacing the linking code in ls.f90 with the various Fortran sources (from TOMS and for the Student-t inverse, for example) on Linux.

Yes, for these older fortan codes you may need to add/real_size:64, -r8, /autodouble or something similar to the command line to set the default real size. I think all fortran compilers have a flag like that so you don't have to edit all declarations.

Here is a list for some compilers:

  • Visual Fortran: /real_size:64
  • Intel Fortran: -real-size 64 (linux) or /real-size:64 (windows)
  • Lahey/Fujitsu Fortran: -dbl
  • gfortran: -fdefault-real-8
  • g95: -r8
  • IBM XL Fortran: -qrealsize=8
  • SUN Fortran: -xtypemap=real:64

Converting MathProg (and AMPL) models to GAMS

The tool glpk2gams can convert models written in MathProg (subset of AMPL) so we can quickly assess how GAMS solvers perform on such a model.

If the model only has a .mod file, you can just use a single argument:

D:\glpk\glpk2gams\Debug>glpk2gams.exe diet.mod
Glpk Model filename: diet.mod
Gams filename: diet.gms

Reading model section from diet.mod...
Reading data section from diet.mod...
99 lines were read
Generating nb...
Generating cost...
Model has been successfully generated

Rows : 10 Columns : 20 Nonzero elements : 179
Writing diet.gms

D:\glpk\glpk2gams\Debug>



If the model also has a .dat file, you need to give all three command line arguments:

L:\glpk>glpk2gams egypt2.mod egypt2.gms egypt2.dat
Glpk Model filename: egypt2.mod
Gams filename: egypt2.gms
Data filename: egypt2.dat

Reading model section from egypt2.mod...
egypt2.mod:264: warning: unexpected end of file; missing end statement inserted
264 lines were read
Reading data section from egypt2.dat...
egypt2.dat:276: warning: unexpected end of file; missing end statement inserted
276 lines were read
Generating Psi...
Generating mbd...
Generating mbdb...
Generating mb...
Generating cc...
Generating ap...
Generating al...
Generating ai...
Model has been successfully generated

Rows : 285 Columns : 351 Nonzero elements : 1336
Writing egypt2.gms

L:\glpk>


It is generating scalar GAMS code, so this is not suited to convert models that have to be maintained in GAMS.