Friday, October 31, 2008

Gdxls: call only if needed

Here is a trick to call gdxls from GAMS only if the source file input.xls is newer than the destination file input.gdx:

$call test input.xls -nt input.gdx && gdxls -propertyfile properties.txt

This will cause that gdxls will only be called if input.xls has been changed since the last time this conversion was performed.

Thursday, October 30, 2008

Lp2gams update

Added semi-continuous variables so we can translate lp-solve models with such variables.

See: http://yetanothermathprogrammingconsultant.blogspot.com/2008/06/lpsolve-to-gams-conversion.html

Have not checked if fixed semi-continuous variables work identical at both sides. There is confusion about the meaning of this.

GAMS obscure error message

When we read using GAMS 22.7 a GAMS restart file produced by GAMS 22.8 we get a nasty error message. Looks like the versioning mechanism on workfiles does not completely work correctly here:

C:\Program Files\GAMS22.7>gams x r=x
--- Job x Start 10/30/08 16:56:41
*** Workfile synchronization problem in section BAS-END
*** Generated under GAMS Ver = WEX228-228
*** Processed under GAMS Ver = WEX227-227
*** Status: Terminated due to systems error in WF Section error: Expected = BAS-END, Found =
*** Inspect listing file for more information
***
*** GAMS Base Module May 1, 2008 22.7.2 WEX 4648.4799 WEI x86_64/MS Windows
***
*** GAMS Development Corporation
*** 1217 Potomac Street, NW
*** Washington, DC 20007, USA
*** 202-342-0180, 202-342-0181 fax
*** support@gams.com, www.gams.com
***
*** opentext W=0 FN="225a\gamsnext.cmd"
--- Job x.gms Stop 10/30/08 16:56:41 elapsed 0:00:00.020

***
*** Gams Exit Msg = WF Section error: Expected = BAS-END, Found =
***

C:\Program Files\GAMS22.7>


I was stumped when a client of mine reported this error. I expected an error message about version mismatch in such a case.

GAMS Function and Derivative Evaluation

I have two different formulations of a linearly constrained NLP (i.e. the objective is nonlinear). The difference is how we formulate the objective. In the next fragments x is a variable and c and e are parameters.

Original Version

cost1 ..        z  =e=  sum((i,j,k), x(i,j,k)*log(x(i,j,k)+e)) -
sum((i,j,k), x(i,j,k)*log(c(i,j)+e));

Total CPU secs in IPOPT (w/o function evaluations) = 6.033
Total CPU secs in NLP function evaluations = 33.971


Fast Version


cost2 .. z =e= sum((i,j,k), x(i,j,k)*log((x(i,j,k)+e)/(c(i,j)+e)));

Total CPU secs in IPOPT (w/o function evaluations) = 5.905
Total CPU secs in NLP function evaluations = 0.541


There is probably something wrong how GAMS deals with the derivatives for the first version. I passed this on to GAMS so they can investigate.


The real model is much larger (> half a million variables), and then this difference becomes really significant. One version can solve the model quickly to near optimality (with Mosek) and the other version is not able to finish in reasonable time.

Langford's problem

The problem is to find integer sequences with special characteristics. See http://www.lclark.edu/~miller/langford.html. An example of a sequence is





I.e. we have 4 sets of 2 numbers 1,2,3,4 where between number k we have k other numbers.


With Microsoft Solver Foundation it is easy to create a constraint programming version that solves much quicker and in addition it is easy to add a simple user interface.




Note that there are improved CP formulations. See e.g. http://4c.ucc.ie/~bms/Papers/ResearchReports/2000_18.ps

Monday, October 27, 2008

gdxls: write xls files on Linux

The gdxls tool is a Java based alternative for GDXXRW on Linux and other non-Windows operating systems.


Besides converting xls→gdx (as described in http://yetanothermathprogrammingconsultant.blogspot.com/2008/10/gdxls-read-xls-files-on-linux.html) we can also convert gdx→xls.

As simple example model is shown below:
$onecho > trnsport2.properties
o=trnsport2.xls
i=trnsport2.gdx

parameter=c
rng=a1
rdim=1
cdim=1

parameter.2=c
rng.2=f1
rdim.2=2
cdim.2=0

parameter.3=c
rng.3=sheet1!A10
rdim.3=0
cdim.3=2

$offecho

$call gamslib trnsport
$call gams trnsport lo=2 gdx=trnsport2

$call ./gdxls -propertyfile trnsport2.properties

$call oocalc trnsport2.xls


We run the trnsport model from the Model Library and save the results in a gdx file. We then extract the c(i,j) parameter from the gdx file and write it to the spreadsheet in different formats.

Note that we use a numbered suffix in the properties file to indicate which options belong to the same item. No suffix is identical to suffix ".1". So we have three items to process and they are numbered 1,2,3.

The result rendered by Open Office Calc looks like the picture shown below (click to enlarge).

Saturday, October 25, 2008

gdxls: read xls files on Linux

The GAMS tool GDXXRW is not available under Linux, Unix and OS X.  As I needed to have this functionality in some projects, I developed a Java based alternative. Technobabble: The tool is using GCJ,GCC and CNI so it should be easy to port to any platform where the GNU compilers GCC and GCJ are available.

The tool gdxls can read xls files on Linux (and other Unix-based operating systems) to generate gdx files. Similar to GDXXRW it works as described below.

Consider the following spreadsheet (it can be an xls file from Excel or as shown from Open Office):



The GAMS model to import this file via a gdx file is:

$onecho > trnsport.properties
i=trnsport.xls
o=trnsport.gdx

parameter=c
rng=b4
rdim=1
cdim=1

$offecho

$call gdxls -propertyfile trnsport.properties

sets
i 'canning plants' / seattle, san-diego /
j 'markets' / new-york, chicago, topeka /
;

parameter c(i,j);
$gdxin trnsport.gdx
$load c

display c;


The log will look like:

erwin@erwin-desktop:~/workspace/gdxls$ gams test1.gms
--- Job test1.gms Start 10/25/08 09:42:14
GAMS Rev 228 Copyright (C) 1987-2008 GAMS Development. All rights reserved
Licensee: GAMS Development Corporation, Washington, DC G871201/0000CA-ANY
Free Demo, 202-342-0180, sales@gams.com, www.gams.com DC0000
--- Starting compilation
--- test1.gms(12) 2 Mb
--- call gdxls -propertyfile trnsport.properties
GDXLS V 0.1, Amsterdam Optimization (c) 2008
xls2gdx,input=trnsport.xls,output=trnsport.gdx
Parameter;name=c;range=B4:E6;rdim=1;cdim=1
xls2gdx done
--- test1.gms(20) 3 Mb
--- GDXin=/home/erwin/workspace/gdxls/trnsport.gdx
--- test1.gms(23) 3 Mb
--- Starting execution: elapsed 0:00:00.186
--- test1.gms(23) 4 Mb
*** Status: Normal completion
--- Job test1.gms Stop 10/25/08 09:42:14 elapsed 0:00:00.188
erwin@erwin-desktop:~/workspace/gdxls$




The result is:

----     23 PARAMETER c  gdxls range:B4:E6

new-york chicago topeka

seattle 2.500 1.700 1.800
san-diego 2.500 1.800 1.400


Another posting will describe gdx to xls conversion.

Thursday, October 23, 2008

Magic squares in GAMS

In http://yetanothermathprogrammingconsultant.blogspot.com/2008/10/magic-squares-with-ms-csp-solver.html we describe a CSP formulation using the Microsoft Solver Foundation tools. Here are some GAMS formulations using a MIP. 

Instead of x(i,j) we use a single index: x(i). That makes it easier to model the "all-different" constraints, at the expense of making the row, column and diagonal sums more difficult. The latter complication can be alleviated by using a mapping set that maps between the two representations.


This version uses a permutation matrix paradigm to model the all-different constraint:

$ontext
Magic Squares

Take consecutive numbers from 1 to n^2 and
arrange them in a n x n array such that
every row, every column and the two diagonals
all add up to the same total.

See also: http://forum.swarthmore.edu/alejandre/magic.square.html

Erwin Kalvelagen, nov 1999
$offtext

set k "board size" /k1*k5/;
alias (k,kk);
set i "numbering of cells" /i1*i25/;
alias(i,j);

abort$(sqr(card(k)) <> card(i)) "card(i) should be card(k)^k";

$ontext
cells are numbered:

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

$offtext



variable x(i) "value of board cell i";
binary variable y(i,j) "permutation matrix";
variable s "sum";


parameter p(i);
p(i)=ord(i);


equations
unique(i) "make sure each x(i) is unique"
yrow(i) "row sum of y"
ycol(j) "col sum of y"
row(k) "row of board"
col(k) "col of board"
diag1 "first diagonal"
diag2 "second diagonal"
;



set map(k,kk,i);
scalar cnt /0/;
loop((k,kk),
cnt = cnt + 1;
map(k,kk,i)$(ord(i)=cnt)=yes;
);
display map;

set revdiag(i);
loop(map(k,kk,i)$((ord(k)+ord(kk))=card(k)+1),
revdiag(i) = yes;
);

display revdiag;

unique(i)..x(i)=e=sum(j,y(i,j)*p(j));
yrow(i).. sum(j,y(i,j))=e=1;
ycol(j).. sum(i,y(i,j))=e=1;

row(k).. sum(map(k,kk,i), x(i)) =e= s;
col(k).. sum(map(kk,k,i), x(i)) =e= s;
diag1.. sum(map(k,k,i), x(i)) =e= s;
diag2.. sum(revdiag, x(revdiag)) =e= s;


equations
extra1 "extra constraint for performance"
extra2 "another redundant constraint for performance"
;

* additional constraints below
scalar n "size of set i";
n = card(i);
extra1.. sum(i,x(i)) =e= n*(n+1)/2;
extra2.. sum((i,j),y(i,j)) =e= n;

* fix s with its known value
scalar magicsum;
magicsum = card(k)*(n+1)/2;
s.fx = magicsum;


model m1 /all/;
solve m1 using mip minimizing s;

display x.l;

parameter board(k,kk);
board(k,kk) = sum(map(k,kk,i), x.l(i));
display board;


This version is using or-constraints to model the all-different condition.

$ontext
Magic Squares

Take consecutive numbers from 1 to n^2 and
arrange them in a n x n array such that
every row, every column and the two diagonals
all add up to the same total.

See also: http://forum.swarthmore.edu/alejandre/magic.square.html

Erwin Kalvelagen, nov 1999
$offtext

set k "board size" /k1*k5/;
alias (k,kk);
set i "numbering of cells" /i1*i25/;
alias(i,j);

abort$(sqr(card(k)) <> card(i)) "card(i) should be card(k)^k";

$ontext
cells are numbered:

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

$offtext



variable x(i) "value of board cell i";
binary variable y(i,j) "permutation matrix";
variable s "sum";


parameter p(i);
p(i)=ord(i);


equations
unique(i) "make sure each x(i) is unique"
yrow(i) "row sum of y"
ycol(j) "col sum of y"
row(k) "row of board"
col(k) "col of board"
diag1 "first diagonal"
diag2 "second diagonal"
;



set map(k,kk,i);
scalar cnt /0/;
loop((k,kk),
cnt = cnt + 1;
map(k,kk,i)$(ord(i)=cnt)=yes;
);
display map;

set revdiag(i);
loop(map(k,kk,i)$((ord(k)+ord(kk))=card(k)+1),
revdiag(i) = yes;
);

display revdiag;

unique(i)..x(i)=e=sum(j,y(i,j)*p(j));
yrow(i).. sum(j,y(i,j))=e=1;
ycol(j).. sum(i,y(i,j))=e=1;

row(k).. sum(map(k,kk,i), x(i)) =e= s;
col(k).. sum(map(kk,k,i), x(i)) =e= s;
diag1.. sum(map(k,k,i), x(i)) =e= s;
diag2.. sum(revdiag, x(revdiag)) =e= s;


equations
extra1 "extra constraint for performance"
extra2 "another redundant constraint for performance"
;

* additional constraints below
scalar n "size of set i";
n = card(i);
extra1.. sum(i,x(i)) =e= n*(n+1)/2;
extra2.. sum((i,j),y(i,j)) =e= n;

* fix s with its known value
scalar magicsum;
magicsum = card(k)*(n+1)/2;
s.fx = magicsum;


model m1 /all/;
solve m1 using mip minimizing s;

display x.l;

parameter board(k,kk);
board(k,kk) = sum(map(k,kk,i), x.l(i));
display board;

The mapping set looks like:

----     60 SET map  

i1 i2 i3 i4 i5 i6 i7 i8 i9

k1.k1 YES
k1.k2 YES
k1.k3 YES
k1.k4 YES
k1.k5 YES
k2.k1 YES
k2.k2 YES
k2.k3 YES
k2.k4 YES

+ i10 i11 i12 i13 i14 i15 i16 i17 i18

k2.k5 YES
k3.k1 YES
k3.k2 YES
k3.k3 YES
k3.k4 YES
k3.k5 YES
k4.k1 YES
k4.k2 YES
k4.k3 YES

+ i19 i20 i21 i22 i23 i24 i25

k4.k4 YES
k4.k5 YES
k5.k1 YES
k5.k2 YES
k5.k3 YES
k5.k4 YES
k5.k5 YES



The solution is displayed as:


----     99 VARIABLE x.L  value of board cell i

i1 6.000, i2 22.000, i3 7.000, i4 13.000, i5 17.000, i6 20.000, i7 24.000, i8 2.000
i9 18.000, i10 1.000, i11 19.000, i12 12.000, i13 16.000, i14 8.000, i15 10.000, i16 9.000
i17 3.000, i18 25.000, i19 5.000, i20 23.000, i21 11.000, i22 4.000, i23 15.000, i24 21.000
i25 14.000


---- 103 PARAMETER board

k1 k2 k3 k4 k5

k1 6.000 22.000 7.000 13.000 17.000
k2 20.000 24.000 2.000 18.000 1.000
k3 19.000 12.000 16.000 8.000 10.000
k4 9.000 3.000 25.000 5.000 23.000
k5 11.000 4.000 15.000 21.000 14.000


The N=5 problem is already very difficult to solve.

Reading Excel XLS files on Linux

Using the code from http://poi.apache.org/ I can now read and process Excel XLS files on Linux (and Mac). 

Note: this code implements the Compound Document Format, so does not depend on Windows.

Sunday, October 19, 2008

Magic Squares with MS CSP solver

This model generates Magic Squares (http://en.wikipedia.org/wiki/Magic_square) using the Microsoft Solver Foundation Excel tool.

As we have no objective, this can be formulated as a constraint programming problem. This makes it easy to use the alldifferent constraint (using the unequal construct in OML).




This remains a difficult problem: only quick solutions up to N=5.



Update: I tightened the symmetry breaking constraints and used a different CP heuristic (Domain over Weighted Degree). Choosing the correct heuristic was not done by deep insight but rather trial-and-error. This allowed me to solve even up to N=10 quickly.




It will be difficult to get good performance from a MIP solver on this. The MIP formulations for the all-different constraint are very demanding for a MIP solver. See e.g. H.P.Williams, Hong Yan, "Representations of the all-different Predicate of Constraint Satisfaction in Integer Programming," INFORMS Journal on Computing, Vol. 13 (2001) 96-103,
http://www.lse.ac.uk/collections/operationalResearch/pdf/PWilliams_2001_IJOC.pdf.

Monday, October 13, 2008

More solvers on the way

After Microsoft entering the Math Programming Software market, there also will be more solver competition at the high-end: http://www.gurobi.com/index.html. These guys were main developers of Cplex, so I would guess they aim towards a similarly positioned product. This is of course a very welcome development for us modelers.

Convex Linearly Constrained Model

I was given a very large convex linearly constrained model (exceeding half a million variables). The "standard" NLP solvers (Conopt,Minos,Snopt) were very slow on this model, not even being close to optimality after hours of working. Mosek solved the model very fast, but not to optimality: Solution status : NEAR_OPTIMAL.

Unfortunately this near optimal point could not be improved quickly by other solvers. In the end we solved a few QP approximations of the form:

min g(x)=f(x0) + ∇ f(x0) (x-x0) + 0.5 (x-x0)T2 f(x0) (x-x0)
s.t. Ax=b

Here x0 is the current value of x.

Basically a Taylor series approximation of the objective. This actually converged rapidly to an optimal solution we could verify with Conopt. (Conopt complained about some reduced gradients being too large, but could not improve the solution further). What this algorithm lacks in refinement it gains in simplicity: just a loop around a solve statement.

Implementing such a problem specific algorithm is easy to do in GAMS using its procedural features (the complete code in this case is less than 20 lines), and for the most difficult problems this often can help. We further benefitted from having access to multiple NLP solvers (in this case Mosek and Conopt).

Saturday, October 11, 2008

Cannot start GAMS IDE

> I cannot start the GAMS IDE when I am not
> connected to the network. I get a message
> like:
> Problem with system file
> \ecomfp01\user$\My Documents\gamsdir\gamside.ini
> Error 1231: The network location cannot be reached.


The GAMS IDE uses the "My Documents" directory to store an initialization file. It will refuse to run if this directory can not be accessed. Usually this directory is on the local c: drive, but some system administrators may choose to have this directory on a network drive. The result is that the GAMS IDE will not start if your laptop is not connected to the network, even if GAMS itself is installed on the local c: drive.

The work-around is to create a shortcut on the desktop with target:

"C:\Program Files\GAMS22.8\gamside.exe" c:\projects\gams\gamside.ini

When you click on this shortcut the GAMS IDE will use the c:\projects\gams directory to store the initialization file. Make sure this directory exist!

The IDE Help has some notes on this (not being able to start the IDE, you would need to click on gamside.chm in the GAMS system directory).

Friday, October 10, 2008

Drunken noodle log

The GAMS/Ipopt log always somehow reminds me of drunken noodles because of the misaligned headers.

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
30 4.2974620e+002 3.78e+001 3.39e+001 -1.9 5.15e+002 - 8.47e-002 1.96e-001f 1
31 4.2874611e+002 2.87e+001 2.01e+002 -1.9 5.03e+002 - 1.02e-001 2.40e-001f 1
32 4.2792912e+002 2.28e+001 1.65e+002 -2.0 4.90e+002 - 2.02e-001 2.07e-001f 1


I would guess the formats are a little bit too tight for some platforms.

Thursday, October 9, 2008

GAMS/Mosek link bug

GAMS/Mosek stops but GAMS keeps it a secret why this happened. A few extra lines with some text explaining why the solve was aborted would not be unwelcome:


               S O L V E      S U M M A R Y

MODEL LANDentSN2 OBJECTIVE ENTROPY
TYPE NLP DIRECTION MINIMIZE
SOLVER MOSEK FROM LINE 249734

**** SOLVER STATUS 4 TERMINATED BY SOLVER
**** MODEL STATUS 7 INTERMEDIATE NONOPTIMAL
**** OBJECTIVE VALUE 217.6762

RESOURCE USAGE, LIMIT 539.256 900000.000
ITERATION COUNT, LIMIT 0 900000
EVALUATION ERRORS 0 0

MOSEK Link Aug 1, 2008 22.8.1 WEX 5438.6015 WEI x86_64/MS Windows

M O S E K version 5.0.0.90 (Build date: Jun 6 2008 14:57:22)
Copyright (C) MOSEK ApS, Fruebjergvej 3, Box 16
DK-2100 Copenhagen, Denmark
http://www.mosek.com


The log file may contain additional information. Unfortunately if you run GAMS from the command line (e.g. by Linux users) the log is not saved. (The IDE under Windows will save a copy).

Tuesday, October 7, 2008

MLE estimation in GAMS

This Max Likelihood Estimation example includes forming of confidence intervals. Standard errors are found by inverting the Hessian and normal quantiles are calculated using a minimal CNS model.

http://amsterdamoptimization.com/models/statistics/weibull.gms

This kind of analysis is only possible with new features in GAMS. However, it remains kind of klunky compared to Matlab, Gauss or SAS. The GAMS approach may be useful nevertheless, e.g. when using as part of a larger GAMS model (in which case calling a different system may be cumbersome), when the problem is very difficult (in which case stronger NLP solvers and automatic differentiation can help) or when you don't have access to these other systems.

Sunday, October 5, 2008

Thursday, October 2, 2008

Bug in GAMS/Xpress link

The GAMS/Xpress link can terminate without reason:

**** SOLVER STATUS 4 TERMINATED BY SOLVER
**** MODEL STATUS 8 INTEGER SOLUTION
**** OBJECTIVE VALUE 134400.0000

RESOURCE USAGE, LIMIT 30.379 1000.000
ITERATION COUNT, LIMIT 5515 10000

The listing files does not convey any messages indicating the reason for stopping. We have observed this before, but this bug has not been fixed as of now. If a solver stops it is important to say why. Just simple-mindedly reporting "terminated by solver" makes it very difficult to fix whatever the reason is for the termination of the job.

Note: the log file did not help me either with information why the solve was stopped.

trnsport in Microsoft Solver Foundation

These screendumps show how the trnsport.gms model from the GAMS model library can be expressed in OML. Note that all data manipulation is done in Excel.





OML does not allow for any data manipulation. That means that data preparation (model calibration, balancing, aggregation, set manipulation etc) needs to be done outside the modeling language. The same for data manipulation during report writing or when developing heuristic algorithms instead of a single solve.
Most of my GAMS models include a large part doing data manipulation. Often more than 80%. It would not be as convenient to be forced to move all data manipulation out of the modeling language and into a database, or spreadsheet.
I have not figured out yet if there is a mechanism to handle exceptions in OML like the $ such that operator in GAMS.

Wednesday, October 1, 2008

Tool to compare text files

> I am looking for a good visual tool to compare gams source files.

I highly recommend Beyond Compare from http://www.scootersoftware.com/

Handling data sources in GAMS

> I have data sitting in different databases and spreadsheets. How to I
> handle this most conveniently in GAMS.


I usually handle this is a separate GAMS model that imports from all these data sources using tools like MDB2GMS, SQL2GMS and store the data in a GDX file. Then your main GAMS model can read this GDX file. This has several advantages: your main model can focus on the optimization model itself, and will not be cluttered by all data handling and conversion code. In addition the intermediate GDX file is convenient as it now is a single self-contained file. This makes it easy to share with others, to use in other related models, to be inspected in the IDE etc. An additional advantage is that the GDX file functions as a snapshot. Even if later on databases are changing, you have a (hopefully consistent) snapshot of all input data, that allows you to run the model against a stable data set.


Attached are sheets illustrating this concept for some very large models.