Yet Another Math Programming Consultant

Friday, October 22, 2010

Speed of loops in GAMS

In general loops are slow in GAMS. See http://yetanothermathprogrammingconsultant.blogspot.com/2009/08/gams-speed-of-loops.html. There are some exceptions however. Here is an example:

Note that map(i,k) has a diagonal structure.

The timings are:

method 1: loop	0.156 seconds
method 2: sum	3.495 seconds

Wednesday, October 20, 2010

Advanced basis/starting point in GAMS

Savepoint/loadpoint is an easy way to use an initial point for an NLP or advanced basis for an LP. The commands are:

The first time you run this the solution is saved in the GDX file. The next time around, this GDX is used to set as many levels and marginals that can be matched by name. For this very large MCP model it turned out to be useful to bring down solution times of related experiments.

Sorry for the ugly syntax. I would have preferred something more symmetric for reading and writing.

Tuesday, October 12, 2010

Assertive student

Hi Erwin,

In short, I'm student and wish not to receive "ask your supervisor" answer.

Is there a function, like the error function, to compute the probability of random variable that has a beta distribution? if not, is there any way to statistically represent a bounded variable?

Thanks for you consideration,

This is probably related to http://yetanothermathprogrammingconsultant.blogspot.com/2010/09/student-mails.html. My hint in this case was: See http://www.amsterdamoptimization.com/pdf/specfun.pdf.

Monday, October 11, 2010

Dissemination of model results

After spending time and effort on a model, it may well be worth to pay attention to presentation and dissemination of the model results. Currently I am working on a project where we can export results for a very large and complex GAMS model to an Adobe Flash application called StatPlanet. Obviously a web-based tool to inspect results has lots of advantages (nothing to install, available from everywhere).

GDX2StatPlanet

Thursday, October 7, 2010

GAMS/GDX

GAMS has a binary data file format for which an API is available. In the last few weeks I encountered a number of similar issues with this in different projects.
The interface does require more knowledge of the internals of GAMS than most people have. An example is here. When we want to read a GAMS parameter A(I,J) into a C/Fortran/Delphi matrix A[1..m,1..n] we see this is not completely trivial. The basic obstacle is that GAMS has strings as indices, while most programming languages have simple integers as indices. I.e. we need to map J = /new-york, chicago, topeka/ into j=1..3. For large problems that means some kind of hashing or other way to quickly look up strings. There is also a “raw” mode interface that does not give strings but rather internal GAMS numbers. So we need to map J = /1001,1002,1005/ to j=1..3. This again is not a completely trivial task. If you are familiar with the internals of GAMS one can know that GAMS internal numbers (so-called UEL numbers) are by definition increasing. So one could use a binary search to quickly find numbers. A fortran routine for this can look like:

Another possibility would by to have an array of the size of MAXUEL that has mapping information from UEL-> j=1..3. That requires more space but has quick look up. If the GDX file contains only one set, one could use socalled “mapped” mode, but that cannot be used in general.

It is also worth noting that GAMS only stores and exports nonzero elements. So we cannot assume all A(i,j)’s are in the GDX file.

So to recap, my suggested approach to read a matrix A(i,j) from a GDX file is:

Read 1D set I (in raw mode) and store UEL’s in an integer array IA
Read 1D set J (in raw mode) and store UEL’s in an integer array JA
Allocate array A(i,j) and set all elements to zero
Read 2D parameter A (in raw mode). For each record do:
1. Retrieve (ueli, uelj, aij)
2. i := bsearch(IA, length(IA), ueli)
3. j := bsearch(JA, length(JA), uelj)
4. A[i,j] := aij

If you don’t want to use binary search an alternative algorithm would be:

Retrieve MAXUEL (gdxSystemInfo)
Allocate integer arrays IA and JA of length MAXUEL
Read 1D set I (in raw mode). For each record do:
1. Retrieve uel
2. IA[uel] := recordnumber
Read 1D set J into array JA as described in step 3
Read 2D parameter A (in raw mode). For each record do:
1. Retrieve (ueli, uelj, aij)
2. i := IA[ueli]
3. j := JA[uelj]
4. A[i,j] := aij

Notes:

The generated interface files are quite ugly, have strange design and implementation and have no comments on usage. The code in the provided files just looks very awkward. Below are a few more specific issues illustrating this.
The filenames are incomprehensible: gdxdocpdef.pas, gdxdcpdef.pas ???
The functions should return boolean/logical values instead of integers.
The previous point is the more crucial as some routines have 0=success and others have a return code <>0 meaning success. That throws me off balance quite often.
Fortran support for many compilers is missing.
The examples are too simplistic. The above is not explained at all.
The generated interface contains some code that I would refuse from my students when I was teaching programming at undergraduate level. An example is:
n: ShortString
A symbol n should always be an integer, never a string (or a floating point number).
The Delphi interface does not compile under the latest compiler.

Delphi 2010

GAMS users routinely generate very large GDX files with millions of records. To get data in and out of these files efficiently still requires substantial skills and effort.

Sunday, October 3, 2010

Large NLP/MCP models

For a very large large MCP model (a system of nonlinear equations with complementarity conditions) we observed it became unsolvable, even with a very good starting point. The reason was that the model was kept as small as possible in terms of equations and variables. As a result the model was much more dense and nonlinear (in terms of nonlinear nonzero elements) than needed. Basically, modeling systems and solvers that exploit sparsity should be fed large, sparse models instead of corresponding smaller, denser models. After substituting out some common expressions into separate equations (hence increasing the size in terms of equations and variables) we could solve the larger models (≈400K variables and equations) fairly easily.

An linear example of such a problem is shown here: http://yetanothermathprogrammingconsultant.blogspot.com/2010/04/l1-portfolio-formulation.html.

An example of a nonlinear construct that should raise a red flag is something like:

In many cases it is better to write this as:

This will keep the number of nonlinear nonzero elements smaller at the expense of extra variables and equations.

In the model I was looking at a combination of these two situations was implemented: a large linear expression inside a nonlinear expression, and this nonlinear expression was repeated several times in the model. This caused both extra nonzero elements, and they appeared also in very nonlinear fashion.

For a smaller version of the model, the differences in statistics is striking (I was unable to get the large version to solve with the original formulation):

old model

MODEL STATISTICS

BLOCKS OF EQUATIONS          25     SINGLE EQUATIONS      106,219
BLOCKS OF VARIABLES          25     SINGLE VARIABLES      106,236
NON ZERO ELEMENTS     5,238,688     NON LINEAR N-Z      5,034,974
DERIVATIVE POOL           1,828     CONSTANT POOL         104,745
CODE LENGTH         120,759,139

GENERATION TIME      =       52.151 SECONDS

EXECUTION TIME       =       52.463 SECONDS

RESOURCE USAGE, LIMIT        281.806      1000.000

new formulation

MODEL STATISTICS

BLOCKS OF EQUATIONS          26     SINGLE EQUATIONS      107,605
BLOCKS OF VARIABLES          26     SINGLE VARIABLES      107,622
NON ZERO ELEMENTS       768,295     NON LINEAR N-Z        561,809
DERIVATIVE POOL           1,859     CONSTANT POOL         104,745
CODE LENGTH          14,348,068

GENERATION TIME = 7.566 SECONDS

EXECUTION TIME = 7.972 SECONDS

RESOURCE USAGE, LIMIT 57.216 1000.000

Unfortunately solvers and modeling systems don’t give much feedback on these issues. Such a diagnoses requires some modeling knowledge and experience (one reason why I am sometimes hired to help out).

Monday, September 27, 2010

Student mails

I get many mails from students asking for help. I usually try to give a small hint and direct them to get more help from their supervisor. Here are some examples.

I work for a supply chain management company, and we need some help on the attached exercise we have been given as an internal training exercise. Can you help? Attached is the project and all of the data and mod files. The deadline is Friday. If you can help, please provide a price quotation.

Answer: I would suggest to takes this up with your instructor. I believe it can be conceived as inappropriate if I would step into this.

After my standard reply of "Talk to your supervisor. He/she is actually getting paid to help you." one sometimes get long stories of poor supervision by professors:

I am really, really sorry to bother you, but my supervisor does not know how to use CPlex, only one lecturer in Math department knows how to use CPlex, but he is very busy at the moment (his wife just got cancer), you are the only expert that I can count on, it is a critical part of my PhD thesis, I really, really need your help for my problem, would you please have a look at my question and give me some idea? Thank you billion times! Your help will be greatly appreciated!

Or sometimes shorter:

Me and my wife are in very difficult situation. She runs out of the scholarship and I have to go back to Kuwait with her and the family.

Of course if I send back a file with some results, I really should hide all details about who actually did the work. Secrecy is important here:

Also can you change the following to STUDENTSNAME

Licensee: Erwin Kalvelagen G080731/0001CJ-WIN

Nobody know you are helping me please. My future is between your hands

Can you run GAMS economical model / Spread sheet please and put my name STUDENTSNAME everywhere please?

Some people are quite brazen in their request. This guy wants me to do his complete PhD work and he will find someone else to help me if I am too expensive:

What is your lump thump price tell I get my PhD to get two publication accepted and modify my GAMS RESULTS tell the committee is satisfied and teaching me what you did and prepare me also for defence?

What is the budget for the first publication is $? while how much you should be able to write the second publication for $?. you will make sure that those publications are of such a high standard that you will be able to find a journal which will accept both papers.

In case that this is more than my budget allows, then we can just do the GAMS part for me. Once this is done we might as well find someone else who can write you the two publications.

Does not look PhD supervisors are doing a good job if students think they can get away with such requests!

GAMS macro problems

I am working on a model that is using the preprocessor tool described in http://www.mpsge.org/inclib/gams-f.htm. There were a few minor issues related to this tool:

It does not really work very well with the GAMS IDE. You cannot click on the red error messages to bring you to the input line with the error.
It does not handle combined use in equations and parameter assignments in post-solve reporting very well.

The documentation of the new $macro facility in GAMS (http://www.gams.com/docs/release/release.htm, section “The GAMS Macro Facility” ) seems to indicate that it can be used to replace the preprocessor tool: just use $macro and use set.local where needed. Unfortunately that is not completely true. Especially the sentence “The .local feature has been added to ensure local use and eliminates the need for new alias definitions.” is plain wrong.

Here is a simple model to illustrate the problem:

$sysinclude gams-f

set i /i1*i3/;
alias (i,j);

parameter x(i,j);
x(i,j) = normal(0,1);

*-----------------------------------------------------

parameter p(i) 'without macro/preproc';
p(i) = sum(j, x(i,j));
display p;

*-----------------------------------------------------

y(i) == sum(j, x(i,j));

parameter p1(j) 'with preproc';
p1(j) = y(j);
display p1;

*-----------------------------------------------------

$macro y2(i) [sum(j.local, x(i,j))]

parameter p2(j) 'with macro';
p2(j) = y2(j);
display p2;

This shows:

---- 26 PARAMETER p without macro/preproc

i1 0.478, i2 -3.533, i3 -0.219

---- 49 PARAMETER p1 with preproc

i1 0.478, i2 -3.533, i3 -0.219

---- 57 PARAMETER p2 with macro

i1 -1.804, i2 -1.804, i3 -1.804

The cause can be found in the listing file:

35 ALIAS(FCN000,j);
36
37
38
39 parameter p1(j) 'with preproc';
40 p1(j) =
41
42 *       #### Preprocessor function replacement for Y ####
43
44 (
45   sum(FCN000, x(j,FCN000))
46 )
…….
55 parameter p2(j) 'with macro';
56 p2(j) = [sum(j.local, x(j,j))];

Clearly the GAMS-F preprocessor handles this case correctly by introducing an alias. The $macro/.local combination does not do this and gives the wrong results. Instead of this .local hack it is needed to manually introduce an alias and use the aliased set where needed.

It is noted that overuse of these tools can lead to models that appear smaller (fewer equations and variables). However those models may be much denser and much more nonlinear (say in terms of nonlinear nonzero elements). Often larger, sparser and less non-linear models solve easier.

Monday, September 20, 2010

Google or-tools

The first OR-tool from Google is a preliminary constraint programming solver, available from here: https://code.google.com/p/or-tools/. The name “or-tools” seems to indicate we can expect more to come.

Wednesday, September 8, 2010

Efficient frontier-like analysis

I needed to perform an optimization run of the form:

for different λ’s. Essentially something like an efficient frontier but in this case we have a MINLP (with semi-integer variables), so no smooth curve but instead some points.

In this case f(x) and g(x) were having different units and different magnitudes. In that case it is better to solve first:

and then use:

This will get a (hopefully) better spread of the λ’s. See: http://yetanothermathprogrammingconsultant.blogspot.com/2010/02/efficient-frontiers-in-mip-models.html.

Of course if λ=0 or λ=1 is part of the λ’s we need to trace, then this actually does not require extra models to solve, just a different ordering: first do λ=0 and λ=1 to get f₀ and g₀ and then do the λ’s in between. As the λ’s are user specified, we needed a little bit of logic to handle this.