Sunday, October 31, 2010

U shaped penalties

For a large LP model a customer of mine wanted to have instead of a V shaped penalty around a goal, something that approximates a U shaped penalty. With standard slacks:

 

image

we essentially have a V shaped penalty:

image

We can approximate a more U shaped form by introducing extra slacks with a smaller penalty:

image

with unit penalty q smaller than p, e.q. q=0.1*p. This will give:

image

The idea is that the algorithm will select the cheap slacks t first, before spending on the expensive slacks s. This will keep the LP linear and we don’t need extra binary variables. For q=0 we essentially allow small deviations for free, but start paying when we go beyond the threshold T.

This approach can easily be extended to more piecewise linear segments (as long as we keep things convex). Also the range T for the slacks t may be generalized to:

image

Saturday, October 30, 2010

StatPlanet

When demoing http://www.sacmeq.org/statplanet/StatPlanet.html (for which I developed a GAMS link), I reminded to Hans Rosling. Check here for some fantastic presentations: http://www.gapminder.org/videos/. The software has been acquired by Google. Here is an example:

Dependencies Young vs. Old

Wednesday, October 27, 2010

Amazon ec2

Amazon “Elastic Compute Cloud” has some big-ass machines available: http://aws.amazon.com/ec2/#instance. See also the rates on that page.

Tuesday, October 26, 2010

Running large models using a web interface

For a project I am considering a fairly large GAMS models where we want end-users to run certain scenarios. Of course there must be HTML GUI to allow the user to specify the input for each scenario to run. That is fairly standard stuff – once you know what type of parameters you want to expose to the user to change. The bigger issue is that the model is too large to run immediately if the user presses the Submit button. Rather we need to create a small queueing mechanism where jobs are waiting to run. After the model has been solved the user can inspect the results.
Running background jobs from a web-server environment is a little bit more exotic, although other examples exist (see e.g. NEOS and this effort). For this project we use a Windows IIS web server with PHP.

Architecture


A background solve process can look like:
  1. If background processes are already running, stop
  2. Start a background process as follows:
  3. While there are waiting jobs
    1. select a job where status=waiting
    2. set status=running
    3. solve model
    4. store results
    5. set status=solved
We run this thing after every INSERT in the job table.
Actually we can have several of those processes if the server is beefy enough to allow for parallel solution of different models. This does not only depends on the number of cores available but also memory considerations play a role. Essentially line 1 becomes:
  1. If number of background processes ≥ maxnumproc, stop

Friday, October 22, 2010

Speed of loops in GAMS

In general loops are slow in GAMS. See http://yetanothermathprogrammingconsultant.blogspot.com/2009/08/gams-speed-of-loops.html. There are some exceptions however. Here is an example:

image Note that map(i,k) has a diagonal structure.

The timings are:

method 1: loop 0.156 seconds
method 2: sum 3.495 seconds

Wednesday, October 20, 2010

Advanced basis/starting point in GAMS

Savepoint/loadpoint is an easy way to use an initial point for an NLP or advanced basis for an LP. The commands are:

image 

The first time you run this the solution is saved in the GDX file. The next time around, this GDX is used to set as many levels and marginals that can be matched by name. For this very large MCP model it turned out to be useful to bring down solution times of related experiments.

Sorry for the ugly syntax. I would have preferred something more symmetric for reading and writing.

Tuesday, October 12, 2010

Assertive student

Hi Erwin,

In short, I'm student and wish not to receive "ask your supervisor" answer.

Is there a function, like the error function, to compute the probability of random variable that has a beta distribution? if not, is there any way to statistically represent a bounded variable?

Thanks for you consideration,

This is probably related to http://yetanothermathprogrammingconsultant.blogspot.com/2010/09/student-mails.html. My hint in this case was: See http://www.amsterdamoptimization.com/pdf/specfun.pdf.

Monday, October 11, 2010

Dissemination of model results

After spending time and effort on a model, it may well be worth to pay attention to presentation and dissemination of the model results. Currently I am working on a project where we can export results for a very large and complex GAMS model to an Adobe Flash application called StatPlanet. Obviously a web-based tool to inspect results has lots of advantages (nothing to install, available from everywhere).

GDX2StatPlanet

Thursday, October 7, 2010

GAMS/GDX

GAMS has a binary data file format for which an API is available. In the last few weeks I encountered a number of similar issues with this in different projects.
The interface does require more knowledge of the internals of GAMS than most people have. An example is here. When we want to read a GAMS parameter A(I,J) into a C/Fortran/Delphi matrix A[1..m,1..n] we see this is not completely trivial. The basic obstacle is that GAMS has strings as indices, while most programming languages have simple integers as indices. I.e. we need to map J = /new-york, chicago, topeka/ into j=1..3. For large problems that means some kind of hashing or other way to quickly look up strings. There is also a “raw” mode interface that does not give strings but rather internal GAMS numbers. So we need to map J = /1001,1002,1005/ to j=1..3. This again is not a completely trivial task. If you are familiar with the internals of GAMS one can know that GAMS internal numbers (so-called UEL numbers) are by definition increasing. So one could use a binary search to quickly find numbers. A fortran routine for this can look like:
image
Another possibility would by to have an array of the size of MAXUEL that has mapping information from UEL-> j=1..3. That requires more space but has quick look up. If the GDX file contains only one set, one could use socalled “mapped” mode, but that cannot be used in general.

It is also worth noting that GAMS only stores and exports nonzero elements. So we cannot assume all A(i,j)’s are in the GDX file.

So to recap, my suggested approach to read a matrix A(i,j) from a GDX file is:

  1. Read 1D set I (in raw mode) and store UEL’s in an integer array IA
  2. Read 1D set J (in raw mode) and store UEL’s in an integer array JA
  3. Allocate array A(i,j) and set all elements to zero
  4. Read 2D parameter A (in raw mode). For each record do:
    1. Retrieve (ueli, uelj, aij)
    2. i := bsearch(IA, length(IA), ueli)
    3. j := bsearch(JA, length(JA), uelj)
    4. A[i,j] := aij

If you don’t want to use binary search an alternative algorithm would be:

  1. Retrieve MAXUEL (gdxSystemInfo)
  2. Allocate integer arrays IA and JA of length MAXUEL
  3. Read 1D set I (in raw mode). For each record do:
    1. Retrieve uel
    2. IA[uel] := recordnumber
  4. Read 1D set J into array JA as described in step 3
  5. Read 2D parameter A (in raw mode). For each record do:
    1. Retrieve (ueli, uelj, aij)
    2. i := IA[ueli]
    3. j := JA[uelj]
    4. A[i,j] := aij


Notes:

  • The generated interface files are quite ugly, have strange design and implementation and have no comments on usage. The code in the provided files just looks very awkward. Below are a few more specific issues illustrating this.
  • The filenames are incomprehensible: gdxdocpdef.pas, gdxdcpdef.pas ??? 
  • The functions should return boolean/logical values instead of integers.
  • The previous point is the more crucial as some routines have 0=success and others have a return code <>0 meaning success. That throws me off balance quite often.
  • Fortran support for many compilers is missing.
  • The examples are too simplistic. The above is not explained at all.
  • The generated interface contains some code that I would refuse from my students when I was teaching programming at undergraduate level. An example is:
       n: ShortString
    A symbol n should always be an integer, never a string (or a floating point number).
  • The Delphi interface does not compile under the latest compiler.
Delphi 2010

GAMS users routinely generate very large GDX files with millions of records. To get data in and out of these files efficiently still requires substantial skills and effort.

Sunday, October 3, 2010

Large NLP/MCP models

For a very large large MCP model (a system of nonlinear equations with complementarity conditions) we observed it became unsolvable, even with a very good starting point. The reason was that the model was kept as small as possible in terms of equations and variables. As a result the model was much more dense and nonlinear (in terms of nonlinear nonzero elements) than needed. Basically, modeling systems and solvers that exploit sparsity should be fed large, sparse models instead of corresponding smaller, denser models. After substituting out some common expressions into separate equations (hence increasing the size in terms of equations and variables) we could solve the larger models (≈400K variables and equations) fairly easily.

An linear example of such a problem is shown here: http://yetanothermathprogrammingconsultant.blogspot.com/2010/04/l1-portfolio-formulation.html.

An example of a nonlinear construct that should raise a red flag is something like:

imageIn many cases it is better to write this as:

imageThis will keep the number of nonlinear nonzero elements smaller at the expense of extra variables and equations.

In the model I was looking at a combination of these two situations was implemented: a large linear expression inside a nonlinear expression, and this nonlinear expression was repeated several times in the model. This caused both extra nonzero elements, and they appeared also in very nonlinear fashion.

For a smaller version of the model, the differences in statistics is striking (I was unable to get the large version to solve with the original formulation):

old model

MODEL STATISTICS

BLOCKS OF EQUATIONS          25     SINGLE EQUATIONS      106,219
BLOCKS OF VARIABLES          25     SINGLE VARIABLES      106,236
NON ZERO ELEMENTS     5,238,688     NON LINEAR N-Z      5,034,974
DERIVATIVE POOL           1,828     CONSTANT POOL         104,745
CODE LENGTH         120,759,139


GENERATION TIME      =       52.151 SECONDS

EXECUTION TIME       =       52.463 SECONDS

RESOURCE USAGE, LIMIT        281.806      1000.000
new formulation

MODEL STATISTICS

BLOCKS OF EQUATIONS          26     SINGLE EQUATIONS      107,605
BLOCKS OF VARIABLES          26     SINGLE VARIABLES      107,622
NON ZERO ELEMENTS       768,295     NON LINEAR N-Z        561,809
DERIVATIVE POOL           1,859     CONSTANT POOL         104,745
CODE LENGTH          14,348,068


GENERATION TIME      =        7.566 SECONDS

EXECUTION TIME       =        7.972 SECONDS

RESOURCE USAGE, LIMIT         57.216      1000.000

Unfortunately solvers and modeling systems don’t give much feedback on these issues. Such a diagnoses requires some modeling knowledge and experience (one reason why I am sometimes hired to help out).