Yet Another Math Programming Consultant: August 2008

Wednesday, August 27, 2008

Smooth approximation for ifthen()

The construct ifthen( x1 > x2, y, z) can be approximated by the smooth function:

z + (y-z)*0.5*(1+tanh(k*(x1-x2)))

where k is a large constant (e.g. k=1000). This will allow you to use NLP and CNS model types instead of DNLP.

See also: http://en.wikipedia.org/wiki/Heaviside_step_function

The logistic function (http://en.wikipedia.org/wiki/Logistic_function) can be expressed in two ways:

f(x,k) = (1+tanh(k·x))/2
f(x,k) = 1/(1+exp(-2k·x))

The exp() version can overflow more easily. Newer versions of GAMS implement a version of exp() that does not overflow (by truncation) so either choice should work.

This approximation can be used also for max(x,y) and min(x,y) as we have:

max(x,y) = ifthen(x>y, x, y)
min(x,y) = ifthen( y>x, x, y)

Tuesday, August 26, 2008

Invert.exe and Eigenvector.exe

> I have used these programs for years with release 22.1. Upgrading to
> 22.6, I cannot make them work.

Yes, GAMS has changed the GDX API causing a lot of compatibility problems. Invert.exe is now part of 22.8. Eigenvector.exe somehow is not. If you are a client of Amsterdam Optimization contact info@amsterdamoptimization.com and they can give you access to 22.6 versions of both.

Sunday, August 24, 2008

Minimum downtime

>Is there a chance to increase an index in a loop in gams?
>To illustrate, a(t) variable
> loop(t,
> if(a(t) eq 0,
> a(t+1) =0;
> t=t+2;);
> I want a(t) makes a(t+1) zero but dont want a(t+1) makes a(t+2) zero. Thus,I should skip t+1 step in the loop.

This approach is doomed. Remember for a MIP you need to formulate linear constraints. You need to introduce binary variables to model a construct like this. The following may work:

The general idea is to use binary variables say
x(t) = 0 if a(t) =0,
x(t) = 1 otherwise
(this is fairly simple to do) and then to forbid the x pattern
... 1 0 1 ....
e.g. by a cut of the form x(t) - x(t+1) + x(t+2) ≤ 1;

This type of construct is often used in unit commitment modeling in power planning: you don't want to turn on and off generators all the time. A minimum downtime restriction is often used to prevent this from happening.

You may want to consider hiring a consultant to speed up your modeling efforts. Someone with experience can implement these constructs quickly and reliably, while it can be a long struggle for someone who has never done this before, as you have experienced with your efforts.

Note: I suggested to use binary variables in an earlier email exchange, but based on advise from people on the GAMS-L mailing list the user stayed with the loop construct. Largely because the questions posed were too much focused on narrow loop syntax issues, nobody said: this is wrong, you need to change course. If the user would have provided more background info, he would probably have been steered to more promising and time-tested approaches.

Note 2: Don't use ifthen() to model x. That causes the model to become an MINLP with discontinuous relaxations, which is really bad! This can be done in a linear fashion. Assuming a(t)≥0 we can write:
x(t) ≤ a(t)*M;
x(t) ≥ a(t)/a.up(t);

NNLS

> Hi!
> In a least squares fitting application, I'm trying to determine the parameters x_1, x_2
> and x_3 so that the function
> f(x_1, x_2, x_3) = \sum_{i=1}^N (a_i*x_1 + b_i*x_2 + c_i*x_3 - d_i)^2
> is minimized under the constrains x_1, x_2, x_3 >= 0 (LaTeX notation)
> I was pointed to the keyword "nonnegative least squares" (NNLS), but wasn't able to
> find out much about this: Lawson, Hanson: "Solving Least Squares Problems", Ch. 23
> Briggs: "High Fidelity Deconvolution of Moderately Resolved Sources", Ch. 4.3
> Can you recommend further literature or web resources on this topic, ideally with examples
> on how to solve this type of problem?

Any NLP solver can solve these type of problems quite efficiently. See e.g.
http://www.princeton.edu/~rvdb/ampl/nlmodels/nnls/index.html
A GAMS version can look like:

   1  set
   2    i /1*1000/
   3    j /1*300/
   4  ;
   5   
   6  parameters
   7     b(i)
   8     A(i,j)
   9  ;
  10   
  11  a(i,j)$(uniform(0,1) < 0.21) = 10*(uniform(0,1)-1);
  12  parameter x0(j);
  13  x0(j) = uniform(0,1);
  14  b(i) = sum(j, a(i,j)*x0(j)) + 100*(uniform(0,1)-0.5)
  15   
  16  positive variable x(j);
  17  variable sum_sqs;
  18   
  19  equation esum_sqs;
  20   
  21  esum_sqs.. sum_sqs =e= sum(i, sqr[ b(i) - sum(j, A(i,j)*x[j]) ]);
  22   
  23  model m /all/;
  24  solve m minimizing sum_sqs using nlp;
  25   
  26   
  27  *conopt : 0.702 seconds
  28  *minos :  1.264
  29  *snopt :  0.343
  30  *knitro : 7.385
  31  *ipopt :  1.088

Friday, August 15, 2008

GAMS documentation: endogenous floor

> I would like to use floor() in my equations but
> the documentation says "not allowed".

The documentation is wrong. It can be used if the model type is DNLP.

Both the GAMS user guide and the McCarl guide have exactly identical errors in their descriptions of GAMS functions. The functions trunc(x), sign(x), ceil(x), mod(x,y), floor(x) and round(x) are all allowed in DNLP models opposed to what is stated.

Note that DNLP models are treacherous. In many case you may want to consider alternative formulations such as a smooth approximation or a formulation using binary variables. Specifically the floor(x) function can be modeled as:

integer variable ifloor;
ifloor ≤ x
ifloor ≥ x - 0.999

Monday, August 11, 2008

Sorting in GAMS

Although there is an external tool that can help, sorting in GAMS remains somewhat problematic. In this case I needed to sort each column in a matrix p(i,j). The length of each column is 10-15 while the number of columns can be hundreds. To confirm I had a reasonable implementation I tried for implementations:

Writing sorting directly with loops
Use smin with other loops
Use card(j) times a call to gdxrank
Make a very large one dimensional vector and sort that

   1  set
   2    i /i1*i15/
   3    j /j1*j1000/
   4  *  i /i1*i5/
   5  *  j /j1*j2/
   6  ;
   7  
   8  
   9  alias(i,ii);
  10  
  11  parameter p(i,j);
  12  p(i,j) = uniform(0,1);
  13  
  14  parameter timing(*);
  15  scalar t;
  16  
  17  parameter order1(i,j),p1(i,j);
  18  p1(i,j) = p(i,j);
  19  *display p;
  20  
  21  t = jnow;
  22  
  23  scalar pmin;
  24  loop(j,
  25  
  26     loop(ii,
  27        pmin = 9999999.99;
  28        loop(i$(p1(i,j)<pmin),
  29           order1(ii,j) = ord(i);
  30           pmin = p1(i,j);
  31        );
  32        p1(i,j)$(order1(ii,j)=ord(i)) =  9999999.99;
  33     );
  34  
  35  );
  36  
  37  timing('loop algorithm') = (jnow-t)*24*60*60;
  38  
  39  t = jnow;
  40  
  41  parameter order2(i,j),p2(i,j);
  42  p2(i,j) = p(i,j);
  43  *display p;
  44  
  45  scalar pmin,first;
  46  loop(j,
  47  
  48     loop(ii,
  49        pmin = smin(i, p2(i,j));
  50        first = 1;
  51        loop(i$(p2(i,j)=pmin and first),
  52           order2(ii,j) = ord(i);
  53           first = 0;
  54        );
  55        p2(i,j)$(order2(ii,j)=ord(i)) =  9999.9999;
  56     );
  57  
  58  );
  59  
  60  timing('loop/smin algorithm') = (jnow-t)*24*60*60;
  61  
  62  t = jnow;
  63  
  64  parameter order3(i,j),psort(i),pindex(i);
  65  
  66  loop(j,
  67     psort(i) = p(i,j);
  68     execute_unload "rank_in.gdx", psort;
  69     execute "=gdxrank rank_in.gdx rank_out.gdx";
  70     execute_load  "rank_out.gdx", pindex=psort;
  71     loop((i,ii)$(pindex(i)=ord(ii)),
  72         order3(ii,j) = ord(i);
  73     );
  74  );
  75  
  76  timing('gdxrank loop algorithm') = (jnow-t)*24*60*60;
  77  
  78  t = jnow;
  79  
  80  set k /1*15000/;
  81  parameter psort4(k),pindex4(k),order4(i,j);
  82  scalar pmax,pmin;
  83  pmax = smax((i,j),p(i,j));
  84  pmin = smin((i,j),p(i,j));
  85  set pmap(i,j,k);
  86  pmap(i,j,k)$(ord(k)=card(i)*(ord(j)-1)+ord(i)) = yes;
  87  *display pmap;
  88  
  89  psort4(k) = sum(pmap(i,j,k), [p(i,j)-pmin]/(pmax-pmin)+ord(j));
  90  *display p,psort5;
  91  execute_unload "rank_in.gdx", psort4;
  92  execute "=gdxrank rank_in.gdx rank_out.gdx";
  93  execute_load  "rank_out.gdx", pindex4=psort4;
  94  
  95  loop(j,
  96     pindex(i) = sum(pmap(i,j,k),pindex4(k)) - (ord(j)-1)*card(i);
  97  *   display pindex;
  98     loop((i,ii)$(pindex(i)=ord(ii)),
  99         order4(ii,j) = ord(i);
 100     );
 101  );
 102  
 103  timing('big gdxrank algorithm') = (jnow-t)*24*60*60;
 104  
 105  
 106  *display p,order1,order2,order3,order4;
 107  
 108  display timing;

The timing results are:

----    108 PARAMETER timing 
 
loop algorithm          0.641,    loop/smin algorithm     0.867,    gdxrank loop algorithm 19.462
big gdxrank algorithm  54.200

Note that for the last example we spend all time in the calculation of the pmap set.

Conclusion: GAMS would benefit from a built-in sorting facility.

Sunday, August 10, 2008

GAMS bug in ifthen

   1  scalar a,b,c;
   2  a=1;
   3  b=0;
   4  c = ifthen(b<=0,1,a/b);
   5
   6  display a,b,c;

This gives:

**** Exec Error at line 4: division by zero (0)
**** Exec Error at line 4: IfThen: cannot be called with special value UNDF in arg3

Work-around: don't use ifthen in cases like this, but something like:

*
* alternative 1: split into two parts
*
c=1;
c$(b>0) = a/b;
*
* alternative 2 :use an extra condition inside the ifthen
*
c = ifthen(b<=0,1,[a/b]$b);
* 
* alternative 3: use an if statement
*
if(b<=0,
   c=1;
else
   c=a/b;
);

Some of these reformulations may be slower. So remember ifthen does not do what one would expect e.g. when comparing with AMPL's if-then-else: both branches are evaluated and can lead to runtime errors. I believe this can be fixed by a smarter implementation of ifthen. That would both benefit the usefulness, predictability and performance of this construct.

Note: ifthen has similar limitations when used inside NLP equations. Unfortunately it cannot be used to prevent domain errors etc.

I hit this problem when automatically translating Excel's if() function to GAMS. Excel implements the if() as one would expect and unfortunately this does not translate directly into the GAMS ifthen() function.

gdxrank help

> gdxrank does not sort my array correctly.

Most likely you have a zero in your array. GDX does not store zero's so your parameter will arrive incomplete.

Work-around: replace zero's by EPS before sorting.

Note: it would probably be better if sorting was an internal GAMS command. Alternatively Gdxrank could be fixed by allowing to include a domain set to the gdx file (i.e. export both i, p(i)).

Saturday, August 9, 2008

Loops are slow in GAMS

When optimizing GAMS code it is always worthwhile to see if you can remove loops. Here is an example of an if,then,else construct:

   1  set i /i1*i200/;
   2
   3  alias (i,j,k);
   4  parameter a(i,j,k),b(i,j,k),c(i,j,k);
   5  a(i,j,k) = uniform(0.1,1);
   6  b(i,j,k) = uniform(0.1,1);
   7  c(i,j,k) = uniform(0.1,1);
   8
   9  parameter p1(i,j,k),p2(i,j,k),p3(i,j,k);
  10
  11  * this is fast
  12  p1(i,j,k) = ifthen(b(i,j,k)>0.5,a(i,j,k)/b(i,j,k),c(i,j,k));
  13
  14  * this is slow
  15  loop((i,j,k),
  16     if(b(i,j,k)>0.5,
  17        p2(i,j,k) = a(i,j,k)/b(i,j,k)
  18     else
  19        p2(i,j,k) = c(i,j,k)
  20     );
  21  );
  22
  23  * this is fast but I feel uneasy about assumption that
  24  * these conditions are quaranteed mutual exclusive
  25  * for floating point numbers
  26  p3(i,j,k) =  [a(i,j,k)/b(i,j,k)]$(b(i,j,k)>0.5) + c(i,j,k)$(b(i,j,k)<=0.5);

The profiling info clearly indicates the loop is the slowest alternative:

----      1 ExecInit                 0.000     0.000 SECS      3 Mb
----      5 Assignment a             1.762     1.762 SECS    261 Mb  8000000
----      6 Assignment b             1.748     3.510 SECS    518 Mb  8000000
----      7 Assignment c             1.747     5.257 SECS    775 Mb  8000000
----     12 Assignment p1            3.307     8.564 SECS  1,032 Mb  8000000
----     15 Loop                    22.027    30.591 SECS  1,288 Mb
----     26 Assignment p3            3.900    34.491 SECS  1,545 Mb  8000000

I prefer the ifthen construct as this has a real 'else'. The assignment to p3 uses two different $ conditions, which makes me feel uneasy.

GAMS bug in ifthen

   1  set k /k1,k2/;
   2  
   3  parameter p(k);
   4  
   5  p(k) = ifthen(sameas(k,'k1'),1,2);
****                              $201
**** 201  Invalid argument for function
   6  
   7  display p;

Work-around: use 1$sameas(k,'k1'). Note that if() allows a condition like this.

Slow division in GAMS

Divisions can be very slow in GAMS. The reason is that GAMS cannot skip zero's 0/x as x may trigger a special value in case x=0. There are some easy faster reformulations:

   1  set i /i1*i300/;
   2 
   3  alias (i,j,k);
   4 
   5  parameter a(i,j,k) /
   6    i1.i1.i1 1
   7  /;
   8 
   9  parameter b(i);
  10  b(i) =  0.5;
  11 
  12  * slow division
  13  parameter c0(i,j,k);
  14  c0(i,j,k) = a(i,j,k)/b(i);
  15 
  16  * fast division (method 1)
  17  parameter c1(i,j,k);
  18  c1(i,j,k)$a(i,j,k) = a(i,j,k)/b(i);
  19 
  20  * fast division (method 2)
  21  parameter b2(i);
  22  b2(i) = 1/b(i);
  23  parameter c2(i,j,k);
  24  c2(i,j,k) = a(i,j,k)*b2(i);

The profile information shows:

----      1 ExecInit                 0.000     0.000 SECS      3 Mb
----     10 Assignment b             0.000     0.000 SECS      4 Mb    300
----     14 Assignment c0            3.292     3.292 SECS      4 Mb      1
----     18 Assignment c1            0.000     3.292 SECS      4 Mb      1
----     22 Assignment b2            0.000     3.292 SECS      4 Mb    300
----     24 Assignment c2            0.000     3.292 SECS      4 Mb      1

I.e. the original assignment c0 is slow, but assignments c1 and c2 are fast. The first assignment to c0 is running over all (i,j,k). In assignment c1 we force to only consider the single nonzero element in a(i,j,k). In assignment c2 we multiply by the reciprocal. Multiplication is fast as it will skip any nonzero a(i,j,k).

Friday, August 8, 2008

xls2gms

>Dear Erwin, I am currently working with GAMS to build a village CGE model.
>Since all my data is in excel, I need the above program to convert the excel data
>to GAMS readable file. After going through the documentation in the GAMS web site,
>I realized that the program is no longer available though there are alternatives to it.
>But I felt that XLS2GMS is flexible and that the GAMS programs that I currently
>have used this program. Hence would you please be kind enough to email me this program.

It is included in e.g. GAMS 22.8. GAMS 22.8 can be downloaded from
http://download.gams-software.com/.
To see if your gams system has it, run the model:

$call =xls2gms

If you have problems with it please contact support@gams.com.

Thursday, August 7, 2008

GAMS and Very Large Models

I am currently working on a very large GAMS model (about 15k lines of code, no big include files with data, just small LP models but very many of them). Some suggestions for improvements in GAMS that could help me with with a model this size:

IDE: quick way to go to declaration of an identifier (right mouse click on symbol)
Run time errors: quick way to jump to place in source code where this happened
Allow scalar,parameter,set declarations inside a loop (most modern programming languages allow this)
Allow display of current loop element (loop(i, display i;) now gives the whole set which is not very useful)
Do not allow redeclaration of symbols (so I get a warning if it is already used)
Better navigation of listing file if we have displays in nested loops
Some interactive debugging facilities (breakpoint, watches etc.)
More flexible limrow/limcol feature
Some string facilities
Very large log files: make it easy to find first, next syntax error

Some of these features are not easy to implement, but others should not be too difficult.

Wednesday, August 6, 2008

How write a TSP model in GAMS

> My TSP model does not work: it has sub-tours

The TSP (Traveling Salesman Problem) is not complete trivial to write as a MIP.
Direct MIP models are only suited for very small problem. For larger models
you need more advanced strategies.

You can see some examples in the model library all written by me: tsp1, tsp2, tsp3, tsp4, tsp42 (the listed author is incorrect: I wrote this model). The last one is actually an algorithm that adds subtour elimination cuts to a relaxation.

For more information see the following papers and posts:

Gabor Pataki (2003), “Teaching Integer Programming Formulations Using the Traveling Salesman Problem”, SIAM Review, Vol. 45, No. 1, pp. 116-123, http://www.unc.edu/~pataki/papers/teachtsp.pdf
A.J. Orman & H.P. Williams(2004), “A Survey of Different Integer Programming Formulations of the Travelling Salesman Problem,” The London School of Economics and Political Science, http://www.lse.ac.uk/collections/operationalResearch/pdf/Working%20Paper%2004.67.pdf
Lee J., and J.F. Raffensperger (2006), "Using AMPL for teaching the TSP," INFORMS Transactions on Education, Vol. 7, No 1,
http://archive.ite.journal.informs.org/Vol7No1/LeeRaffensperger/
An MS-Access based front-end for tsp42 is described in how-to-call-gams-from-access.
More GAMS examples based on Burma-14 dataset from TSPLIB: powerset example.
Drawing TSP tours in GAMS: post.

Friday, August 1, 2008

Request: Alias for multi-dimensional sets

In current GAMS versions we can only use ALIAS on single dimensional sets. However it would be an obvious improvement to relax this restriction. Here is an example where I needed an extra set to handle a simple mapping:

set mapper(*,j) /
  x1 . const
  x2 . Z1
  x3 . Z2
  x4 . Z3
  x5 . Z4
  x6 . Z5
  x7 . Z6
  x8 . Z7
  x9 . Z8
/;
 
* we cannot alias 2 dimensional set so we create two
set mapper2(*,j);
mapper2(mapper)=yes;
 
h0(j,jj) = sum((mapper(hi,j),mapper2(hj,jj)), h('e1',hi,hj));

It would be logical to use something like:

alias(mapper,mapper2);

Tobit models

Here are two versions of a well known Tobit regression model by Fair. It uses the hessian facilities and the invert tool to calculate standard errors and t-statistics for the maximum likelihood estimates.

tobithessian.gms
tobithessian2.gms