## Saturday, November 28, 2009

### Exponential Smoothing

Someone mentioned a method to get an optimal value for parameter α in the exponential smoothing algorithm (http://en.wikipedia.org/wiki/Exponential_smoothing). Looking at http://shazam.econ.ubc.ca/intro/expsmth1.htm this probably very often gives α=1. On pure intuition, I suspect this has something to do with the missing constant term in the corresponding ARIMA(0,1,1) model. In GAMS we can reproduce this as:

$ontext

  GAMS code to reproduce

  http://shazam.econ.ubc.ca/intro/expsmth1.htm

$offtext

sets
  t     /1931*1960/
  t1(t) /1931*1959/
;

parameter sales(t) /
  1931  1806
  1932  1644
  1933  1814
  1934  1770
  1935  1518
  1936  1103
  1937  1266
  1938  1473
  1939  1423
  1940  1767
  1941  2161
  1942  2336
  1943  2602
  1944  2518
  1945  2637
  1946  2177
  1947  1920
  1948  1910
  1949  1984
  1950  1787
  1951  1689
  1952  1866
  1953  1896
  1954  1684
  1955  1633
  1956  1657
  1957  1569
  1958  1390
  1959  1387
  1960  1289
/;

variables
  f(t)
  sse
  w
;

w.lo = 0.0;
w.up = 1.0;
w.l  = 0.8;

equations
  deff(t)
  defsse
;

deff(t)..  f(t) =e= sales(t)$(ord(t)=1) + [w*sales(t)+(1-w)*f(t-1)]$(ord(t)>1);

defsse..   sse =e= sum(t1(t), sqr(f(t)-sales(t+1)));

model m /all/;
solve m minimizing sse using nlp;

parameter result;
result('opt','w')   = w.l;
result('opt','SSE') = sse.l;

set i /i1*i4/;
parameter wx(i) /
  i1 0.8
  i2 0.6
  i3 0.4
  i4 0.2
/;
loop(i,
  w.fx = wx(i);
  solve m minimizing sse using nlp;
  result(i,'w')   = w.l;
  result(i,'SSE') = sse.l;
);
display result;

Indeed this gives the following result:

----     90 PARAMETER result

              w         SSE

opt       1.000 1222283.000
i1        0.800 1405768.565
i2        0.600 1761367.108
i3        0.400 2421085.784
i4        0.200 3570106.711

Note: the initial value for w (w.l  = 0.8;) is important. If we leave that statement out, CONOPT will not be able to move w away from its default initial value of zero.
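The SSE values in the table are easy to reproduce outside GAMS. Here is a small Python sketch of the same recursion (data copied from the table above); this is an independent cross-check, not part of the GAMS model:

```python
# One-step-ahead SSE of simple exponential smoothing, matching the GAMS model:
#   f(1) = sales(1),  f(t) = a*sales(t) + (1-a)*f(t-1)  for t > 1,
#   SSE  = sum over t=1931..1959 of (f(t) - sales(t+1))^2
sales = [1806, 1644, 1814, 1770, 1518, 1103, 1266, 1473, 1423, 1767,
         2161, 2336, 2602, 2518, 2637, 2177, 1920, 1910, 1984, 1787,
         1689, 1866, 1896, 1684, 1633, 1657, 1569, 1390, 1387, 1289]

def sse(alpha):
    f = sales[0]                                  # f(1) = sales(1)
    total = 0.0
    for t in range(1, len(sales)):
        total += (f - sales[t]) ** 2              # error of forecast for period t
        f = alpha * sales[t] + (1 - alpha) * f    # update smoothed value
    return total

for a in (1.0, 0.8, 0.6, 0.4, 0.2):
    print(a, sse(a))
```

With α = 1 this collapses to the sum of squared first differences, reproducing SSE = 1222283 from the table, and the SSE indeed increases monotonically as α moves away from 1 on this data set.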

## Monday, November 23, 2009

### Stochastic Forecast with GAMS

A client of mine asked me to see if we could implement some stochastic forecasting scheme in GAMS and incorporate this into their model. A SAS program to calculate stochastic forecasts was provided, and it was easy to translate this into GAMS:

sets
  t     /1993*2050/
  tr(t) 'used in regression' /1993*2009/
  tf(t) 'used in forecast'   /2010*2050/
;

parameter yield(t) /
  1993  4.166666667
  1994  4.454901961
  1995  4.111111111
  1996  4.558823529
  1997  6.097637795
  1998  5.182341651
  1999  5.548387097
  2000  5.464868701
  2001  6
  2002  6.326530612
  2003  6.52173913
  2004  7.374100719
  2005  6.475409836
  2006  8.035714286
  2007  6.395705511
  2008  7.003890813
  2009  7.339368303
/;

parameter year(t);
year(t) = 1992 + ord(t);

*-----------------------------------------------------------------------------
* regression
*-----------------------------------------------------------------------------

variables
  sse
  b0 'constant term'
  b1
;
equations
  dummy
  fit(tr)
;

dummy..   sse =n= 0;
fit(tr).. yield(tr) =e= b0 + b1*year(tr);

option lp=ls;
model ols /dummy,fit/;
solve ols using lp minimizing sse;

* get standard error of regression, aka Root MSE in SAS
* (to be loaded from the LS solver output; the assignment is not shown here)
scalar rmse;
display rmse;

*-----------------------------------------------------------------------------
* forecast
*-----------------------------------------------------------------------------

* uncomment this if you want every run to be different:
* execseed = 1 + gmillisec(jnow);

parameters
  statfcst(t)  'static forecast'
  stochfcst(t) 'stochastic forecast'
;
statfcst(tr) = yield(tr);
statfcst(tf) = b0.l + b1.l*year(tf);

stochfcst(tr) = yield(tr);
stochfcst(tf) = b0.l + b1.l*year(tf) + rmse*normal(0,1);

display statfcst,stochfcst;

*-----------------------------------------------------------------------------
* Monte Carlo simulation
*-----------------------------------------------------------------------------

set r 'replications' /1*10000/;
parameter allyield(r,tf);
allyield(r,tf) = b0.l + b1.l*year(tf) + rmse*normal(0,1);

parameter stochfcst2(t) 'stochastic forecast';
stochfcst2(tr) = yield(tr);
stochfcst2(tf) = sum(r, allyield(r,tf))/card(r);
display stochfcst2;
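The same scheme can be sketched in a few lines of Python as a sanity check (data copied from above; the standard error of the regression is computed as sqrt(SSR/(n-2)), which is what SAS reports as Root MSE; the seed 123 is arbitrary):

```python
import math, random

# Yield data 1993-2009, copied from the GAMS model above.
years = list(range(1993, 2010))
yields = [4.166666667, 4.454901961, 4.111111111, 4.558823529, 6.097637795,
          5.182341651, 5.548387097, 5.464868701, 6.0, 6.326530612,
          6.52173913, 7.374100719, 6.475409836, 8.035714286, 6.395705511,
          7.003890813, 7.339368303]

# Simple OLS regression: yield = b0 + b1*year
n = len(years)
xbar = sum(years) / n
ybar = sum(yields) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(years, yields))
      / sum((x - xbar) ** 2 for x in years))
b0 = ybar - b1 * xbar

# Standard error of the regression ("Root MSE" in SAS): sqrt(SSR/(n-2))
ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(years, yields))
rmse = math.sqrt(ssr / (n - 2))

# Static forecast for 2050 and its Monte Carlo average over 10000 replications;
# the average should be close to the static forecast.
random.seed(123)
R = 10000
fcst_2050_static = b0 + b1 * 2050
fcst_2050_mc = sum(fcst_2050_static + rmse * random.gauss(0, 1)
                   for _ in range(R)) / R
print(b0, b1, rmse, fcst_2050_static, fcst_2050_mc)
```

As expected the Monte Carlo mean differs from the static forecast only by sampling noise of order rmse/sqrt(R).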

It is not very difficult to incorporate a SOLVE statement in the last part. We need an explicit loop over r for that. The code could look like:

parameter result(r,t) 'used to collect model solutions';
loop(r,
  yield(tf) = b0.l + b1.l*year(tf) + rmse*normal(0,1);
* do something with yield(t) in the model
  solve gams_model using nlp minimizing z;
* collect solutions and yields
  allyield(r,t) = yield(t);
  result(r,t)   = x.l(t);
);
* get means
parameter myield(t), mresult(t);
myield(t)  = sum(r, allyield(r,t))/card(r);
mresult(t) = sum(r, result(r,t))/card(r);

GAMS has some cool-looking fan charts; maybe I can use those in this project!


### Debug unbounded models

To test a high-precision linear optimization solver, the following problem was presented:

For M ≤ N and π = 3.14…, the problem is:

Maximize:   f(X) = Σ_{m=1..M} [2 + sin(m + π/m)]·Xm

Subject to: Σ_{m=1..M} sin[m + π/(m+n)]·Xm ≤ b(n),   n = 1,…,N

with right-hand sides b(n) = M·[2 + sin(n + π/n)] and Xm ≥ 0.

For M = N = 256, the resulting (extended-precision) maximum is f(X) = 0.3711087372584210920E+10.

Using an AMD Athlon 64 CPU with a clock rate of 2600 MHz, the calculation time (including extended-precision data input) is 3.0 seconds.

Could someone verify this result?

You may need to do more debugging: the GAMS model is reported as unbounded by any LP solver. A good way to debug this is to add a bound on the objective variable, so we can look at a solution:

set
  i /i1*i256/
  j /j1*j256/
;
parameter m(i); m(i) = ord(i);
parameter n(j); n(j) = ord(j);

parameter b(j);
b(j) = card(i)*(2+sin(n(j)+pi/n(j)));

positive variable x(i);
variable z;

equations obj, e(j);

obj..  z =e= sum(i, (2+sin(m(i)+pi/m(i)))*x(i));
e(j).. sum(i, sin(m(i)+pi/(n(j)+m(i)))*x(i)) =l= b(j);

model mlp /all/;
z.up = 1e15;
solve mlp maximizing z using lp;
display x.l;

This can show:

----     22 VARIABLE x.L

i25 5.01776E+14

(Other solvers may give a different solution.) All other elements are zero. Indeed, x('i25') has a positive objective coefficient: 2 + sin(·) is always positive. Moreover, the coefficients of x('i25') in all the constraints are negative. This can be seen from a plot of sin(25 + π/(25+n)) for n = 1,…,256:

You can verify by clicking on: http://www.wolframalpha.com/input/?i=plot+sin(25%2Bpi/(25%2Bn))+for+n%3D1+to+256.

So indeed when we make x(25) larger the objective will improve while the constraints do not limit this move.
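These sign claims are easy to verify numerically. A small Python sketch, independent of the GAMS model, checks the objective and constraint coefficients of the x('i25') column:

```python
import math

# Coefficients of x('i25') in the LP above:
#   objective coefficient:  2 + sin(m + pi/m)           with m = 25
#   constraint coefficient: sin(m + pi/(m + n))          for n = 1..256
m = 25
obj_coef = 2 + math.sin(m + math.pi / m)
con_coefs = [math.sin(m + math.pi / (m + n)) for n in range(1, 257)]

print(obj_coef)        # positive: improving to increase x('i25')
print(max(con_coefs))  # negative: no constraint ever limits x('i25')
```

Since the largest constraint coefficient in this column is still negative, increasing x('i25') relaxes every constraint while improving the objective, which is exactly an unbounded ray.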

Notes:

• A standard textbook full-tableau LP solver will still be numerically unstable, even in extended precision.
• Gurobi has a “quad” option to use extended precision.

## Sunday, November 22, 2009

### Some GAMS/Gurobi message issues

A client of mine purchased GAMS/Gurobi, tried his license on a small LP problem, and immediately sent me an email saying “help, my license does not work”. The output showed:

Gurobi library version 2.0.1

That looks as if the license was not recognized. It turns out this message is issued for small models (within the demo limits), regardless of the license. This is really confusing.

Another message issue:

Solved in 0 iterations and 0.00 seconds
Infeasible or unbounded model
LP status(4): Model was proven to be either infeasible or unbounded.

Come on, make up your mind! For a modeler it makes a lot of difference whether the model is infeasible or unbounded, and when writing algorithms it is even more important to know which of the two the sub-model was. I would prefer that the solver (by default) spend some more time (e.g. by restarting with a primal algorithm) to give a proper diagnostic message.

Even worse, the listing file mentions:

**** SOLVER STATUS     1 Normal Completion
**** MODEL STATUS      19 Infeasible - No Solution
**** OBJECTIVE VALUE                0.0000

which is wrong: the model is unbounded! Also, if the model were infeasible it would be good to report a solution with minimum infeasibility, or give some other hints: just “no solution” is not very helpful for a modeler.

Note: these issues are mostly not related to Gurobi per se, but rather to the GAMS link implementation.

## Tuesday, November 17, 2009

### Assignment Problem

In http://thread.gmane.org/gmane.comp.lang.r.general/169477/focus=169690 an assignment problem is formulated and solved with the Hungarian method in R:

Current LP solvers are very fast: when we solve this with Gurobi we see:

Presolve time: 0.34 sec.
Iteration    Objective       Primal Inf.    Dual Inf.      Time
0    0.0000000e+00   1.000000e+03   0.000000e+00      0s
8329    7.4610203e+02   3.115000e+03   0.000000e+00      5s
11696    7.5580772e+02   0.000000e+00   0.000000e+00      9s

Solved in 11696 iterations and 9.30 seconds

I.e. about three times as fast as the R run. So a specialized algorithm does not always beat an excellent general-purpose LP solver.

Note: for n=1000 the timings for R are:

and for Gurobi:

Solved in 40291 iterations and 141.13 seconds
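For reference, the underlying problem is simply: minimize the total cost of assigning n workers to n jobs, one-to-one. A tiny brute-force Python sketch (my own illustration, not the R or Gurobi code; feasible only for very small n, since it enumerates all n! permutations, whereas the Hungarian method or an LP solver handles n in the hundreds or thousands):

```python
import itertools, random

def solve_assignment(c):
    """Brute-force assignment: minimize sum of c[i][p[i]] over permutations p."""
    n = len(c)
    best_cost, best_perm = float("inf"), None
    for p in itertools.permutations(range(n)):
        cost = sum(c[i][p[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, p
    return best_cost, best_perm

# Random 6x6 cost matrix (seed arbitrary); 6! = 720 permutations to enumerate.
random.seed(1)
c = [[random.random() for _ in range(6)] for _ in range(6)]
cost, perm = solve_assignment(c)
print(cost, perm)
```

The LP relaxation of this model has an integral optimal solution (the constraint matrix is totally unimodular), which is why a plain LP solver like Gurobi solves it without any integrality restrictions.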

## Monday, November 16, 2009

### Modeling after collecting data

I am involved in a policy-evaluation project where a lot of effort has been spent on collecting data. Now that we are actually starting to run optimization models using this data, the solutions give rise to other questions. That happens a lot: the solutions of the first versions of a model do not really answer the original questions. More often they give rise to new questions, and show that some questions are not really well-posed or even relevant. As a result some of the data suddenly seems less important, while at the same time some new data would be really useful for models that help answer the new questions.

The lesson is that it is often a good idea not to delay building the model until you have all the data. Instead, try to build early versions of the model as soon as possible, in conjunction with the data collection effort. This can mean we need to “invent” data. That is a useful exercise in itself: it requires forming a detailed understanding of the problem, which really helps down the road. Building models helps focus the mind on what is important, what we don't know yet, and what the relevant questions are. It helps to pose these issues more sharply than is otherwise possible. It is very difficult to think about all these things in the abstract: the model helps to identify problems succinctly.

## Sunday, November 15, 2009

### NLP model

> 1) Is there any nonlinear programming optimizer that I can use for the following problem?
>
> Obj function: (max) revenue = price * volume
> Constraints: the price and volume pair must be from the following data set:
>
> &#35;  price  volume
> 1   10     500
> 2   20     450
> 3   30     330
> 4   40     250
> 5   50     190
>
> Expected result: 10,000 (row #4)

> You are confused. That is not a nonlinear programming problem. You can just run through the data set and pick the largest product.
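The reply's suggestion amounts to a single scan over the data:

```python
# Pick the revenue-maximizing (price, volume) pair by scanning the data set.
data = [(10, 500), (20, 450), (30, 330), (40, 250), (50, 190)]
best = max(data, key=lambda pv: pv[0] * pv[1])
print(best, best[0] * best[1])   # → (40, 250) 10000
```

which indeed returns row 4 with revenue 10,000, as expected.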

## Wednesday, November 11, 2009

### Persistency in math programming models

For a model I am working on we need a user-settable degree of persistency: solutions should take historic solutions into account and stay “close” to them in some sense. Basically: wildly different solutions are unwanted. An excellent paper on this subject is by the NPS people: http://faculty.nps.edu/dell/docs/optimization_and_persistence.pdf.

## Monday, November 9, 2009

### NEOS statistics

At http://neos.mcs.anl.gov/neos/report.html you can ask for statistics of the usage of the NEOS solvers (http://neos.mcs.anl.gov/neos/).

I ran the statistics for a year and a few interesting and unexpected patterns showed up.

The top 5 solvers:

List of Solvers

Solver Name     # of Submissions
KNITRO                     22498
bpmpd                      20185
MINOS                      15932
MINTO                      14769
SNOPT                      12916

is dominated by NLP solvers (KNITRO, MINOS, SNOPT). It may well be that some LPs are solved with NLP solvers (I sometimes see that happening in client models too). I had to google what bpmpd was (see http://www.sztaki.hu/~meszaros/bpmpd/), and I am surprised it ended up so high. It is nice to see good old MINOS ending up prominently in the top 5.

If we look at the top 5 input formats we see:

List of Inputs

Solver Input    # of Submissions
AMPL                      139782
GAMS                       63873
CPLEX                       3251
Fortran                     3176
MPS                         2940

Here we see AMPL as the winner, with GAMS having half as many submissions. This could well be explained by most submissions coming from educational institutions, where AMPL is likely more popular than GAMS as its format is closer to a mathematical formulation. It is good to see that almost all submissions are done in modeling languages: users should stay away from programming languages for building models unless there are significant reasons otherwise. Also note that Fortran seems to be more popular than C (at rank 7).

List of Categories

Solver Category # of Submissions
nco                        65532
milp                       49524
lp                         32231
minco                      28732
kestrel                    15474

I assume that nco means non-linear constrained optimization. This confirms that many NLP models are solved on NEOS. I would guess minco means MINLP.

I certainly would not have predicted some of these numbers!