Yet Another Math Programming Consultant: January 2009

Friday, January 30, 2009

All-different

> Hi I am trying to solve a certain linear integer problem with lp_solve.
> I know that the solution set (24 variables) is (1,2...,24) and I am trying to figure out a way to input to lp_solve this array in order to get a feasible solution.

> In other words is it possible to put have a (or a set) of constraints saying that x1 ,x2,....,x24 belong to (1 ,2,...24) set but they are distinct (for instance x1=24, x2=3,x3=21 etc...).

This is called the ALL-DIFFERENT constraint. See here for an example.

One way of modeling this as a MIP is:

binary variable p(i,j) (i,j=1..24)

sum(i, p(i,j)) = 1; for all j

sum(j, p(i,j)) = 1; for all i

a(i) = i (constant)

x(i) = sum(j, p(i,j)*a(j))

A tighter formulation would be:

sum(i, x(i)) = 0.5*n*(n+1), where n=24

sum(j in J, x(j)) ≥ 0.5*card(J)*(card(J)+1) where J is an element of the power set of {1,..,24} with card(J) < 24.

See this presentation.

Regular Expressions in GAMS/IDE

I had to add a ';' to each line in a file. In vi (unix editor) one would do:

1,$s/$/;/

In the GAMS IDE this works slightly different:

replace: .*$
by: $0;

As an aside: the equivalent of 1,$s/^/;/ is not obvious to me (there may be an issue here if we use ;$0 as replacement string).

Thursday, January 29, 2009

AMPL NLP question

> I want to formulate some equations like ln(z(k)) <= D,
> z(k) is a variable. Is there some function or keyword used as logarithm?
> Or is there some method to calculate logarithm? Thanks

You can use the log() function. Note that if D is a parameter the constraint can better be formulated as z[k] <= exp(D). This makes it a simple linear constraint that can be converted into a bound.

The difference can be illustrated with a small example:

ampl: var x := 1;
ampl: param D := 3;
ampl: maximize obj: x;
ampl: e: log(x) <= D;
ampl: solve;
MINOS 5.5: optimal solution found.
11 iterations, objective 20.08553692
Nonlin evals: constrs = 42, Jac = 41.
-----------------------------------------
ampl: var x := 1;
ampl: param D := 3;
ampl: maximize obj: x;
ampl: e: x <= exp(D);
ampl: solve;
MINOS 5.5: optimal solution found.
1 iterations, objective 20.08553692

AMPL has a very good presolver, but not so good this reformulation is applied automatically.

Gurobi vs Cplex benchmarks

> What are your first impressions of Gurobi? How does it compare to CPLEX for MIP? Have you done any benchmark analysis?

Here are some benchmarks: http://plato.asu.edu/ftp/milpc.html

However I would always urge you to try out solvers on your particular models as they may behave very differently than the benchmark models. Drop me a note if you need me to run some models or need access to evaluation systems.

Wednesday, January 28, 2009

Cplex log: number of integer variables

Sometimes progress logs pose some interesting questions. In the Cplex log below we solve a problem with 2016 binary variables (after presolving). However the progress log seems to indicate that when we start 2036 of them are integer infeasible.

Reading data...
Starting Cplex...
Tried aggregator 1 time.
MIP Presolve eliminated 45077 rows and 15073 columns.
MIP Presolve modified 504 coefficients.
Aggregator did 1008 substitutions.
Reduced MIP has 73884 rows, 26656 columns, and 201936 nonzeros.
Reduced MIP has 2016 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time =    0.45 sec.
Clique table members: 900.
MIP emphasis: hidden feasible solutions.
MIP search method: dynamic search.
Parallel mode: opportunistic, using up to 4 threads.
Parallel mode: opportunistic, using up to 4 threads for concurrent optimization.
Using devex.
Using devex.
 
Total real time on 4 threads =   16.18 sec.
Root relaxation solution time =   16.19 sec.
 
        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap
 
      0     0        0.0000  2036                      0.0000       20

Tuesday, January 27, 2009

Adding redundant constraint to MIP

Whenever a MIP model is very slow I try to invent "redundant" constraints that can be added to the model. Here is an example with Cplex. We report number of nodes needed to prove optimality on some small examples of a (proprietary) scheduling model:

case	with	without
8-3-2	23	34
8-6-2	0	7290
9-3-3	0	81
9-4-3	0	180
9-5-3	74	87738

As can be seen the version with the redundant equation added has much better performance. Furthermore it is noted that the duality gap was much smaller.

Monday, January 26, 2009

Scheduling: limits on number of periods active.

> How to model that an employe must work at least as 5 successive periods and at most as 8 successive periods with mathprog.

There are a few ways to model this, and without knowing more about the model it is difficult to give definite advise, but here is a suggestion of a possible formulation taken from a machine scheduling model I developed a little while ago:


param MAXT := 25;
param MIN := 5;
param MAX := 8;
set T := {1..MAXT};
 
var x{T}, binary;
var SwitchOn{T}, binary;
var SwitchOff{T}, binary;
 
 
 switch1{t in T: t > 1}: x[t]-x[t-1] = SwitchOn[t]-SwitchOff[t];
 switch2{t in T}: SwitchOn[t]+SwitchOff[t] <= 1;
 
 init1: x[1] = 0;
 init2: SwitchOn[1] = 0;
 init3: SwitchOff[1] = 0;
 
 
 minimum{t in T}: sum{i in 1..MIN:t+i-1<=MAXT} x[t+i-1] >= MIN*SwitchOn[t];
 maximum{t in T:t+MAX<=MAXT}: x[t+MAX] <= 1-SwitchOn[t];

I ignored the details about what to do with the boundaries. The last condition adds also an implied restriction on when the machine can be turned on again (after 8 periods), so it may be better to model this as:

maximum{t in T:t+MAX<=MAXT}: sum{i in 1..MAX+1:t+i-1<=MAXT} x[t+i-1] <= MAX + 1 - SwitchOn[t];

For multiple persons j we would need x[j,t], SwitchOn[j,t], SwitchOff[j,t]. A different way would be to organize x[j,t,len] where t is the start time and len is the length of occupation (len=5,6,7,8). This approach has more integer variables but may solve fast depending on the situation.

Friday, January 23, 2009

Excel Solver Sensitivity Analysis

> How do I programmatically tell Solver to generate a sensitivity report

> after running SolverSolve ?

This can be done through the SolverFinish function.

Thursday, January 22, 2009

Updated lp2gams

Updated lp2gams to distinguish names with different casing. In lpsolve a variable x is different from a variable X while in GAMS they are the same. We added some code to make sure the generated GAMS names are unique. If needed we add an underscore to the name.

Wednesday, January 21, 2009

Cplex barrier results

With some simulated data for the problem described in this post, we achieved some very good results with the Cplex barrier method. Below are timings in seconds with different Cplex algorithms

	rows/cols/nz	dual	primal	barrier
model 1	1e5/2e5/1.4e6	1247	857	8.7
model 2	1.1e5/2.1e5/6.1e5	195	1223	7.3

The GAMS models look like:

41  variable z,a1(j),a2(j),s1(i),s2(i),y(k);
42  positive variable s1,s2;
43  equation obj,e1(i),e2(k),e3(i);
44
45  option lp=cplex;
46  option iterlim=1000000;
47  option reslim=10000;
48
49  obj.. z =e= sum(i, s1(i)+s2(i));
50  e1(i).. sum(j, xx(i,j)*a1(j)) + s1(i) - s2(i) =e= b(i);
51  model m1 /obj,e1/;
52  solve m1 minimizing z using lp;
53
54  e2(k).. y(k) =e= sum(j, x(k,j)*a2(j));
55  e3(i).. sum(map(i,k), y(k)) + s1(i) - s2(i) =e= b(i);
56  model m2 /obj,e2,e3/;
57  *solve m2 minimizing z using lp;

Note that the SUM in line 55 is actually not a real summation. It just selects a k belonging to a particular i. A different, maybe better, way to write this would be:

equation e3(i,k);

e3(map(i,k)).. y(k) + s1(i) - s2(i) =e= b(i);

Tuesday, January 20, 2009

LAD regression

> I'm doing median regression with lp_solve. My LP-model is
>
> min sum(i, s1(i)+s2(i))
> s.t.
> sum(j, a(j)*x(i,j)) + s1(i) - s2(i) = b(i)
> s1(i),s2(i) >= 0

>
> where
> i=1..n observations,
> j=1..m describing dimensions,
> a(j) parameters to be identified,
> x(i,j) describing values for observation i,
> b(i) observed values,
> s1(i), s2(i) deviations of the described value from the observed value.
>
> For realistics instances I have > 100.000 observations. Many instances
> have rows i with equal x(i,j) but different b(i) (describing variables are
> equal but observations different). I wonder if there is a modeling trick
> to aggregate these rows into 1 or 2 constraints.

Don't know. But I have some suggestions to try out:

If j is large you may want to solve a larger but sparser problem where you prevent repeating the same sum(j, a(j)*x(i,j)) but using extra variables

y(k) = sum(j, a(j)*x(k,j))

and then

y(k) + s1(i) - s2(i) = b(i)

for suitable combinations (k,i).

You could reduce the number of variables again by a different trick:

s(i)>=0

-s(i) <= y(k)-b(i) <= s(i)

min sum(i, s(i))

Also you may want to try different LP solvers.

MSF Excel Plugin timings

It looks like the MS Solver Foundation Excel Plugin is somewhat slow for importing larger data sets (we observed this in this note). Here are a few timings that confirm this:

Cells	MSF Excel Plugin	GAMS gdxxrw
n=100	0.7	1.2
n=1000	2.7	1.3
n=10000	23.0	1.4

Timings are in seconds. We see the MSF timings increase quickly, while the GAMS timings are still dominated by the fixed cost of launching Excel.

MSF Model:
 
Binding: P<=="Sheet1!$B$2:$CW$101"
Model[
Parameters[Sets,I,J],
Parameters[Reals,P[I,J]],
Decisions[Reals,d],
Constraints[d==0]
]
 
GAMS Model:
 
$call gdxxrw i=data.xlsx par=p rng=A1:CW101
parameter p(*,*);
$gdxin data
$load p
 
scalar n;
n=card(p);
display n;

The MSF code contains a large parameter that is read through data binding, together with a minimal (one variable, one constraint) model. The GAMS code calls GDXXRW to read the spreadsheet and then the intermediate GDX file is read again by GAMS. We display the cardinality of p as a check we read all numbers.

Sunday, January 18, 2009

Can I use restart files in GAMS to hide data?

> Can I use restart files in GAMS to hide data?

No. If you produce a restart file by:

> gams model.gms save=t

then the data content of t.g00 can be revealed by running an empty model as follows:

> gams empty.gms restart=t gdx=t.gdx

Now t.gdx will contain all data in a convenient format for a user to look at.

There is an option in GAMS to hide data in restart files but that requires a special license option. For more info see newsletter7. (Note: the link to http://www.gams.com/docs/privacy.pdf does not work; I can email a copy).

Social Golfer Problem in OML

I am recently involved in a practical application of a scheduling problem related to the Social Golfer Problem. The particular application is somewhat more complicated, but we could use the GAMS/Cplex model described here as a starting point.

For smaller instances of the pure Social Golfer Problem, it is convenient to use a CSP approach. Here is a formulation in OML (Microsoft Solver Foundation):

Model[
 
// N : number of golfers
// NG : number of groups
// T : number of rounds
// GS : group size (N/NG)
 
 Parameters[Integers,N=16,NG=4,T=5,GS=4],
 // Would prefer: GS=N/NG but OML does not allow (constant) expressions
 // in parameter statements
 
 Decisions[
  Integers[0,1],
   Foreach[{i,N},{g,NG},{t,T},x[i,g,t]]
 ],
 
  
 Constraints[
     // each golfer has to play each round
     Foreach[{i,N},{t,T},Sum[{g,NG},x[i,g,t]] == 1],
 
     // form groups
     Foreach[{g,NG},{t,T},Sum[{i,N},x[i,g,t]] == GS],
 
     // golfer i and j meet at most one time
     // Foreach[{i,N}, FilteredForeach[{j,N},i!=j, Sum[{g,NG},{t,T},x[i,g,t]*x[j,g,t]] <= 1]],
     // We exploit symmetry here:
     Foreach[{i,N}, Foreach[{j,i+1,N}, Sum[{g,NG},{t,T},x[i,g,t]*x[j,g,t]] <= 1]],
 
 
     // fix first round
     Foreach[{g,NG},{k,GS},x[GS*g+k,g,0]==1],
 
     // fix first golfer
     Foreach[{t,1,T},x[0,0,t]==1]
 ]
 
]

The condition on number of times players can meet can be implemented quite straightforwardly. In the MIP model we needed to linearize this constraint, but in the CSP framework we can use the nonlinear formulation directly. A big advantage of CSP over MIP is that many constructs can be modeling in a more natural, straightforward fashion. Note that we added some exploitation of symmetry: once we have checked how many times golfer i meets golfer j we no longer need to check golfer j meeting golfer i.

The last two constraints fix the first round and first player. This (without loss of generality) reduces the size of the problem.

This instance solved easily:

Solver Foundation Report       
Report Overview       
Date Time    =    1/18/09 10:48 AM
Model Name    =    golf.xlsx
Solver Selected    =    CSP
Total Time (ms)    =    677
Solver Completion Status    =    Feasible
       
Solver Execution Details       
Algorithm    =    Tree Search
Variable Selection    =    Dynamic Weighting
Value Selection    =    Forward Order

Note that we changed the default solver algorithm settings to speed up the solution times.

The most interesting instance with 32 golfers, foursomes and T=10 is too large for my version to solve:

Parameters[Integers,N=32,NG=8,T=10,GS=4],
 
 
Error:
Model size limit has been exceeded for this version of the product. Please contact Microsoft Corporation for licensing options.

Tuesday, January 13, 2009

Modeling quadratic penalty on lateness

> I am working on a scheduling application and want to penalize

> large lateness values extra. So I want to use quadratic penalty
> for this. How can I do that with Cplex?

In general you don't want to do that. A better way is to add a cost term to the objective using the MAX lateness.

Often one can use a mix of the following cost (penalty) terms:

SUM (sum of how much jobs are late)
COUNT (number of late jobs)
MAX (maximum lateness)

Changing the penalty coefficients can allow you put different weights on these criteria.

E.g. the SUM criterium may lead to many jobs just a little bit late. Adding the COUNT criterium may help prevent this. Opposed to this: adding the MAX can help prevent jobs that are very late.

Friday, January 9, 2009

Sensitivity Analysis in a loop with GAMS/Cplex

> I have read the document http://www.gams.com/docs/sensitiv.htm but I don't
> see how I can do sensitivity analysis in a loop.

GAMS/Cplex writes sensitivity information to a text file (include file) and GAMS cannot read such a file at compile time. I.e. you cannot read a text file or include file to be specific inside a loop.

It would have been preferable in GAMS/Cplex would have been updated so that it can write sensitivity information to a GDX file (see also GAMS/Cplex wish list).

As a work around one can call a GAMS submodel from the GAMS main model. The submodel can then convert the include file to a GDX file, which then can be read by the main model.

A complete example is below:

$ontext
  
 shows how to do sensitivity analysis inside a single
 GAMS job by using GDX files.
  
  
 Erwin Kalvelagen, december 2003, revised jan 2009
  
$offtext
  
Sets
     i   canning plants   / seattle, san-diego /
     j   markets          / new-york, chicago, topeka / ;
  
Parameters
  
     a(i)  capacity of plant i in cases
       /    seattle     350
            san-diego   600  /
  
     b(j)  demand at market j in cases
       /    new-york    325
            chicago     300
            topeka      275  / ;
  
Table d(i,j)  distance in thousands of miles
                  new-york       chicago      topeka
    seattle          2.5           1.7          1.8
    san-diego        2.5           1.8          1.4  ;
  
Scalar f  freight in dollars per case per thousand miles  /90/ ;
  
Parameter c(i,j)  transport cost in thousands of dollars per case ;
  
          c(i,j) = f * d(i,j) / 1000 ;
  
Variables
     x(i,j)  shipment quantities in cases
     z       total transportation costs in thousands of dollars ;
  
Positive Variable x ;
  
Equations
     cost        define objective function
     supply(i)   observe supply limit at plant i
     demand(j)   satisfy demand at market j ;
  
cost ..        z  =e=  sum((i,j), c(i,j)*x(i,j)) ;
  
supply(i) ..   sum(j, x(i,j))  =l=  a(i) ;
  
demand(j) ..   sum(i, x(i,j))  =g=  b(j) ;
  
Model m /all/ ;
  
  
*
* set options to cplex writes an include file with sensitivity
* information
*
$onecho > cplex.opt
objrng all
rhsrng all
rngrestart rng.inc
$offecho
  
m.optfile=1;
m.dictfile=2;
  
option lp=cplex;
Set rnglim /lo,up/;
  
*
* three different scenarios
*
set iter /iter1*iter3/;
parameter delta(iter);
delta(iter)$(ord(iter)>1) = 100;
display delta;
  
  
*
* write gams file that reads include file rng.inc and
* writes a GDX file (specified on the command line)
*
$onecho > inc2gdx.gms
set i,j,rnglim
$gdxin sets.gdx
$load i j rnglim
$include rng.inc
$offecho
  
  
*
* save the sets i,j,rnlim to gdx file sets.gdx
*
execute_unload 'sets.gdx',i,j,rnglim;
  
  
*
* parameters to hold ranges for equation DEMAND
*
parameter demandrange(j,rnglim);
parameter alldemandrange(iter,j,rnglim);
  
  
loop(iter,
 a(i) = a(i) + delta(iter);
  
 SOLVE m USING LP minimizing z;
  
*
* store the sensitivity information in a GDX file
*
 execute '=gams.exe inc2gdx.gms lo=3 gdx=dump.gdx';
  
*
* get data from GDX file
*
 execute_load 'dump.gdx',demandrange=demandRNG;
  
 alldemandrange(iter,j,rnglim) = demandrange(j,rnglim);
);
  
display alldemandrange;