Sunday, January 31, 2010

Run GAMS from R

This is a simplistic way to run GAMS from R:

 

image

The GAMS model is adapted from here. It looks like:

$ontext

 
Linear version

$offtext


*
* data
*
set i /1*%n%/;
scalars
   k
/%k%/
   target
/%target%/
;

table pt(i,*)
$ondelim
$include gamsdata.csv
$offdelim
;

parameter p(i);
p(i) = pt(i,
"x");

*
* model
*
binary variable x(i);
variable z,s,d1,d2;
positive variable d1,d2;

equation e1,e2,e3,obj;

e1.. s =e=
sum(i, p(i)*x(i));
e2..
sum(i,x(i)) =e= k;
e3.. d1-d2 =e= s-target;
obj.. z =e= d1+d2;

model m /all/;
option optcr=0, mip=cplex;
solve m minimizing z using mip;

*
* do some reporting
*

option x:1;
display p,x.l;

scalar value,sqdeviation,num;
value =
sum(i, p(i)*x.l(i));
sqdeviation = sqr(value  - target);
num =
sum(i, x.l(i));
display target,value,sqdeviation,num;

*
* write solution
*
file sol /solution.csv/;
loop(i,
  
put sol,x.l(i)/
);


Here is how to use it:

image

Solving an optimization problem: selecting an "optimal" subset

This problem is described in http://comments.gmane.org/gmane.comp.lang.r.general/177553. An obvious MIQP formulation in GAMS could look like:

$ontext

Given vector of numbers x, I wish to select an n-subset with sum closest to
fixed value s. Can anyone advise me how to approach this, in R?

I have considered Rcplex package, which handles integer/binary
linear/quadratic optimization problems, but have difficulty setting up the
quadratic form for [sum(x) - s]^2.

(Dynamic programming over [0, sum(x)]? A genetic algorithm? Can anyone
contribute a binary GA optimization sample?)

--------------------------------------------------------------------------

I need to pick 2-40 elements out of a 50-200-element-long vector.


$offtext

set i /i1*i200/;
scalars
   k
/40/
   target
/3.1415926535/
;

parameter p(i);
p(i) = uniform(-100,100);

binary variable x(i);
variable z,s;

equation e1,e2,obj;

e1.. s =e=
sum(i, p(i)*x(i));
e2..
sum(i,x(i)) =e= k;
obj.. z =e= sqr(s-target);

model m /all/;
option optcr=0,miqcp=cplex;
solve m minimizing z using miqcp;

option x:1;
display p,x.l;

scalar value,sqdeviation,num;
value =
sum(i, p(i)*x.l(i));
sqdeviation = sqr(value  - target);
num =
sum(i, x.l(i));
display target,value,sqdeviation,num;


The results look like:

----     54 PARAMETER p 

i1   -65.651,    i2    68.653,    i3    10.075,    i4   -39.772,    i5   -41.558,    i6   -55.189,    i7   -30.034
i8    71.254,    i9   -86.577,    i10    0.042,    i11   99.624,    i12   15.747,    i13   98.227,    i14   52.450
i15  -73.862,    i16   27.944,    i17  -68.096,    i18  -49.984,    i19   33.786,    i20  -12.929,    i21  -28.060
i22  -29.712,    i23  -73.702,    i24  -69.980,    i25   17.823,    i26   66.179,    i27  -53.837,    i28   33.147
i29   55.172,    i30  -39.268,    i31  -77.902,    i32    0.477,    i33  -67.965,    i34   74.492,    i35  -46.977
i36  -42.837,    i37   18.791,    i38   44.544,    i39   25.650,    i40   -7.240,    i41  -17.339,    i42  -76.461
i43  -37.158,    i44  -90.690,    i45  -32.290,    i46  -63.580,    i47   29.145,    i48   12.149,    i49   53.992
i50  -40.439,    i51   32.221,    i52   51.164,    i53   25.489,    i54  -43.227,    i55  -82.715,    i56  -79.497
i57   28.250,    i58    9.062,    i59  -93.695,    i60   58.472,    i61  -85.447,    i62  -64.868,    i63    5.127
i64   50.042,    i65  -64.375,    i66  -93.172,    i67   17.026,    i68   24.246,    i69  -22.128,    i70  -28.257
i71  -51.393,    i72  -50.716,    i73  -73.899,    i74   86.690,    i75  -24.012,    i76   56.680,    i77  -39.993
i78  -74.903,    i79   49.775,    i80  -86.154,    i81  -59.597,    i82  -98.987,    i83  -46.077,    i84   -0.030
i85  -69.743,    i86  -65.166,    i87  -33.872,    i88  -36.619,    i89  -35.583,    i90   92.795,    i91   98.720
i92  -26.019,    i93  -25.422,    i94   54.396,    i95  -20.663,    i96   82.619,    i97  -76.084,    i98   47.096
i99  -88.916,    i100  15.260,    i101 -89.719,    i102 -98.798,    i103 -19.754,    i104   3.976,    i105  25.775
i106 -54.850,    i107 -20.776,    i108 -44.799,    i109 -69.525,    i110  87.265,    i111 -15.468,    i112 -73.067
i113 -22.789,    i114 -25.073,    i115 -46.304,    i116  89.674,    i117 -62.212,    i118 -40.498,    i119 -85.089
i120 -19.731,    i121 -79.662,    i122 -23.222,    i123 -35.181,    i124 -61.573,    i125 -77.526,    i126  19.312
i127   2.290,    i128 -90.987,    i129  56.620,    i130  89.150,    i131  19.293,    i132  21.468,    i133 -27.498
i134  18.814,    i135  35.971,    i136   1.318,    i137 -68.149,    i138  31.378,    i139   4.776,    i140 -75.121
i141  97.344,    i142 -54.375,    i143  35.131,    i144  55.355,    i145  86.490,    i146 -59.752,    i147 -40.573
i148 -60.554,    i149 -50.731,    i150  29.295,    i151  46.995,    i152 -82.913,    i153 -69.930,    i154 -13.162
i155 -62.612,    i156  38.539,    i157  52.595,    i158 -69.039,    i159 -22.124,    i160  39.086,    i161  69.162
i162  22.544,    i163  95.194,    i164 -94.622,    i165 -62.510,    i166 -82.576,    i167   8.080,    i168 -74.627
i169  46.800,    i170 -77.354,    i171  -2.329,    i172  59.120,    i173  -1.591,    i174   6.712,    i175 -97.875
i176   8.774,    i177  -9.774,    i178  95.066,    i179 -63.231,    i180 -67.294,    i181 -95.073,    i182 -64.435
i183 -87.736,    i184 -96.671,    i185  67.131,    i186  20.332,    i187 -94.597,    i188 -60.781,    i189  90.142
i190 -32.892,    i191  18.852,    i192 -48.162,    i193  28.127,    i194 -68.950,    i195  -7.997,    i196 -21.332
i197  61.092,    i198   8.198,    i199 -21.856,    i200  11.564

----     54 VARIABLE x.L 

i1           1.0,    i8           1.0,    i11          1.0,    i13          1.0,    i31          1.0
i34          1.0,    i41          1.0,    i44          1.0,    i59          1.0,    i66          1.0
i74          1.0,    i75          1.0,    i82          1.0,    i90          1.0,    i91          1.0
i96          1.0,    i101         1.0,    i102         1.0,    i107         1.0,    i110         1.0
i111         1.0,    i116         1.0,    i121         1.0,    i125         1.0,    i128         1.0
i130         1.0,    i141         1.0,    i144         1.0,    i157 4.625855E-6,    i161         1.0
i163         1.0,    i164         1.0,    i170         1.0,    i173         1.0,    i178         1.0
i184         1.0,    i187         1.0,    i192         1.0,    i193         1.0,    i196         1.0
i197         1.0

----     60 PARAMETER target               =        3.142 
            PARAMETER value                =        3.142 
            PARAMETER sqdeviation          =  2.55591E-28 
            PARAMETER num                  =       40.000 

The model solves very quickly: 2 seconds, but that is very much dependent on the data. The results show one binary variable with a small value. The obvious repair in a MIP model would be to set the integer feasibility tolerance (option epint) to zero. This causes the solution time to increase to 11 seconds, but unfortunately does not really solve the problem for this MIQP problem:

----     55 VARIABLE x.L 

i1           1.0,    i8           1.0,    i11          1.0,    i13          1.0,    i14  -7.57698E-7
i31          1.0,    i34          1.0,    i44          1.0,    i59          1.0,    i66          1.0
i74          1.0,    i82          1.0,    i90          1.0,    i91          1.0,    i95          1.0
i96          1.0,    i101         1.0,    i102         1.0,    i103         1.0,    i110         1.0
i116         1.0,    i121         1.0,    i122         1.0,    i125         1.0,    i128         1.0
i130         1.0,    i141         1.0,    i144         1.0,    i154         1.0,    i159         1.0
i161         1.0,    i163         1.0,    i164         1.0,    i170         1.0,    i173         1.0
i178         1.0,    i184         1.0,    i187         1.0,    i192         1.0,    i193         1.0
i197         1.0

We still have a tolerance issue. The log indicates there is some issue with scaling:

Warning: integer solution contains unscaled infeasibilities.
MIQP status(115): integer optimal with unscaled infeasibilities

I tried to disable scaling by scaind = -1, but that gave the same message. It looks like we have some numerical issues here (although the data looks quite benign).

This model can actually be solved as a MIP. The quadratic objective can be replaced by a linear one by minimizing the absolute deviation. The optimal solution should be the same for both objectives (apart from the issue of existence of multiple optimal solutions). The linear model can look like:

$ontext

 
Linear version

$offtext

*
* data
*
set i /i1*i200/;
scalars
   k
/40/
   target
/3.1415926535/
;

parameter p(i);
p(i) = uniform(-100,100);

*
* model
*
binary variable x(i);
variable z,s,d1,d2;
positive variable d1,d2;

equation e1,e2,e3,obj;

e1.. s =e=
sum(i, p(i)*x(i));
e2..
sum(i,x(i)) =e= k;
e3.. d1-d2 =e= s-target;
obj.. z =e= d1+d2;

model m /all/;
option optcr=0, mip=cplex;
solve m minimizing z using mip;

*
* do some reporting
*

option x:1;
display p,x.l;

scalar value,sqdeviation,num;
value =
sum(i, p(i)*x.l(i));
sqdeviation = sqr(value  - target);
num =
sum(i, x.l(i));
display target,value,sqdeviation,num;

The result are now:

----     42 VARIABLE x.L 

i11  1.0,    i19  1.0,    i63  1.0,    i92  1.0,    i101 1.0,    i102 1.0,    i103 1.0,    i104 1.0,    i105 1.0
i106 1.0,    i107 1.0,    i108 1.0,    i110 1.0,    i111 1.0,    i112 1.0,    i113 1.0,    i120 1.0,    i122 1.0
i123 1.0,    i126 1.0,    i127 1.0,    i129 1.0,    i130 1.0,    i131 1.0,    i132 1.0,    i133 1.0,    i134 1.0
i135 1.0,    i138 1.0,    i140 1.0,    i141 1.0,    i143 1.0,    i144 1.0,    i145 1.0,    i146 1.0,    i147 1.0
i148 1.0,    i149 1.0,    i150 1.0,    i167 1.0

----     48 PARAMETER target               =        3.142 
            PARAMETER value                =        3.142 
            PARAMETER sqdeviation          =  1.09841E-26 
            PARAMETER num                  =       40.000 

This solves faster and seems numerically a little bit more stable.

Monday, January 25, 2010

Unneeded loops in GAMS

From a post on the GAMS list:

yes if the data are to be aggregated one would have a one to many tuple and do sums, sma, smin or whatever is needed 

ie

tuple(period,month,year) /1.(jar.1970,feb.1970,....)

loop(period,

    newdata(period)=sum(tuple(period,month,year),data(month,year));

);

I see many, many models where the LOOP is used without much reason. This is an example. The more correct way of writing this is by removing the loop construct and using the GAMS parallel assignment which gives an implicit loop:

    newdata(period)=sum(tuple(period,month,year),data(month,year));

Excel reporting

Excel is an excellent tool for reporting results of optimization models. Some models produce a ton of data and it is not always easy to find a compact way to present all this information. Here is an (early draft) example of an investment selection model for water management projects. During first few days of experimenting with the model, this has been quite useful to get a quick sense of the solution.

image

Wednesday, January 20, 2010

Counting positive variables

Wondering if you knew any tricks off of the top of your head for this problem:

I have a model with a continuous decision variable x, indexed over a set: x_1, x_2, … x_n.  x is nonnegative. I want to add a constraint which says that the number of positive entries in x is less than K.  I can do it with Sum[…, AsInt[x[i] > 0]], but that is not a MIP constraint.  Is there a standard trick I can use in this case?

I actually got this question two times this week. Answer:

Binary variable y(i);

parameter M(i) = x.up(i);  (tight upper bound on x(i))

equations:

  sum(i, y(i)) ≤ K;

  x(i) ≤ y(i)*M(i);

Note that the we only have an implication one way: y(i)=0 => x(i)=0. In other words:

  • if x(i)>0 => y(i)=1
  • if x(i)=0 => y(i)=0 or 1

In the solution, the y(i)’s have no straightforward interpretation. We cannot just consider the y(i)’s as an indicator for (strictly)positive x(i)’s. If the inequality sum(i, y(i)) ≤ K is not tight, there may be y(i)’s equal to one while the corresponding x(i)=0.

MIP solver performance (GLPK,SCIP,GUROBI)

I am modelling a project in which doing some tasks before others will save time. I think that the model is correct - but even though I have only entered the savings for 1 task into the objective function, it is taking too long to run. I believe that the problem lies in my "before1" and "before2" conditions, each of which contain 15^4=50625 modelling variables. These conditions are needed to enable the before[] variable, where before[a,b]=1 means that task a will be completed before task b.

The variable position[] is working correctly (position[5]=3 means that task five will be done third). I could eliminate the huge amount of work being done in the "before1" and "before2" conditions if I could make use of these position variables. Could anyone advise me how I can achieve the following, please?

* if position[a] > position[b], then before[a,b] = 0
* if position[a] < position[b], then before[a,b] = 1

Thanks for any suggestions!

Model appended below:

/*
The project consists of 15 tasks. Completion of any of these tasks will save time on other tasks in the project as follows:
Task 1 savings: 2 hours on task 2: 2 hours on task 4: 1 hours on task 9
Task 2 savings: 2 hours on task 1: 3 hours on task 15
Task 3 savings: 4 hours on task 1: 1 hours on task 5
Task 4 savings: 1 hours on task 1: 1 hours on task 3: 2 hours on task 5: 1 hours on task 10
Task 5 savings: 1 hours on task 2: 1 hours on task 10: 1 hours on task 13: 1 hours on task 14: 1 hours on task 15
Task 6 savings: 1 hours on task 4: 1 hours on task 8: 2 hours on task 11: 1 hours on task 15
Task 7 savings: 2 hours on task 2: 3 hours on task 15
Task 8 savings: 4 hours on task 5: 1 hours on task 11
Task 9 savings: 1 hours on task 2: 2 hours on task 3: 1 hours on task 11: 1 hours on task 13
Task 10 savings: 4 hours on task 11: 1 hours on task 15
Task 11 savings: 1 hours on task 1: 1 hours on task 3: 1 hours on task 8: 1 hours on task 10: 1 hours on task 12
Task 12 savings: 3 hours on task 1: 1 hours on task 4: 1 hours on task 11
Task 13 savings: 2 hours on task 3: 3 hours on task 12
Task 14 savings: 1 hours on task 3: 1 hours on task 6: 1 hours on task 7: 1 hours on task 8: 1 hours on task 12
Task 15 savings: 2 hours on task 6: 2 hours on task 12: 1 hours on task 13
*/


var task{1..15, 1..15}, binary;
s.t. task1{a in 1..15}: sum{b in 1..15} task[a, b] = 1;
s.t. task2{b in 1..15}: sum{a in 1..15} task[a, b] = 1;

var position{1..15}, >= 0;
s.t. position1{a in 1..15}: sum{b in 1..15} b * task[b,a] = position[a];

var before{1..15, 1..15}, binary;
s.t. before1{a in 1..15, b in 1..15, c in 1..15, d in 1..15: b > a and c != d}: before[c,d] + 1.5 >= task[a,c] + task[b,d];
s.t. before2{a in 1..15, b in 1..15, c in 1..15, d in 1..15: a > b and c != d}: before[c,d] + task[a,c] + task[b,d] <= 2.5;

var saving{1..15}, >= 0;
s.t. s1: saving[1] = 2 * before[1,2] + 2 * before[1,4] + before[1,9];
maximize savings: saving[1];

solve;

for {a in 1..15} {
  for {b in 1..15: task[a,b] = 1} {
    printf "%s. Task %s\n", a, b;
  }
}
for {a in 1..15, b in 1..15: before[a,b] >= 0.5 and a != b}{
  printf "task %s is done before task %s\n", a, b;
}
for {a in 1..15} {
  printf "Position of task %s: %s\n", a, position[a];
}
printf "\nTotal Savings: %s hours.", saving[1];
end;

>> I solved your enclosed model in 350sec. using SCIP (Answer was "5" ) with no special settings.

Using glpk2gams and GAMS/Gurobi this was solved in 7.5 seconds:

MODEL STATISTICS

BLOCKS OF EQUATIONS      44,147     SINGLE EQUATIONS       44,147
BLOCKS OF VARIABLES         452     SINGLE VARIABLES          452
NON ZERO ELEMENTS       132,996     DISCRETE VARIABLES        435



               S O L V E      S U M M A R Y

     MODEL   m                   OBJECTIVE  gamsobjvar
     TYPE    MIP                 DIRECTION  MAXIMIZE
     SOLVER  GUROBI              FROM LINE  132906

**** SOLVER STATUS     1 Normal Completion        
**** MODEL STATUS      1 Optimal                  
**** OBJECTIVE VALUE                5.0000

RESOURCE USAGE, LIMIT          7.548      1000.000
ITERATION COUNT, LIMIT      9497    2000000000

There are better formulations: this test was done on the original model (unchanged, except for the conversion to GAMS).

Friday, January 15, 2010

Difficult MIQP

I would like to minimize || PG-I || over P, where P is a p x p permutation matrix
(obtained by permuting the rows and/or columns of the identity matrix), G is a
given p x p matrix with full rank and I the identity matrix.
||.|| is the frobenius norm.

This can be implemented as a MIQP:

set i /i1*i25/;
alias (i,j,k);

parameter G(i,j);
G(i,j) = uniform(-1,1);

parameter Id(i,j);
Id(i,i) = 1;

binary variable p(i,j);
variable z,v(i,j);

equation
  defv(i,j)
  norm
  assign1(i)
  assign2(j)
;

* Frobenius norm: http://mathworld.wolfram.com/FrobeniusNorm.html
* we can drop the sqrt as this is a monotonic increasing function
norm..  z =e=
sum((i,j), sqr(v(i,j)));

* we can substitute out v if needed
defv(i,j).. v(i,j) =e=
sum(k, p(i,k)*g(k,j)) - Id(i,j);

* these assignment constraints make sure P is a permutation matrix
assign1(i)..
sum(j, p(i,j)) =e= 1;
assign2(j)..
sum(i, p(i,j)) =e= 1;


model m /all/;
solve m minimizing z using miqcp;

This is not very easy to solve to optimality if set i is larger than 20. The bounds just move too slowly. When running the problem for 1000 seconds with Cplex I see:

image

Although new solutions are found during the search even close to the end of the run, they only show marginal improvements.

Someone suggested to solve this as an assignment problem. I don’t believe that is actually 100% correct, but the approximation turns out to be very good. The GAMS code is:

parameter c(i,j);
c(i,j) = sqrt(
sum(k, sqr(Id(i,k)-G(j,k))));

equation cassign;
cassign.. z =e=
sum((i,j),c(i,j)*p(i,j));

model assign/assign1,assign2,cassign/;
solve assign minimizing z using rmip;

Here is small example that shows some differences:

set i /i1*i6/;
alias (i,j,k);

parameter G(i,j);
*G(i,j) = uniform(0,2);
G(i,j) = normal(0,2);

parameter Id(i,j);
Id(i,i) = 1;

binary variable p(i,j);
variable z,v(i,j);

equation
  defv(i,j)
  norm
  assign1(i)
  assign2(j)
;

* Frobenius norm: http://mathworld.wolfram.com/FrobeniusNorm.html
* we can drop the sqrt as this is a monotonic increasing function
norm..  z =e=
sum((i,j), sqr(v(i,j)));

* we can substitute out v if needed
defv(i,j).. v(i,j) =e=
sum(k, p(i,k)*g(k,j)) - Id(i,j);

* these assignment constraints make sure P is a permutation matrix
assign1(i)..
sum(j, p(i,j)) =e= 1;
assign2(j)..
sum(i, p(i,j)) =e= 1;


model m /all/;
option optcr=0;
solve m minimizing z using miqcp;

scalar frobnorm1; frobnorm1 = sqrt(z.l);
option p:0;
display p.l,frobnorm1;

parameter c(i,j);
c(i,j) = sqrt(
sum(k, sqr(Id(i,k)-G(j,k))));

equation cassign;
cassign.. z =e=
sum((i,j),c(i,j)*p(i,j));

model assign/assign1,assign2,cassign/;
solve assign minimizing z using rmip;

scalar frobnorm2,diff;
frobnorm2=sqrt(
sum((i,j),sqr(sum(k, p.l(i,k)*g(k,j)) - Id(i,j))));
diff = frobnorm2-frobnorm1;
display p.l,frobnorm2,frobnorm1,diff;

The differences are:

----     81 PARAMETER frobnorm2            =       10.584 
            PARAMETER frobnorm1            =       10.502 
            PARAMETER diff                 =        0.082 

A different suggestion seems to actually work. When we use

c(i,j) = sum(k, sqr(Id(i,k)-G(j,k)));

the results with some random G matrices are the same. So to understand this we need to show that

imageand

image

give the same solutions.

Sunday, January 10, 2010

GAMS: ifthen

An exogenous condition in an ifthen function can be removed during model generation. In this case only one of the branches need to be passed on to the solver. However this does not always happen:

set i /i1,i2/;
parameter a(i) /
  
i1 1
  
i2 NA
/;

variable y,x(i);
equation e1,e(i);

e1.. y =e= 10;
* this one works ok:
e(i).. x(i) =e= ifthen(a(i)<>
na,a(i),-1);
* this one does not:
*e(i).. x(i) =e= ifthen(a(i)<>na,a(i),sqr(y));

model m/all/;
solve m using cns;

In the first case GAMS will generate a block of simple equations just as we would expect:

---- e  =E= 

e(i1)..  x(i1) =E= 1 ; (LHS = 0, INFES = 1 ****)
e(i2)..  x(i2) =E= -1 ; (LHS = 0, INFES = 1 ****)

However in the second case the exogenous condition is actually passed on to the solver. This leads to:

---- e  =E= 

e(i1)..  (0)*y + x(i1) =E= 0 ; (LHS = -1, INFES = 1 ****)
e(i2)..  (0)*y + x(i2) =E= 0 ; (LHS = 0)

For i1 we would expect the same result as before: x(i1) =e= 1; In this case it really generates a nonlinear if-then-else construct. The condition uses NA, and that results in an error:

**** Exec Error at line 14: A constant in a nonlinear expression in equation e evaluated to NA

This caused some problems for me in some generated code where I translate Excel formulas to GAMS equations. A possible work-around is to check for myself if the condition is exogenous (i.e. does not depend on the level of a variable) and then to generate:

e(i).. x(i) =e= a(i)$(a(i)<>na) + sqr(y)$(a(i)=na);