Yet Another Math Programming Consultant: Solving an optimization problem: selecting an "optimal" subset

This problem is described in http://comments.gmane.org/gmane.comp.lang.r.general/177553. An obvious MIQP formulation in GAMS could look like:

$ontext

Given vector of numbers x, I wish to select an n-subset with sum closest to
fixed value s. Can anyone advise me how to approach this, in R?

I have considered Rcplex package, which handles integer/binary
linear/quadratic optimization problems, but have difficulty setting up the
quadratic form for [sum(x) - s]^2.

(Dynamic programming over [0, sum(x)]? A genetic algorithm? Can anyone
contribute a binary GA optimization sample?)

--------------------------------------------------------------------------

I need to pick 2-40 elements out of a 50-200-element-long vector.

$offtext

set i /i1*i200/;
scalars
k /40/
target /3.1415926535/
;

parameter p(i);
p(i) = uniform(-100,100);

binary variable x(i);
variable z,s;

equation e1,e2,obj;

e1.. s =e= sum(i, p(i)*x(i));
e2.. sum(i,x(i)) =e= k;
obj.. z =e= sqr(s-target);

model m /all/;
option optcr=0,miqcp=cplex;
solve m minimizing z using miqcp;

option x:1;
display p,x.l;

scalar value,sqdeviation,num;
value = sum(i, p(i)*x.l(i));
sqdeviation = sqr(value - target);
num = sum(i, x.l(i));
display target,value,sqdeviation,num;

The results look like:

---- 54 PARAMETER p

i1   -65.651,    i2    68.653,    i3    10.075,    i4   -39.772,    i5   -41.558,    i6   -55.189,    i7   -30.034
i8    71.254,    i9   -86.577,    i10    0.042,    i11   99.624,    i12   15.747,    i13   98.227,    i14   52.450
i15 -73.862,    i16   27.944,    i17 -68.096,    i18 -49.984,    i19   33.786,    i20 -12.929,    i21 -28.060
i22 -29.712,    i23 -73.702,    i24 -69.980,    i25   17.823,    i26   66.179,    i27 -53.837,    i28   33.147
i29   55.172,    i30 -39.268,    i31 -77.902,    i32    0.477,    i33 -67.965,    i34   74.492,    i35 -46.977
i36 -42.837,    i37   18.791,    i38   44.544,    i39   25.650,    i40   -7.240,    i41 -17.339,    i42 -76.461
i43 -37.158,    i44 -90.690,    i45 -32.290,    i46 -63.580,    i47   29.145,    i48   12.149,    i49   53.992
i50 -40.439,    i51   32.221,    i52   51.164,    i53   25.489,    i54 -43.227,    i55 -82.715,    i56 -79.497
i57   28.250,    i58    9.062,    i59 -93.695,    i60   58.472,    i61 -85.447,    i62 -64.868,    i63    5.127
i64   50.042,    i65 -64.375,    i66 -93.172,    i67   17.026,    i68   24.246,    i69 -22.128,    i70 -28.257
i71 -51.393,    i72 -50.716,    i73 -73.899,    i74   86.690,    i75 -24.012,    i76   56.680,    i77 -39.993
i78 -74.903,    i79   49.775,    i80 -86.154,    i81 -59.597,    i82 -98.987,    i83 -46.077,    i84   -0.030
i85 -69.743,    i86 -65.166,    i87 -33.872,    i88 -36.619,    i89 -35.583,    i90   92.795,    i91   98.720
i92 -26.019,    i93 -25.422,    i94   54.396,    i95 -20.663,    i96   82.619,    i97 -76.084,    i98   47.096
i99 -88.916,    i100 15.260,    i101 -89.719,    i102 -98.798,    i103 -19.754,    i104   3.976,    i105 25.775
i106 -54.850,    i107 -20.776,    i108 -44.799,    i109 -69.525,    i110 87.265,    i111 -15.468,    i112 -73.067
i113 -22.789,    i114 -25.073,    i115 -46.304,    i116 89.674,    i117 -62.212,    i118 -40.498,    i119 -85.089
i120 -19.731,    i121 -79.662,    i122 -23.222,    i123 -35.181,    i124 -61.573,    i125 -77.526,    i126 19.312
i127   2.290,    i128 -90.987,    i129 56.620,    i130 89.150,    i131 19.293,    i132 21.468,    i133 -27.498
i134 18.814,    i135 35.971,    i136   1.318,    i137 -68.149,    i138 31.378,    i139   4.776,    i140 -75.121
i141 97.344,    i142 -54.375,    i143 35.131,    i144 55.355,    i145 86.490,    i146 -59.752,    i147 -40.573
i148 -60.554,    i149 -50.731,    i150 29.295,    i151 46.995,    i152 -82.913,    i153 -69.930,    i154 -13.162
i155 -62.612,    i156 38.539,    i157 52.595,    i158 -69.039,    i159 -22.124,    i160 39.086,    i161 69.162
i162 22.544,    i163 95.194,    i164 -94.622,    i165 -62.510,    i166 -82.576,    i167   8.080,    i168 -74.627
i169 46.800,    i170 -77.354,    i171 -2.329,    i172 59.120,    i173 -1.591,    i174   6.712,    i175 -97.875
i176   8.774,    i177 -9.774,    i178 95.066,    i179 -63.231,    i180 -67.294,    i181 -95.073,    i182 -64.435
i183 -87.736,    i184 -96.671,    i185 67.131,    i186 20.332,    i187 -94.597,    i188 -60.781,    i189 90.142
i190 -32.892,    i191 18.852,    i192 -48.162,    i193 28.127,    i194 -68.950,    i195 -7.997,    i196 -21.332
i197 61.092,    i198   8.198,    i199 -21.856,    i200 11.564

---- 54 VARIABLE x.L

i1           1.0,    i8           1.0,    i11          1.0,    i13          1.0,    i31          1.0
i34          1.0,    i41          1.0,    i44          1.0,    i59          1.0,    i66          1.0
i74          1.0,    i75          1.0,    i82          1.0,    i90          1.0,    i91          1.0
i96          1.0,    i101         1.0,    i102         1.0,    i107         1.0,    i110         1.0
i111         1.0,    i116         1.0,    i121         1.0,    i125         1.0,    i128         1.0
i130         1.0,    i141         1.0,    i144         1.0,    i157 4.625855E-6,    i161         1.0
i163         1.0,    i164         1.0,    i170         1.0,    i173         1.0,    i178         1.0
i184         1.0,    i187         1.0,    i192         1.0,    i193         1.0,    i196         1.0
i197         1.0

----     60 PARAMETER target               =        3.142
            PARAMETER value                =        3.142
            PARAMETER sqdeviation          = 2.55591E-28
            PARAMETER num                  =       40.000

The model solves very quickly: 2 seconds, but that is very much dependent on the data. The results show one binary variable with a small value. The obvious repair in a MIP model would be to set the integer feasibility tolerance (option epint) to zero. This causes the solution time to increase to 11 seconds, but unfortunately does not really solve the problem for this MIQP problem:

---- 55 VARIABLE x.L

i1           1.0,    i8           1.0,    i11          1.0,    i13          1.0,    i14 -7.57698E-7
i31          1.0,    i34          1.0,    i44          1.0,    i59          1.0,    i66          1.0
i74          1.0,    i82          1.0,    i90          1.0,    i91          1.0,    i95          1.0
i96          1.0,    i101         1.0,    i102         1.0,    i103         1.0,    i110         1.0
i116         1.0,    i121         1.0,    i122         1.0,    i125         1.0,    i128         1.0
i130         1.0,    i141         1.0,    i144         1.0,    i154         1.0,    i159         1.0
i161         1.0,    i163         1.0,    i164         1.0,    i170         1.0,    i173         1.0
i178         1.0,    i184         1.0,    i187         1.0,    i192         1.0,    i193         1.0
i197         1.0

We still have a tolerance issue. The log indicates there is some issue with scaling:

Warning: integer solution contains unscaled infeasibilities.
MIQP status(115): integer optimal with unscaled infeasibilities

I tried to disable scaling by scaind = -1, but that gave the same message. It looks like we have some numerical issues here (although the data looks quite benign).

This model can actually be solved as a MIP. The quadratic objective can be replaced by a linear one by minimizing the absolute deviation. The optimal solution should be the same for both objectives (apart from the issue of existence of multiple optimal solutions). The linear model can look like:

$ontext

Linear version

$offtext

*
* data
*
set i /i1*i200/;
scalars
k /40/
target /3.1415926535/
;

parameter p(i);
p(i) = uniform(-100,100);

*
* model
*
binary variable x(i);
variable z,s,d1,d2;
positive variable d1,d2;

equation e1,e2,e3,obj;

e1.. s =e= sum(i, p(i)*x(i));
e2.. sum(i,x(i)) =e= k;
e3.. d1-d2 =e= s-target;
obj.. z =e= d1+d2;

model m /all/;
option optcr=0, mip=cplex;
solve m minimizing z using mip;

*
* do some reporting
*

option x:1;
display p,x.l;

scalar value,sqdeviation,num;
value = sum(i, p(i)*x.l(i));
sqdeviation = sqr(value - target);
num = sum(i, x.l(i));
display target,value,sqdeviation,num;

The result are now:

---- 42 VARIABLE x.L

i11 1.0,    i19 1.0,    i63 1.0,    i92 1.0,    i101 1.0,    i102 1.0,    i103 1.0,    i104 1.0,    i105 1.0
i106 1.0,    i107 1.0,    i108 1.0,    i110 1.0,    i111 1.0,    i112 1.0,    i113 1.0,    i120 1.0,    i122 1.0
i123 1.0,    i126 1.0,    i127 1.0,    i129 1.0,    i130 1.0,    i131 1.0,    i132 1.0,    i133 1.0,    i134 1.0
i135 1.0,    i138 1.0,    i140 1.0,    i141 1.0,    i143 1.0,    i144 1.0,    i145 1.0,    i146 1.0,    i147 1.0
i148 1.0,    i149 1.0,    i150 1.0,    i167 1.0

----     48 PARAMETER target               =        3.142
            PARAMETER value                =        3.142
            PARAMETER sqdeviation          = 1.09841E-26
            PARAMETER num                  =       40.000

This solves faster and seems numerically a little bit more stable.

Yet Another Math Programming Consultant

Sunday, January 31, 2010

Solving an optimization problem: selecting an "optimal" subset

No comments:

Post a Comment