In [1] a question was posed: how to
maximize the standard deviation of a vector \(\color{darkred}x_i\), subject to side constraints using an LP/MIP formulation?
The standard deviation is usually defined as: \[\mathbf{sd}(x) = \sqrt{\frac{1}{n-1}\sum_i \left(x_i-\mu\right)^2}\] where \[\mu = \frac{\sum_i x_i}{n}\] and \(n\) is the number of elements in \(x\).
The first thing to note is that we can simplify this a bit. The solution will not change, if we replace \[\min/\max\> \mathbf{sd}(x)\] by \[\min/\max \>\sum_i \left(x_i-\mu\right)^2\]
To understand a bit what a solution with a high standard deviation looks like, let's have a look at the following model: \[\begin{align}\max & \sum_i \left(\color{darkred}x_i-\color{darkred}\mu\right)^2 \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \color{darkred}x_i \ge \color{darkred}x_{i-1}\\ & \color{darkred}x_i \in [1,4]\end{align}\] The solution is:
---- 27 VARIABLE x.L
i1 1.000, i2 1.000, i3 1.000, i4 1.000, i5 4.000, i6 4.000, i7 4.000, i8 4.000
---- 27 VARIABLE z.L = 18.000 objective
PARAMETER stdev = 1.604 standard deviation
We see two clusters: one at 1.0 and one at 4.0. Obviously, if your goal is to evenly distribute values, maximizing the standard deviation is not the way to go.
Note that this is a non-convex quadratic programming problem. You need a global solver for this problem. Possible solvers are Cplex, Gurobi, and general-purpose MINLP solvers like Baron and Antigone.
Back to the original model. We assume that we want to maximize the standard deviation, subject to a linear side-constraint and to overall bounds \([\color{darkblue}\ell,\color{darkblue}u]\). Here are some possible formulations and approximations.
Model 1: Nonconvex QP
The first formulation for the problem stated in [1] is:
Nonconvex QP Model |
\[\begin{align}\max & \sum_i (\color{darkred}x_i-\color{darkred}\mu)^2 \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \sum_i \color{darkblue}a_i\cdot\color{darkred}x_i = \color{darkblue}b \\ & \color{darkred}x_i \in [\color{darkblue}\ell,\color{darkblue}u] \end{align}\] |
Note that we have a single linear side constraint in this model.
Model 2: SOS1 variables
Here we try to replace the quadratic terms with absolute values: \[\max\>\sum_i |\color{darkred}x_i-\color{darkred}\mu|\] This will allow us to use a linear MIP solver. We use a simple variable splitting approach: \[s_i^+ - s_i^- = \color{darkred}x_i-\color{darkred}\mu\] where \(s_i^+, s_i^- \ge 0\). As we are maximizing the absolute values we need to make sure not both \(s_i^+, s_i^-\) are positive, leading to an unbounded problem. One way to enforce this is to form \(n\) SOS1 sets with two members \(s_i^+, s_i^-\). In a SOS1 set, only one member can be nonzero. So we have:
SOS1 Approximation Model |
\[\begin{align}\max & \sum_i \left(\color{darkred}s_i^+ + \color{darkred}s_i^- \right) \\ & \color{darkred}s_i^+ - \color{darkred}s_i^- = \color{darkred}x_i-\color{darkred}\mu \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \sum_i \color{darkblue}a_i\cdot\color{darkred}x_i = \color{darkblue}b \\ & \color{darkred}x_i \in [\color{darkblue}\ell,\color{darkblue}u] \\ & \color{darkred}s_i^+, \color{darkred}s_i^- \ge 0 \\ & \{\color{darkred}s_i^+, \color{darkred}s_i^-\} \in {SOS1} \end{align}\] |
As we will see later, there are problems with this formulation.
Model 3: Binary variables
Instead of using SOS1 sets to choose a single branch, we can also use binary variables. This is quite similar to the previous model.
Binary Approximation Model |
\[\begin{align}\max & \sum_i \left(\color{darkred}s_i^+ + \color{darkred}s_i^- \right) \\ & \color{darkred}s_i^+ - \color{darkred}s_i^- = \color{darkred}x_i-\color{darkred}\mu \\ & \color{darkred}s_i^+ \le \color{darkred}\delta_i \cdot (\color{darkblue}u-\color{darkblue}\ell)\\ & \color{darkred}s_i^- \le (1-\color{darkred}\delta_i) \cdot (\color{darkblue}u-\color{darkblue}\ell) \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \sum_i \color{darkblue}a_i\cdot\color{darkred}x_i = \color{darkblue}b \\ & \color{darkred}x_i \in [\color{darkblue}\ell,\color{darkblue}u] \\ & \color{darkred}s_i^+, \color{darkred}s_i^- \ge 0 \\ & \color{darkred}\delta_i \in \{0,1\} \end{align}\] |
As all the variables \(\color{darkred}x_i\) live inside \([\color{darkblue}\ell,\color{darkblue}u]\), the mean will also be inside this interval. This means \(\color{darkblue}u-\color{darkblue}\ell\) is a valid upper bound on the variable \(\color{darkred}s_i^+, \color{darkred}s_i^-\).
Results
---- 134 PARAMETER results
GlobalQP SOS1 SOS1/bnd Binary
i1 3.000 3.000 3.000 3.000
i2 0.500 0.500 0.500 0.500
i3 0.500 0.500 0.500 0.500
i4 3.000 3.000 3.000 3.000
i5 3.000 3.000 3.000 3.000
i6 3.000 3.000 3.000 3.000
i7 3.000 3.000 3.000 3.000
i8 0.500 0.500 0.500 0.500
i9 3.000 3.000 3.000 3.000
i10 0.500 0.500 0.500 0.500
i11 0.500 0.500 0.500 0.500
i12 0.500 0.500 0.500 0.500
i13 0.500 0.500 0.500 0.500
i14 0.500 0.500 0.500 0.500
i15 3.000 3.000 3.000 3.000
i16 0.500 0.500 0.500 0.500
i17 3.000 3.000 3.000 3.000
i18 3.000 3.000 3.000 3.000
i19 0.500 0.500 0.500 0.500
i20 0.500 0.500 0.500 0.500
i21 2.987 2.987 2.987 2.987
i22 3.000 3.000 3.000 3.000
i23 3.000 3.000 3.000 3.000
i24 3.000 3.000 3.000 3.000
i25 0.500 0.500 0.500 0.500
i26 0.500 0.500 0.500 0.500
i27 3.000 3.000 3.000 3.000
i28 0.500 0.500 0.500 0.500
i29 0.500 0.500 0.500 0.500
i30 0.500 0.500 0.500 0.500
status Optimal Optimal Optimal Optimal
obj 46.633 37.320 37.320 37.320
mu 1.666 1.666 1.666 1.666
stdev 1.268 1.268 1.268 1.268
time 0.125 12399.812 0.750 0.719
nodes 613.000 -2.14747E+9 14163.000 14307.000
iterations 10691.000 -1.28591E+8 69873.000 84000.000
We see some interesting things.
The SOS1 model is just taking forever. Largely because many of the nodes are
unbounded. A small fragment of the log looks like:
Elapsed time = 35.09 sec. (12722.88 ticks, tree = 1.78 MB)
5380775 6966 cutoff 36.6404 10482581 0.00%
5763069 7166 unbounded 36.6404 11394958
6151718 7286 unbounded 36.6404 12102053
6533752 7353 cutoff 36.6404 12491050 0.00%
6917221 7456 cutoff 36.6404 12955237 0.00%
7294233 7603 cutoff 36.6404 14146645 0.00%
7679187 7705 cutoff 36.6404 14930905 0.00%
8062790 7714 unbounded 36.6404 15158021
8443859 7813 unbounded 36.6404 15935203
8829248 7941 cutoff 36.6404 16958187 0.00%
Note that the reported gaps of \(0\%\) are wrong. There is no valid upper bound, so there is no meaningful gap to report. A zero gap would mean: we are almost done. I can assure you this is not the case. I would classify this as a Clex bug. B.t.w., Gurobi is not doing any better on this model.
The reported iteration and node counts for the SOS1 column are negative. This is typically the symptom of an integer overflow somewhere. That is likely a GAMS bug.
We can remedy this disastrous performance by adding a bound on the objective. I used: \[\color{darkred}z \le \color{darkblue}n\cdot(\color{darkblue}u-\color{darkblue}\ell)\] This model corresponds to the column SOS1/bnd. This obviously gives a huge performance improvement. An alternative would be to impose bounds: \[\color{darkred}s_i^+, \color{darkred}s_i^- \le \color{darkblue}u-\color{darkblue}\ell\]
The linear approximations find the same optimal solution as the QP model. This is not guaranteed of course.
Conclusions
This is a small non-convex QP model that can be approximated by replacing a 2-norm penalty with a 1-norm. This exercise reveals a big problem in the SOS1 formulation: it just performs extremely poorly unless we add an upper bound to the objective. Along the way, we observed some bugs.
It looks like the most intuitive approach (a QP model) is also the best performer.
References
Appendix: GAMS model
*------------------------------------------------------------------
* data
*------------------------------------------------------------------
set i /i1*i30/;
scalars
lo / 0.5 /
up / 3 /
range
;
range = up-lo;
parameter a(i) 'coefficients of constraint';
a(i) = uniform(0,2);
*------------------------------------------------------------------
* reporting macros
*------------------------------------------------------------------
parameter
results(*,*);
acronym Optimal;
* macros for reporting
$macro stdev
sqrt(sum(i,sqr(x.l(i)-mu.l))/(card(i)-1))
$macro report(m,label) \
results(i,label) = x.l(i); \
results('status',label)= m.solvestat; \
results('status',label)$(m.solvestat=1) = Optimal;
\
results('obj',label) = z.l; \
results('mu',label) = mu.l; \
results('stdev',label) = stdev; \
results('time',label) = m.resusd; \
results('nodes',label) = m.nodusd; \
results('iterations',label) = m.iterusd; \
display results;
*------------------------------------------------------------------
* non-convex QP model
*------------------------------------------------------------------
positive variable x(i);
x.lo(i) = lo;
x.up(i) = up;
variable
z 'obj'
mu 'mean'
;
Equations
e
'side constraint'
obj
'objective'
mean
'average'
;
e.. sum(i,a(i)*x(i)) =e= card(i);
mean.. mu =e= sum(i,x(i))/card(i);
obj.. z =e= sum(i, sqr(x(i)-mu));
model m1 /all/;
option
qcp=cplex,threads=16;
m1.optfile=1;
solve m1
maximizing z using qcp;
$onecho > cplex.opt
OptimalityTarget=3
$offecho
report(m1,'GlobalQP')
*------------------------------------------------------------------
* SOS1 based model
*------------------------------------------------------------------
set pm /'+','-'/;
sos1 variable s(i,pm);
equations
split
obj2
;
split(i).. s(i,'+')-s(i,'-') =e= x(i)-mu;
obj2.. z =e= sum((i,pm),s(i,pm));
model m2 /split,obj2,e,mean/;
solve m2
maximizing z using mip;
report(m2,'SOS1')
*------------------------------------------------------------------
* SOS1 with bound on the objective variable
*------------------------------------------------------------------
z.up = range*card(i);
solve m2
maximizing z using mip;
report(m2,'SOS1/bnd')
z.up = INF;
*------------------------------------------------------------------
* model using binary variables
*------------------------------------------------------------------
positive variable s2(i,pm);
binary variable b(i);
Equations
split2a
split2b
split2c
obj3
;
split2a(i).. s2(i,'+')-s2(i,'-') =e= x(i)-mu;
split2b(i).. s2(i,'+') =l=
b(i)*range;
split2c(i).. s2(i,'-') =l=
(1-b(i))*range;
obj3.. z =e= sum((i,pm),s2(i,pm));
model m3 /split2a,split2b,split2c,obj3,e,mean/;
solve m3
maximizing z using mip;
report(m3,'Binary')
|
Very interesting
ReplyDeleteI love seeing how problems can be formulated into MILP