Tuesday, November 8, 2022

sum of K largest

Consider a vector of variables \(\color{darkred}x_i\). We want to impose the constraint: \[\text{sum of $K$ largest $\color{darkred}x_i$}  \le 0.5\sum_i \color{darkred}x_i\] I.e. the sum of the largest \(K\) values are less than half of the total. Sometimes this is expressed mathematically as \[\sum_{i=1}^K \color{darkred}x_{[i]} \le 0.5\sum_i \color{darkred}x_i\] where \(\color{darkred}x_{[i]}\) is an ordered version of \(\color{darkred}x_i\) with \(\color{darkred}x_{[1]}\ge\color{darkred}x_{[2]} \ge \color{darkred}x_{[3]} \dots \). There are several formulations for this constraint, all of them interesting, I think.

The first observation is that, if there are additional constraints, \(\color{darkred}x_i=0\) is feasible. If we exclude this solution, e.g. by a constraint like \[\sum_i \color{darkred}x_i=1\] we need the vector \(\color{darkred}x_i\) to be of a certain length. Obviously, we need at least \(K\) elements to form the \(K\) largest ones. But for those \(K\) largest not to dominate, we need at least \(2K+1\) elements. Only then the \(K\) largest can be less than half of the total.

In my examples, I will use this constraint in a small portfolio problem: \[\begin{align} \min\>&\color{darkred}x'\color{darkblue}Q\color{darkred}x\\ & \sum_i \color{darkred}x_i=1\\&\sum_i \color{darkblue}\mu_i \color{darkred}x_i\ge \color{darkblue}R\\ & \color{darkred}x_i\ge 0\end{align}\] As we know \(\sum_i \color{darkred}x_i=1\), the right-hand side of the sum of \(K\) largest constraint will be constant: \[\text{sum of $K$ largest $\color{darkred}x_i$}  \le 0.5\] however, all the methods also work when the right-hand side contains a variable.

As a baseline, we have a random data set with \(N=12\) assets. The min-risk solution is:

----     57 ***baseline***
            VARIABLE z.L                   =        0.672  objective

----     57 VARIABLE x.L  fractions

i1  0.301,    i2  0.138,    i6  0.006,    i7  0.081,    i8  0.222,    i9  0.077,    i10 0.066,    i12 0.109

The sum of the \(K=3\) largest allocations is 0.301+0.222+0.138=0.661. We are violating our special constraint.

All possible combinations

The first formulation is based on the observation: bound the \(K\) largest values by 0.5 is the same as bounding all possible \(K\) combinations of indices \(i\) by 0.5. For example, if we denote the length of \(\color{darkred}x\) by \(N=12\), and use \(K=3\), then we can form the constraints: \[\color{darkred}x_i+\color{darkred}x_j+\color{darkred}x_k \le 0.5\] for all 3-combinations \((i,j,k)\). There are \[{N \choose K} = {12 \choose 3} = 220\] of them:

----     68 SET combinations3  all possible 3-combinations

i1 .i2 .i3 ,    i1 .i2 .i4 ,    i1 .i2 .i5 ,    i1 .i2 .i6 ,    i1 .i2 .i7 ,    i1 .i2 .i8 
i1 .i2 .i9 ,    i1 .i2 .i10,    i1 .i2 .i11,    i1 .i2 .i12,    i1 .i3 .i4 ,    i1 .i3 .i5 
i1 .i3 .i6 ,    i1 .i3 .i7 ,    i1 .i3 .i8 ,    i1 .i3 .i9 ,    i1 .i3 .i10,    i1 .i3 .i11
i1 .i3 .i12,    i1 .i4 .i5 ,    i1 .i4 .i6 ,    i1 .i4 .i7 ,    i1 .i4 .i8 ,    i1 .i4 .i9 
i1 .i4 .i10,    i1 .i4 .i11,    i1 .i4 .i12,    i1 .i5 .i6 ,    i1 .i5 .i7 ,    i1 .i5 .i8 
i1 .i5 .i9 ,    i1 .i5 .i10,    i1 .i5 .i11,    i1 .i5 .i12,    i1 .i6 .i7 ,    i1 .i6 .i8 
i1 .i6 .i9 ,    i1 .i6 .i10,    i1 .i6 .i11,    i1 .i6 .i12,    i1 .i7 .i8 ,    i1 .i7 .i9 
i1 .i7 .i10,    i1 .i7 .i11,    i1 .i7 .i12,    i1 .i8 .i9 ,    i1 .i8 .i10,    i1 .i8 .i11
i1 .i8 .i12,    i1 .i9 .i10,    i1 .i9 .i11,    i1 .i9 .i12,    i1 .i10.i11,    i1 .i10.i12
i1 .i11.i12,    i2 .i3 .i4 ,    i2 .i3 .i5 ,    i2 .i3 .i6 ,    i2 .i3 .i7 ,    i2 .i3 .i8 
i2 .i3 .i9 ,    i2 .i3 .i10,    i2 .i3 .i11,    i2 .i3 .i12,    i2 .i4 .i5 ,    i2 .i4 .i6 
i2 .i4 .i7 ,    i2 .i4 .i8 ,    i2 .i4 .i9 ,    i2 .i4 .i10,    i2 .i4 .i11,    i2 .i4 .i12
i2 .i5 .i6 ,    i2 .i5 .i7 ,    i2 .i5 .i8 ,    i2 .i5 .i9 ,    i2 .i5 .i10,    i2 .i5 .i11
i2 .i5 .i12,    i2 .i6 .i7 ,    i2 .i6 .i8 ,    i2 .i6 .i9 ,    i2 .i6 .i10,    i2 .i6 .i11
i2 .i6 .i12,    i2 .i7 .i8 ,    i2 .i7 .i9 ,    i2 .i7 .i10,    i2 .i7 .i11,    i2 .i7 .i12
i2 .i8 .i9 ,    i2 .i8 .i10,    i2 .i8 .i11,    i2 .i8 .i12,    i2 .i9 .i10,    i2 .i9 .i11
i2 .i9 .i12,    i2 .i10.i11,    i2 .i10.i12,    i2 .i11.i12,    i3 .i4 .i5 ,    i3 .i4 .i6 
i3 .i4 .i7 ,    i3 .i4 .i8 ,    i3 .i4 .i9 ,    i3 .i4 .i10,    i3 .i4 .i11,    i3 .i4 .i12
i3 .i5 .i6 ,    i3 .i5 .i7 ,    i3 .i5 .i8 ,    i3 .i5 .i9 ,    i3 .i5 .i10,    i3 .i5 .i11
i3 .i5 .i12,    i3 .i6 .i7 ,    i3 .i6 .i8 ,    i3 .i6 .i9 ,    i3 .i6 .i10,    i3 .i6 .i11
i3 .i6 .i12,    i3 .i7 .i8 ,    i3 .i7 .i9 ,    i3 .i7 .i10,    i3 .i7 .i11,    i3 .i7 .i12
i3 .i8 .i9 ,    i3 .i8 .i10,    i3 .i8 .i11,    i3 .i8 .i12,    i3 .i9 .i10,    i3 .i9 .i11
i3 .i9 .i12,    i3 .i10.i11,    i3 .i10.i12,    i3 .i11.i12,    i4 .i5 .i6 ,    i4 .i5 .i7 
i4 .i5 .i8 ,    i4 .i5 .i9 ,    i4 .i5 .i10,    i4 .i5 .i11,    i4 .i5 .i12,    i4 .i6 .i7 
i4 .i6 .i8 ,    i4 .i6 .i9 ,    i4 .i6 .i10,    i4 .i6 .i11,    i4 .i6 .i12,    i4 .i7 .i8 
i4 .i7 .i9 ,    i4 .i7 .i10,    i4 .i7 .i11,    i4 .i7 .i12,    i4 .i8 .i9 ,    i4 .i8 .i10
i4 .i8 .i11,    i4 .i8 .i12,    i4 .i9 .i10,    i4 .i9 .i11,    i4 .i9 .i12,    i4 .i10.i11
i4 .i10.i12,    i4 .i11.i12,    i5 .i6 .i7 ,    i5 .i6 .i8 ,    i5 .i6 .i9 ,    i5 .i6 .i10
i5 .i6 .i11,    i5 .i6 .i12,    i5 .i7 .i8 ,    i5 .i7 .i9 ,    i5 .i7 .i10,    i5 .i7 .i11
i5 .i7 .i12,    i5 .i8 .i9 ,    i5 .i8 .i10,    i5 .i8 .i11,    i5 .i8 .i12,    i5 .i9 .i10
i5 .i9 .i11,    i5 .i9 .i12,    i5 .i10.i11,    i5 .i10.i12,    i5 .i11.i12,    i6 .i7 .i8 
i6 .i7 .i9 ,    i6 .i7 .i10,    i6 .i7 .i11,    i6 .i7 .i12,    i6 .i8 .i9 ,    i6 .i8 .i10
i6 .i8 .i11,    i6 .i8 .i12,    i6 .i9 .i10,    i6 .i9 .i11,    i6 .i9 .i12,    i6 .i10.i11
i6 .i10.i12,    i6 .i11.i12,    i7 .i8 .i9 ,    i7 .i8 .i10,    i7 .i8 .i11,    i7 .i8 .i12
i7 .i9 .i10,    i7 .i9 .i11,    i7 .i9 .i12,    i7 .i10.i11,    i7 .i10.i12,    i7 .i11.i12
i8 .i9 .i10,    i8 .i9 .i11,    i8 .i9 .i12,    i8 .i10.i11,    i8 .i10.i12,    i8 .i11.i12
i9 .i10.i11,    i9 .i10.i12,    i9 .i11.i12,    i10.i11.i12

A disadvantage of writing something like: \(\color{darkred}x_i+\color{darkred}x_j+\color{darkred}x_k \le 0.5\) is that we hard-coded \(K=3\). If we want to make it more adaptive, we need to form a 2d set \(\color{darkblue}{\mathit{comb}}_{n,i}\), with \(n\) the combination id, and \(i\) the indices to include in this combination. This will look like:

----    113 SET comb  combinations

              i1          i2          i3          i4          i5          i6          i7          i8          i9

n1           YES         YES         YES
n2           YES         YES                     YES
n3           YES         YES                                 YES
n4           YES         YES                                             YES
n5           YES         YES                                                         YES
n6           YES         YES                                                                     YES
n7           YES         YES                                                                                 YES
n8           YES         YES
n9           YES         YES
n10          YES         YES
n11          YES                     YES         YES
n12          YES                     YES                     YES
n13          YES                     YES                                 YES
n14          YES                     YES                                             YES
n15          YES                     YES                                                         YES
n16          YES                     YES                                                                     YES
n17          YES                     YES
n18          YES                     YES
n19          YES                     YES
n20          YES                                 YES         YES
. . .

Note that this is a partial view. We can easily generate these combinations with a bit of Python code. itertools.combinations is our friend here. With this set we can write: \[\sum_{i|\color{darkblue}{\mathit{comb}}_{n,i}} \color{darkred}x_{i} \le 0.5 \>\>\forall n\] The structure does not change if we use a different value for \(K\).

When we run one of these models, we see:

----     75 ***all combinations***
            VARIABLE z.L                   =        0.760  objective

----     75 VARIABLE x.L  fractions

i1  0.160,    i2  0.139,    i3  0.001,    i7  0.101,    i8  0.201,    i9  0.121,    i10 0.139,    i12 0.139

The objective (risk) deteriorated from 0.672 to 0.760. The sum of the \(K=3\) largest allocations is binding at 0.201+0.160+0.139=0.5.

Obviously, the number of constraints can become rather large. E.g. for \(N=100, K=3\), we have\[{N \choose K} = {100 \choose 3} = 161,700\]

An integer formulation

A bound for the largest \(\color{darkred}x_i\), can be easily found using: \[\color{darkred}y_1 \ge \color{darkred}x_i\>\forall i\] For the second largest, we allow one exception in the inequalities: \[\begin{align}& \color{darkred}y_2 \ge \color{darkred}x_i - M\cdot\color{darkred}\delta_{2,i} & \forall i \\ & \sum_i  \color{darkred}\delta_{2,i} = 1 \\ & \color{darkred}\delta_{2,i} \in \{0,1\} \end{align}\] The exception can take place at the largest, so we have a bound on the second largest. This may require a bit of thinking before this is clear. The third largest is similar:\[\begin{align}& \color{darkred}y_3 \ge \color{darkred}x_i - M\cdot\color{darkred}\delta_{3,i} & \forall i \\ & \sum_i  \color{darkred}\delta_{3,i} =2 \\ & \color{darkred}\delta_{3,i} \in \{0,1\} \end{align}\] Here, we allow two exceptions on \(\color{darkred}y_3\ge  \color{darkred}x_i\). As our values \(\color{darkred}x_i\) are between 0 and 1, we can simply use \(\color{darkblue}M=1\). 

When we run this model, we see:

----    154 ***MIQP model***
            VARIABLE z.L                   =        0.760  objective

----    154 VARIABLE x.L  fractions

i1  0.160,    i2  0.139,    i3  0.001,    i7  0.101,    i8  0.201,    i9  0.121,    i10 0.139,    i12 0.139

Compact formulation

A compact linear formulation is from [1]: \[\begin{align}&\color{darkblue}K\cdot\color{darkred}t +\sum_i \color{darkred}u_i \le 0.5 \\ & \color{darkred}t +\color{darkred}u_i \ge \color{darkred}x_i \\ & \color{darkred}u_i \ge 0 \\ & \color{darkred}t\> \text{free}\end{align}\] This also delivers the same solution.

----    177 ***compact formulation***
            VARIABLE z.L                   =        0.760  objective

----    177 VARIABLE x.L  fractions

i1  0.160,    i2  0.139,    i3  0.001,    i7  0.101,    i8  0.201,    i9  0.121,    i10 0.139,    i12 0.139

This is not very intuitive (the formulation is based on a dual model). But it is the most efficient (linear, and just a few extra variables and constraints). Some modeling tools have this built-in as a function (yalmip has sumk, and cvxpy has sum_largest).


  • Here we used "sum of k largest" in a \(\le\) constraint. It is also applicable when it appears in an objective, as long as we are minimizing (when maximizing, things become non-convex).
  • If you want to minimize (or constrain) just the k-th largest, use the MIP formulation. I have seen such a construct used in some financial models because of regulatory frameworks.


  1. Stephen Boyd, Lieven Vandenberghe, Convex Optimization, 2004,  Exercise 5.19.

Appendix: GAMS code


Implement the constraint

sum of K largest x(i) <= 50% of sum(x)

We do this as part of a simple random portfolio problem.


* Size of the problem

Set i 'number of x(i)' /i1*i12/;

$set numk 3
* macro is used to dimension sets further down
scalar numk 'number of largest values to consider' /%numk%/;

* Random data
* using example from https://www.cvxpy.org/examples/index.html

'expected returns'
'variance/covariance matrix'

mu(i) = normal(0,1);
sigma(i,j) = normal(0,1);
sigma(i,j) =
sum(k, sigma(k,i)*sigma(k,j));

display mu,sigma;

* standard portfolio problem

positive variable x(i) 'fractions';
variable z 'objective';

'quadratic objective'
'minimum expected revenue'
'sum to 1'

obj..    z =e=
sum((i,j), x(i)*x(j)*sigma(i,j));
sum(i,x(i)) =e= 1;
sum(i, mu(i)*x(i)) =g= 0.1;

model m1 /all/;
solve m1 minimizing z using qcp;
display "***baseline***",z.l,x.l;

* formulation 1: all 3-combinations
* this generates 120 constraints

set combinations3(i,j,k) 'all possible 3-combinations';
combinations3(i,j,k) =
ord(j)>ord(i) and ord(k)>ord(j);
option combinations3:0:0:6;
display combinations3;

equation ksum(i,j,k) 'sum over possible combinations of three terms';

ksum(combinations3(i,j,k))..  x(i)+x(j)+x(k) =l= 0.5;

model m2 /all/;
solve m2 minimizing z using qcp;
display "***all combinations***",z.l,x.l;

* formulation 1a: all 3-combinations
* same thing but a bit more general

'key for combinations'

$onEmbeddedCode Python:

import itertools

# list of labels from set i
seti = list(gams.get(

# integer K
k = int(list(gams.get(

# list of combinations
c = list(itertools.combinations(seti,k))

comb = []
for i in range(len(c)):
   nn = f
   thiscomb = c[i]
for j in range(k):


$offEmbeddedCode comb

display n,comb;

equation ksum2(n);

sum(comb(n,i), x(i)) =l= 0.5;

model m2a /m1+ksum2/;
solve m2a minimizing z using qcp;
display "***all combinations v2***",z.l,x.l;

* formulation 2
* binary variables are used to say "y[k] >= x[i] with k-1 exceptions"

set kbest 'best k-th' /k1*k%numk%/ ;

variable y(kbest) 'bound on value of k-th best';
binary variable delta(kbest,i) 'exceptions';

scalar'big-M' /1/;

'bound on y(kbest)'
'number of exceptions in yk >= x(i)'
'limit sum k largest'

bound(kbest,i).. y(kbest) =g= x(i) - M*delta(kbest,i);
sum(i, delta(kbest,i)) =e= ord(kbest)-1;
sum(kbest,y(kbest)) =l= 0.5;

model m3 /m1,bound,exceptions,ysum/;
option miqcp=cplex;
solve m3 minimizing z using miqcp;

display "***MIQP model***",z.l,x.l,y.l,delta.l;

* formulation 3
* compact formulation

variable t;
positive variable u(i);


compact1.. numK*t +
sum(i,u(i)) =l= 0.5;
compact2(i).. t+u(i) =g= x(i);

model m4 /m1,compact1,compact2/;
solve m4 minimizing z using qcp;

display "***compact formulation***",z.l,x.l,t.l,u.l;


1 comment:

  1. A somewhat intuitive interpretation of the compact formulation is that t represents the Kth largest value, which gets counted K times, and u_i = max(x_i - t, 0) represents the amount by which x_i exceeds t, to account for the rest of the sum of the K largest values.