## Friday, March 2, 2018

### RAS, Entropy and Exponential Cones

#### RAS Method

RAS is a well-known algorithm to perform matrix balancing:

\begin{align} \min \>& \mathit{dist}(A,A^0)\\ & \sum_j a_{i,j} = u_i &&\forall i\\&\sum_i a_{i,j} = v_j &&\forall j \\ & a_{i,j} \gt 0 \end{align}
i.e. find a matrix $$A$$ that is as close as possible to a given matrix $$A^0$$ while obeying row and column sum constraints. This method is often used to "clean-up" data sets that contain data from different sources. The goal is to achieve a consistent data set as is often required by subsequent modelling efforts.

The RAS method looks like:

RAS Algorithm
Step 1Initialization$A := A^0$
Step 2Row Scaling\begin{align}&\rho_i := \frac{u_i}{\displaystyle\sum_j a_{i,j}}\\ &a_{i,j} := \rho_i a_{i,j}\end{align}
Step 3Column Scaling\begin{align}&\sigma_j := \frac{v_j}{\displaystyle\sum_i a_{i,j}}\\ &a_{i,j} := \sigma_j a_{i,j}\end{align}
Step 4If not converged go to step 2

This algorithm works very well and is widely used.

There are quite a few extensions developed for this algorithm. An example is  GRAS (Generalized RAS) to handle positive and negative values [5]. When dealing with positive and negative numbers, concepts like sign preservation pop up. Dealing with unknown totals is another extension [7].

For many applications it is important to be able to add additional general constraints. This is not so easy for RAS. The entropy method discussed below will allow you to do that in a straightforward manner: just add the constraints.

#### Entropy Method

The following optimization model gives identical results as the RAS method:

\begin{align} \min \>& \sum_{i,j} a_{i,j} \log \left( \frac{a_{i,j}}{a^0_{i,j}} \right) \\ & \sum_j a_{i,j} = u_i &&\forall i\\&\sum_i a_{i,j} = v_j &&\forall j \\ & a_{i,j} \gt 0 \end{align}

Notes:
• The objective can be rewritten as: $$\sum_{i,j} [a_{i,j} \log(a_{i,j}) - a_{i,j} \log(a^0_{i,j}) ]$$. The second term is linear.
• Usually we set a lower bound: $$a_{i,j} \ge \varepsilon$$ for some $$\varepsilon>0$$.
• If we really want to allow $$a_{i,j}=0$$ we can replace $$\log(a_{i,j})$$ by $$\log(a_{i,j}+\varepsilon )$$.
• We can add extra constraints to this model. An example is when we deal with world trade flows: the sum of all net exports (i.e. export $$-$$ import) should be zero.
• This is a convex problem but it leads to many superbasic variables. This means NLP solvers like MINOS, SNOPT and CONOPT are struggling a bit: they like models with relative few superbasic variables. Interior point solvers like IPOPT and KNITRO solve this problem more quickly.
• Mosek versions from before 2018 can also solve this as a general convex NLP. From 2018 we need to reformulate the problem (see below) [1].

In addition we may have some zeroes in the matrix $$A^0$$ which we want to be kept zero. In other words, we want to preserve the sparsity pattern. The $$A^0$$ matrix elements are called priors in the world of Entropy estimation. In GAMS such a model can look like:

variables A(i,j),z;

equations
objective
rowsum(i)
colsum(j)
;

objective.. z =e= sum((i,j)$A0(i,j), A(i,j)*log(A(i,j)/A0(i,j))); rowsum(i).. sum(j$A0(i,j), A(i,j)) =e= A0(i,'rowTotal');
colsum(j).. sum(i$A0(i,j), A(i,j)) =e= A0('colTotal',j); A.L(i,j) = A0(i,j); A.lo(i,j)$A0(i,j) = 0.0001;

model m1 /all/;
solve m1 minimizing z using nlp;


Note that we skip here cases where $$a^0_{i,j}=0$$, so zeroes remain zero.

#### Other cost functions

Sometimes a quadratic cost function is used. To measure "relative" distances we can do:
$\min \sum_{i,j} \left(\frac{a_{i,j}}{a^0_{i,j}}-1\right)^2$ Again we would exclude cases with $$a^0_{i,j}=0$$ from this.

Yet another possibility is to minimize absolute values [6]. The resulting model becomes a linear programming model (LP). A quadratic cost function puts a higher penalty on large deviations.

#### Exponential Cone

Conic programming has become an important way to model and solve certain convex problems. Some solvers support an Exponential Cone. This is a constraint of the form:

$x \ge y \exp\left(\frac{z}{y}\right)$

with $$y > 0$$. With a little bit of effort we can show that minimizing the entropy function $$\min (x \log x)$$ can be implemented using this artifact. The first thing to do is to write  $$\min z$$ subject to $$z \ge x \log x$$. Now we can do:

\begin{align} & z \ge x \log x \Rightarrow \\ & -z \le x \log\left( \frac{1}{x} \right) \Rightarrow \\ & - \frac{z}{x} \le \log\left( \frac{1}{x} \right) \Rightarrow \\ & \exp\left( - \frac{z}{x} \right) \le \frac{1}{x} \Rightarrow \\ & x \exp\left( \frac{-z}{x} \right) \le 1 \end{align}

and we have our Exponential Cone. To verify this, we can solve the convex NLP with the cone simulated with a nonlinear constraint:

\begin{align} \min \>& \sum_{i,j} z_{i,j} - \sum_{i,j} a_{i,j} \log (a^0_{i,j}) \\ & \sum_j a_{i,j} = u_i &&\forall i\\&\sum_i a_{i,j} = v_j &&\forall j \\ & a_{i,j} \exp \left( \frac{-z_{i,j}}{a_{i,j}} \right) \le 1 && \forall i,j\\ & a_{i,j} \gt 0\end{align}

This is only to see if our derivation is correct. This formulation has not much practical use. With this model we added extra variables $$z_{i,j}$$ and extra equations. We also moved the non-linearities from the objective into the constraints. For a general purpose NLP solver this is bad news. We can see that with a small test model:

Model Solver Iterations Objective Evaluation Errors
EntropyConopt15-15.76870
Simulated Exponential ConeConopt47-15.76874
EntropyIPOPT5-15.76870
Simulated Exponential ConeIPOPT15-15.76870
EntropyMosek 88-15.76870
Simulated Exponential ConeMosek 813-15.76870

Note that iteration counts are not comparable between different solvers. Obviously and as expected this conic reformulation makes no sense when using a general NLP solver. Iteration counts go up and we seem to have some overflows in the evaluation of the nonlinear constraint function (or gradient). The expression $$\exp(z/y)$$ is inherently dangerous. Note that conic solvers handle this very differently inside.

#### Conic Modeling

Mosek will drop support for general convex NLPs [1]. It is suggested to use a conic formulation. So lets try out some other solvers that support exponential cones.

To model this for use with a convex solver, one would typically use a built-in function to express the entropy function. In CVXPY [2] we can use entr which is defined as $$-x\log(x)$$. Here is a try to solve this small problem with CVXPY:

from cvxpy import *
from numpy import sign, reshape

A0 =  [[ 230 , 375 , 375 , 100 ,   0 , 685 , 215 ,   0 ,  50 ,   0 ],
[ 330 , 405 , 419 , 175 ,  90 , 504 , 515 ,   0 , 240 , 105 ],
[ 268 , 225 , 242 ,   0 ,  30 , 790 , 301 ,  44 , 100 ,   0 ],
[ 595 , 380 , 638 , 275 ,  30 , 685 , 605 ,  88 , 100 , 160 ],
[ 340 , 360 , 440 , 200 ,  30 , 755 , 475 ,  44 , 150 ,   0 ],
[ 132 , 190 , 200 ,   0 ,   0 , 432 , 130 ,   0 ,   0 ,   0 ],
[ 309 , 330 , 350 , 125 ,   0 , 612 , 474 ,   0 ,  50 ,  50 ],
[ 365 , 400 , 330 , 150 ,  50 , 575 , 600 ,  44 , 150 , 110 ],
[ 210 , 250 , 308 , 125 ,   0 , 720 , 256 ,   0 , 100 ,  50 ]]

u = [2029,2798,1998,3566,2794,1071,2305,2747,2015]
v = [2772,2910,3300,1150,240,5760,3526,220,950,495]

m = len(u)
n = len(v)

# we need to exclude cases with A0[i,j]=0 in obj and constraints
indic = [[sign(A0[i][j]) for j in range(n)] for i in range(m)]

a = Variable(n,m)

lb = a >= mul_elemwise(0.0001,indic)
ub = a <= mul_elemwise(10000.0,indic)
colsum = sum_entries(a, axis=0) == reshape(u,(1,m))
rowsum = sum_entries(a, axis=1) == v
cons = [lb,ub,colsum,rowsum]
obj = Minimize(sum_entries(mul_elemwise(indic,-entr(a)-mul_elemwise(log(A0),a))))
prob = Problem(obj, cons)
#prob.solve(solver=SCS, verbose=True)
prob.solve(solver=ECOS, verbose=True)


This may not be the best way to implement things: I was struggling a bit with handling the zeroes in the matrix $$A^0$$. Instead of fixing $$a_{i,j}=0$$, I probably should not even generate those variables. In the GAMS model above this was easy. If the solver has good presolve capabilities this does not matter as much (although some data sets have a large fraction of zeros, so not even generating them is better). Any idea how to model this properly in CVXPY let me know. (May be, as mentioned in the comments, the function kl_div does the job safely and we can keep all variables a in the model. Q: will the presolver take them out?).

The results are mixed. The output of ECOS is:

ECOS 2.0.4 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It     pcost       dcost      gap   pres   dres    k/t    mu     step   sigma     IR    |   BT
0  +0.000e+00  -7.401e+05  +5e+02  1e+00  7e-01  1e+00  1e+00    ---    ---    0  0  - |  -  -
1  +1.202e+03  -7.382e+05  +5e+01  1e+00  7e-01  8e+00  1e-01  0.8875  1e-02   1  1  1 |  0  0
2  +5.098e+03  -7.301e+05  +1e+01  1e+00  8e-01  4e+01  3e-02  0.8190  4e-02   1  1  1 |  0  0
3  +8.759e+03  -7.219e+05  +6e+00  1e+00  8e-01  7e+01  1e-02  0.5013  1e-01   1  1  1 |  1  3
4  +2.579e+04  -6.784e+05  +2e+00  1e+00  8e-01  2e+02  4e-03  0.7151  3e-02   1  1  1 |  0  1
5  +4.993e+04  -6.066e+05  +8e-01  9e-01  7e-01  4e+02  2e-03  0.5789  4e-02   1  1  1 |  1  2
6  +1.094e+05  -3.718e+05  +2e-01  7e-01  5e-01  9e+02  5e-04  0.7295  2e-03   1  1  1 |  0  1
7  +9.292e+04  -2.105e+05  +1e-01  4e-01  4e-01  7e+02  2e-04  0.5738  5e-02   1  1  1 |  1  2
8  +4.286e+04  -6.659e+04  +3e-02  1e-01  1e-01  3e+02  8e-05  0.7020  1e-02   1  1  1 |  0  1
9  +1.456e+04  -1.859e+04  +1e-02  4e-02  4e-02  8e+01  2e-05  0.7187  2e-02   1  1  1 |  0  1
10  +4.757e+03  -5.169e+03  +3e-03  1e-02  1e-02  2e+01  7e-06  0.7136  1e-02   1  0  0 |  0  1
11  +1.895e+03  -2.133e+03  +1e-03  5e-03  5e-03  9e+00  3e-06  0.6266  5e-02   2  0  0 |  1  2
12  +3.885e+02  -5.152e+02  +3e-04  1e-03  1e-03  2e+00  6e-07  0.7736  3e-03   2  0  0 |  0  1
13  +1.566e+02  -2.156e+02  +1e-04  5e-04  4e-04  9e-01  3e-07  0.6266  5e-02   2  0  0 |  2  2
14  +2.161e+01  -6.152e+01  +3e-05  1e-04  1e-04  2e-01  6e-08  0.7833  1e-02   1  0  0 |  1  1
15  -3.765e-02  -3.416e+01  +1e-05  5e-05  4e-05  8e-02  2e-08  0.6266  5e-02   1  0  0 |  2  2
16  -1.240e+01  -2.001e+01  +2e-06  1e-05  9e-06  2e-02  5e-09  0.7833  1e-02   1  0  0 |  1  1
17  -1.437e+01  -1.747e+01  +9e-07  4e-06  4e-06  8e-03  2e-09  0.6266  5e-02   0  0  0 |  2  2
18  -1.546e+01  -1.616e+01  +2e-07  9e-07  8e-07  2e-03  5e-10  0.7833  9e-03   1  0  0 |  1  1
19  -1.564e+01  -1.593e+01  +8e-08  4e-07  3e-07  7e-04  2e-10  0.6266  5e-02   1  0  0 |  2  2
20  -1.574e+01  -1.580e+01  +2e-08  8e-08  7e-08  2e-04  4e-11  0.7833  9e-03   1  0  0 |  1  1
21  -1.576e+01  -1.578e+01  +8e-09  3e-08  3e-08  7e-05  2e-11  0.6266  5e-02   0  0  0 |  2  2
22  -1.577e+01  -1.577e+01  +2e-09  8e-09  7e-09  1e-05  4e-12  0.7833  9e-03   1  0  0 |  1  1

OPTIMAL (within feastol=7.7e-09, reltol=1.1e-10, abstol=1.7e-09).
Runtime: 0.012709 seconds.

Obj = -15.76630664400414


We get the correct objective although with more iterations than IPOPT needs to solve the model with entropy objective directly as an NLP. With the SCS solver, I see more problems:

----------------------------------------------------------------------------
SCS v1.2.6 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012-2016
----------------------------------------------------------------------------
Lin-sys: sparse-indirect, nnz in A = 540, CG tol ~ 1/iter^(2.00)
eps = 1.00e-03, alpha = 1.50, max_iters = 2500, normalize = 1, scale = 1.00
Variables n = 180, constraints m = 469
Cones: primal zero / dual free vars: 19
linear vars: 180
exp vars: 270, dual exp vars: 0
Setup time: 2.41e-03s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
0| 1.#Je+00  1.#Je+00 -1.#Je+00 -1.#Je+00 -1.#Je+00  1.#Je+00  6.79e-03
100| 5.58e-03  3.02e-03  1.44e-01 -6.58e+04 -4.92e+04  2.61e-10  1.07e-01
200| 3.31e-03  2.28e-03  1.82e-01 -5.56e+04 -3.85e+04  3.70e-10  2.36e-01
300| 2.52e-03  1.97e-03  2.11e-01 -4.69e+04 -3.06e+04  2.61e-10  3.53e-01
400| 1.92e-03  1.92e-03  2.43e-01 -3.94e+04 -2.40e+04  3.61e-10  4.70e-01
500| 1.55e-03  1.80e-03  2.76e-01 -3.27e+04 -1.86e+04  2.44e-10  5.84e-01
600| 1.25e-03  1.64e-03  3.14e-01 -2.68e+04 -1.40e+04  3.22e-10  6.98e-01
700| 1.10e-03  1.44e-03  3.54e-01 -2.16e+04 -1.03e+04  0.00e+00  8.07e-01
800| 1.14e-03  1.23e-03  3.84e-01 -1.71e+04 -7.63e+03  2.55e-10  9.14e-01
900| 8.94e-04  1.07e-03  4.16e-01 -1.36e+04 -5.59e+03  3.05e-10  1.02e+00
1000| 6.46e-04  8.99e-04  4.34e-01 -1.07e+04 -4.22e+03  3.54e-10  1.12e+00
1100| 5.35e-04  7.22e-04  4.47e-01 -8.39e+03 -3.20e+03  3.99e-10  1.21e+00
1200| 5.01e-04  5.90e-04  4.39e-01 -6.62e+03 -2.58e+03  2.19e-10  1.32e+00
1300| 3.90e-04  4.84e-04  4.21e-01 -5.29e+03 -2.16e+03  2.36e-10  1.43e+00
1400| 2.60e-04  4.10e-04  3.83e-01 -4.31e+03 -1.92e+03  2.49e-10  1.54e+00
1500| 3.38e-04  3.31e-04  3.38e-01 -3.52e+03 -1.74e+03  2.60e-10  1.64e+00
1600| 1.96e-04  3.09e-04  2.90e-01 -3.00e+03 -1.65e+03  2.69e-10  1.74e+00
1700| 2.13e-04  2.43e-04  2.32e-01 -2.61e+03 -1.62e+03  2.75e-10  1.83e+00
1800| 2.05e-04  2.24e-04  1.81e-01 -2.28e+03 -1.59e+03  2.80e-10  1.92e+00
1900| 1.49e-04  2.19e-04  1.42e-01 -2.07e+03 -1.56e+03  2.83e-10  2.01e+00
2000| 1.44e-04  2.12e-04  1.13e-01 -1.92e+03 -1.53e+03  2.86e-10  2.12e+00
2100| 1.41e-04  2.09e-04  8.66e-02 -1.79e+03 -1.51e+03  2.88e-10  2.22e+00
2200| 1.32e-04  2.06e-04  6.30e-02 -1.71e+03 -1.50e+03  2.89e-10  2.31e+00
2300| 1.35e-04  2.04e-04  4.97e-02 -1.64e+03 -1.49e+03  2.90e-10  2.40e+00
2400| 1.30e-04  2.05e-04  3.88e-02 -1.59e+03 -1.47e+03  2.91e-10  2.49e+00
2500| 1.26e-04  2.03e-04  3.02e-02 -1.56e+03 -1.47e+03  2.92e-10  2.57e+00
----------------------------------------------------------------------------
Status: Solved/Inaccurate
Hit max_iters, solution may be inaccurate
Timing: Solve time: 2.57e+00s
Lin-sys: avg # CG iterations: 3.88, avg solve time: 5.10e-05s
Cones: avg projection time: 9.44e-04s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 1.0027e-03, dist(y, K*) = 0.0000e+00, s'y/|s||y| = -5.5744e-11
|Ax + s - b|_2 / (1 + |b|_2) = 1.2581e-04
|A'y + c|_2 / (1 + |c|_2) = 2.0276e-04
|c'x + b'y| / (1 + |c'x| + |b'y|) = 3.0169e-02
----------------------------------------------------------------------------
c'x = -1560.6383, -b'y = -1469.2022
============================================================================

Obj = -1560.638313679461


This does not look very good.

Update:

1. After increasing the iteration limit for SCS I observed an evaluation error. After changing the lower bounds to $$a_{i,j}\ge 0.0001$$ I get a solution with objective -1337.97.
2. SCS 2.0.2 does not converge for me.
3. A slightly different problem was solved correctly using SCS 1.2.6 using Matlab/cvx (see the comments by Mark Stone).
4. Dropping the bounds (i.e. removing the preservation of zeroes) helps a lot (see the comments by Erling Andersen).

We are working with very large data sets on similar (but more complicated) models (size indication: number of cells to estimate varies between 1e4 and 1e6). These models are currently using Mosek using their general convex NLP functionality. I am curious how this will work out with these new exponential cones. There are two issues here: (a) Mosek using their new Exponential Cones, and (b) how will the GAMS interface look like (hopefully not something special for Mosek only: that would mean changing the model depending on the solver). As a backup we always can resort to using NLP solvers like IPOPT and Knitro: they accept clean GAMS notation and in general, solve this problem quite easily.