Sunday, April 11, 2010

L1 portfolio formulation

In the paper “Mean-Absolute Deviation Portfolio Optimization Model and Its Applications to Tokyo Stock Market,” Hiroshi Konno and Hiroaki Yamazaki, Management Science, Vol. 37, No. 5 (May, 1991), pp. 519-531 ( the following Linear Programming model is presented:


Here x(j) is the allocation (the variable we are interested in), and y(t) are auxiliary variables. The data are:

  • a(j,t) are mean adjusted returns
  • M0 is the budget
  • rho is the required return
  • r(j) are the stock returns

This model is in fact almost identical to what is presented in this blog before: Actually the formulation in the paper by Konno e.a. is slightly non-optimal: the big data block sum(j, a(j,t)*x(j)) is present twice in the model. For large data sets this can be felt in larger solution times. Hence a slightly better formulation could be:

imageOn a synthetic data set with n=2000 stocks and T=300 observations we have:

Model Konno ea Improved
variables 2300 2600
equations  602 302
nonzero elements 1,204,900 605,200
simplex iterations 2031 1224
solution time 8.9 2.97

GAMS/Gurobi did not really like my improved formulation:

Starting Gurobi...
Optimize a model with 302 Rows, 2600 Columns and 604600 NonZeros

Presolve time: 0.34s
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    0.0000000e+00   1.320000e+00   0.000000e+00      0s
Out of memory
*** Out of memory
*** Could not solve problem (10001)

This is probably due to this 100% dense block. I’ll pass it on for investigation. Gurobi is under active development so this may be already working ok with later versions. To humble me it solves the original formulation from Konno/Yamazaki just fine!

The above timings were for Cplex/dual. As the original authors used MINOS, here are some timings with MINOS:

  • Konno e.a.: its:17032, time:28.7
  • Improved: its:11884, time:7.8

    The timings with Cplex/dual and Minos/primal point in the same direction. Note that MINOS is faster with the improved model than Cplex on the Konno model. So proper modeling can sometimes help more than choosing a faster solver (of course both having a better formulation and a faster solver is even better). The importance of proper modeling is usually more important for MIP models, but here we see some real differences in performance for different formulations in an LP.


    1. I did a talk at Informs in Washington 2008 about L1 and Linf norms. Actually it is a tricky issue, where you have to think carefully about how you formulate the model in relation to the simplex algorithm. You can specialize the primal simplex algorithm to do something similar as boundswapping in the dual (which is one of the main reasons dual simplex is on average fastest simplex algo) i.e you explore duplicated columns to take longer stepsizes, the speed up is dramatic.
      You should be able to see the slides under my slideshare section at linked in :

      if not let me know and I will email it to you.

    2. Looks like this is about Mosek primal and dual simplex. I believe this particular problem is dominated by how you handle this dense data. Mosek's default interior point solver does very good on this problem, the simplex solvers not so much.

    3. This looks like an issue that we fixed in Gurobi 2.0.2 a few months back. What version are you using?

    4. This was with Gurobi 2.0.1 (the latest available from GAMS). I'll ask the GAMS people when they will make 2.0.2 available.

    5. I don't think Gurobi 2.0.2 was ever made available to GAMS users. They will have to wait for version 3.0. This a little bit unfortunate as there were a few somewhat important issues fixed related to the above model. See: 2.0.2. notes.

    6. Your second formulation is definitely better for interior-point methods as yourself suggest. A useful heuristic is to think about the number of nonzeros in the constraint matrix which is about half in the second formulation.

      Let me know if you put the GAMS/MPS file anywhere.

    7. Gurobi 3 solves both models fine (as expected: model 2 solves faster than model 1).