Saturday, August 25, 2018

Best Factorization X=AB

Restated from an originally posting in [1]:

Given a matrix \(X\) find a factorization \(X\approx AB\) such that all elements of \(A,B\) are between 0 and 1 and the matrix product \(AB\) is as close as possible to \(X\). Can we solve this as an NLP (nonlinear programming) model?

My answer: yes, but....

The model can simply look like: \[\begin{align} \min\> & ||AB-X||^2 \\ & 0 \le A,B \le 1 \end{align}\] or \[\begin{align} \min &\sum_{i,j} \left ( \sum_k   a_{i,k} b_{k,j} - x_{i,j}   \right )^2\\ & 0 \le a_{i,k},b_{k,j} \le 1\end{align} \] However, this is a rather nasty non-convex problem. As I expected to find local optima, I solved this in a loop with a different random starting points for \(A\) and \(B\) (each uniform between 0 and 1). With a small data set \[\begin{align}&i=1,\dots,8\\ &j=1,\dots,12\\ & k=1,\dots,6\end{align}\] and random values for \(X\) (uniform between 0 and 6), we see:



Results with NLP solver CONOPT


----     59 PARAMETER results  

               obj   modelstat        time        best

iter1      129.370    localopt       0.062     129.370
iter2      129.088    localopt       0.094     129.088
iter3      126.113    localopt       0.047     126.113
iter4      127.768    localopt       0.031
iter5      128.103    localopt       0.046
iter6      128.178    localopt       0.094
iter7      126.644    localopt       0.031
iter8      129.179    localopt       0.032
iter9      128.201    localopt       0.047
iter10     130.898    localopt       0.125
iter11     125.803    localopt       0.063     125.803
iter12     127.407    localopt       0.063
iter13     127.898    localopt       0.031
iter14     127.768    localopt       0.047
iter15     129.823    localopt       0.031
iter16     125.803    localopt       0.047
iter17     129.823    localopt       0.063
iter18     128.901    localopt       0.047
iter19     126.105    localopt       0.047
iter20     128.164    localopt       0.032


Results with NLP solver IPOPT

----     59 PARAMETER results  

               obj   modelstat        time        best

iter1      126.113    localopt       0.344     126.113
iter2      128.129    localopt       0.266
iter3      126.113    localopt       0.313
iter4      128.103    localopt       0.218
iter5      128.901    localopt       0.266
iter6      125.803    localopt       0.344     125.803
iter7      129.330    localopt       0.375
iter8      128.816    localopt       0.360
iter9      128.103    localopt       0.406
iter10     130.276    localopt       0.390
iter11     127.255    localopt       0.328
iter12     126.113    localopt       0.281
iter13     129.002    localopt       0.297
iter14     128.149    localopt       0.235
iter15     129.823    localopt       0.266
iter16     126.545    localopt       0.328
iter17     127.094    localopt       0.204
iter18     128.164    localopt       0.344
iter19     128.201    localopt       0.313
iter20     128.723    localopt       0.281


Both solvers find a solution with an objective of 125.803. This is a small problem with \(8 \times 6 + 6 \times 12 = 120\) nonlinear variables. This looks doable for a global solver like Baron or Antigone. The problem is that we have \(8 \times 6\times 12 = 576\) bilinear terms \(a_{i,k}b_{k,j}\). This will make life really difficult for the global solvers. Let's try this, while using the 125.803 solution as an initial point and with a time limit of 1000 seconds.


Results with BARON


  Iteration    Open nodes         Time (s)    Lower bound      Upper bound
          1             1             3.00     54.4268          125.803    
*         4             3             8.00     54.4268          125.715    
         41            28            38.00     57.2903          125.715    
         76            54            69.00     58.6781          125.715    
        115            82           100.00     59.7597          125.715    
        155           111           131.00     60.2728          125.715    
        190           137           162.00     60.5990          125.715    
        229           164           192.00     60.8507          125.715    
        266           192           223.00     61.1276          125.715    
        301           216           254.00     61.3283          125.715    
        335           239           284.00     61.4689          125.715    
        373           265           315.00     61.7422          125.715    
        414           296           346.00     61.9213          125.715    
        454           321           377.00     62.1128          125.715    
        492           348           408.00     62.3822          125.715    
        528           370           438.00     62.4439          125.715    
        567           392           469.00     62.5986          125.715    
        598           412           499.00     62.6449          125.715    
        631           433           531.00     62.6994          125.715    
        664           458           561.00     62.7280          125.715    
        695           478           593.00     62.7932          125.715    
        728           501           624.00     62.8952          125.715    
        754           520           655.00     62.9707          125.715    
        782           539           685.00     63.0497          125.715    
        809           558           716.00     63.1053          125.715    
        834           577           746.00     63.1215          125.715    
        866           599           777.00     63.1805          125.715    
        894           620           808.00     63.1963          125.715    
        920           636           839.00     63.3036          125.715    
        949           655           869.00     63.3194          125.715    
        978           677           900.00     63.4053          125.715    
       1007           695           930.00     63.4625          125.715    
       1036           717           961.00     63.5189          125.715    
       1066           737           992.00     63.5608          125.715    
       1096           758          1023.00     63.6598          125.715    
       1122           778          1054.00     63.7217          125.715    
       1149           797          1085.00     63.8112          125.715    
       1160           804          1097.00     63.8213          125.715    

                    *** Max. allowable time exceeded ***      


Results with Antigone

-------------------------------------------------------------------------------
Time (s) Nodes explored Nodes remaining Best possible   Best found Relative Gap
-------------------------------------------------------------------------------
 
     Searching for feasible solutions with 4 starting points at tree level 0 --
      25              1               1    +9.689e+01   +1.258e+02   +2.298e-01
     Adding 0 total cutting planes at tree level 0 ----------------------------
     Adding 739 total cutting planes at tree level 0 --------------------------
     Strong Branching at tree level 1 -----------------------------------------
     Strong Branching at tree level 1 -----------------------------------------
     561              1               2    +9.893e+01   +1.257e+02   +2.131e-01
     Adding 0 total cutting planes at tree level 1 ----------------------------
     Generating Edge-Concave Cuts; 0 generated so far -------------------------
     Strong Branching at tree level 2 -----------------------------------------
     Strong Branching at tree level 2 -----------------------------------------
     684              2               3    +9.893e+01   +1.257e+02   +2.131e-01
     Adding 0 total cutting planes at tree level 1 ----------------------------
     Strong Branching at tree level 2 -----------------------------------------
     Strong Branching at tree level 2 -----------------------------------------
     Strong Branching at tree level 2 -----------------------------------------
     804              3               4    +9.894e+01   +1.257e+02   +2.130e-01
     Adding 0 total cutting planes at tree level 2 ----------------------------
     Strong Branching at tree level 3 -----------------------------------------
     Strong Branching at tree level 3 -----------------------------------------
     905              4               5    +9.894e+01   +1.257e+02   +2.130e-01
     921              5               6    +9.897e+01   +1.257e+02   +2.128e-01
     Adding 0 total cutting planes at tree level 3 ----------------------------
     962              6               7    +9.897e+01   +1.257e+02   +2.128e-01
     970              7               8    +9.897e+01   +1.257e+02   +2.128e-01
     979              8               9    +9.897e+01   +1.257e+02   +2.128e-01
     986              9              10    +9.897e+01   +1.257e+02   +2.128e-01
     993             10              11    +9.897e+01   +1.257e+02   +2.128e-01

                                               Reached time limit of 1000 CPU s
    1000             10              11    +9.897e+01   +1.257e+02   +2.128e-01

Both solvers have troubles closing the gap. Antigone seems to do better initially, but rest assured: it will slow down considerably and bounds are just not getting closer reasonably fast. But also: both solvers are able to improve the solution from 125.803 to 125.715. Interesting results.

We can make things less non-linear by considering absolute deviations:\[\begin{align}\min & \sum_{i,j} \left| \sum_k a_{i,k} b_{k,j} - x_{i,j}\right |\\ & 0 \le a_{i,k},b_{k,j} \le 1  \end{align}\] or \[\begin{align} \min\> & \sum_{i,j} y_{i,j} \\ & -y_{i,j} \le \sum_k a_{i,k} b_{k,j} - x_{i,j} \le y_{i,j} \\ & 0 \le a_{i,k},b_{k,j} \le 1 \\ & y_{i,j} \ge 0 \end{align}\] Note: when implementing this we should make sure not to duplicate the expression \(\sum_k a_{i,k} b_{k,j} - x_{i,j}\). That means: introduce another set of variables.  An alternative least absolute deviations formulation uses variable splitting and can look like: \[\begin{align} \min\> & \sum_{i,j} \left ( y^+_{i,j}+y^-_{i,j}\right)  \\ & y^+_{i,j}- y^-_{i,j} = \sum_k a_{i,k} b_{k,j} - x_{i,j}\\ & 0 \le a_{i,k},b_{k,j} \le 1 \\ & y^+_{i,j},y^-_{i,j}\ge 0\end{align} \] These least absolute deviations models do not buy us much w.r.t. performance: we are stuck with the bilinear forms \(a_{i,k} b_{k,j}\).

References


No comments:

Post a Comment