Given a matrix \(X\) find a factorization \(X\approx AB\) such that all elements of \(A,B\) are between 0 and 1 and the matrix product \(AB\) is as close as possible to \(X\). Can we solve this as an NLP (nonlinear programming) model?
My answer: yes, but....
The model can simply look like: \[\begin{align} \min\> & ||AB-X||^2 \\ & 0 \le A,B \le 1 \end{align}\] or \[\begin{align} \min &\sum_{i,j} \left ( \sum_k a_{i,k} b_{k,j} - x_{i,j} \right )^2\\ & 0 \le a_{i,k},b_{k,j} \le 1\end{align} \] However, this is a rather nasty non-convex problem. As I expected to find local optima, I solved this in a loop with a different random starting points for \(A\) and \(B\) (each uniform between 0 and 1). With a small data set \[\begin{align}&i=1,\dots,8\\ &j=1,\dots,12\\ & k=1,\dots,6\end{align}\] and random values for \(X\) (uniform between 0 and 6), we see:
Results with NLP solver CONOPT ---- 59 PARAMETER results obj modelstat time best iter1 129.370 localopt 0.062 129.370 iter2 129.088 localopt 0.094 129.088 iter3 126.113 localopt 0.047 126.113 iter4 127.768 localopt 0.031 iter5 128.103 localopt 0.046 iter6 128.178 localopt 0.094 iter7 126.644 localopt 0.031 iter8 129.179 localopt 0.032 iter9 128.201 localopt 0.047 iter10 130.898 localopt 0.125 iter11 125.803 localopt 0.063 125.803 iter12 127.407 localopt 0.063 iter13 127.898 localopt 0.031 iter14 127.768 localopt 0.047 iter15 129.823 localopt 0.031 iter16 125.803 localopt 0.047 iter17 129.823 localopt 0.063 iter18 128.901 localopt 0.047 iter19 126.105 localopt 0.047 iter20 128.164 localopt 0.032 Results with NLP solver IPOPT ---- 59 PARAMETER results obj modelstat time best iter1 126.113 localopt 0.344 126.113 iter2 128.129 localopt 0.266 iter3 126.113 localopt 0.313 iter4 128.103 localopt 0.218 iter5 128.901 localopt 0.266 iter6 125.803 localopt 0.344 125.803 iter7 129.330 localopt 0.375 iter8 128.816 localopt 0.360 iter9 128.103 localopt 0.406 iter10 130.276 localopt 0.390 iter11 127.255 localopt 0.328 iter12 126.113 localopt 0.281 iter13 129.002 localopt 0.297 iter14 128.149 localopt 0.235 iter15 129.823 localopt 0.266 iter16 126.545 localopt 0.328 iter17 127.094 localopt 0.204 iter18 128.164 localopt 0.344 iter19 128.201 localopt 0.313 iter20 128.723 localopt 0.281
Both solvers find a solution with an objective of 125.803. This is a small problem with \(8 \times 6 + 6 \times 12 = 120\) nonlinear variables. This looks doable for a global solver like Baron or Antigone. The problem is that we have \(8 \times 6\times 12 = 576\) bilinear terms \(a_{i,k}b_{k,j}\). This will make life really difficult for the global solvers. Let's try this, while using the 125.803 solution as an initial point and with a time limit of 1000 seconds.
Results with BARON Iteration Open nodes Time (s) Lower bound Upper bound 1 1 3.00 54.4268 125.803 * 4 3 8.00 54.4268 125.715 41 28 38.00 57.2903 125.715 76 54 69.00 58.6781 125.715 115 82 100.00 59.7597 125.715 155 111 131.00 60.2728 125.715 190 137 162.00 60.5990 125.715 229 164 192.00 60.8507 125.715 266 192 223.00 61.1276 125.715 301 216 254.00 61.3283 125.715 335 239 284.00 61.4689 125.715 373 265 315.00 61.7422 125.715 414 296 346.00 61.9213 125.715 454 321 377.00 62.1128 125.715 492 348 408.00 62.3822 125.715 528 370 438.00 62.4439 125.715 567 392 469.00 62.5986 125.715 598 412 499.00 62.6449 125.715 631 433 531.00 62.6994 125.715 664 458 561.00 62.7280 125.715 695 478 593.00 62.7932 125.715 728 501 624.00 62.8952 125.715 754 520 655.00 62.9707 125.715 782 539 685.00 63.0497 125.715 809 558 716.00 63.1053 125.715 834 577 746.00 63.1215 125.715 866 599 777.00 63.1805 125.715 894 620 808.00 63.1963 125.715 920 636 839.00 63.3036 125.715 949 655 869.00 63.3194 125.715 978 677 900.00 63.4053 125.715 1007 695 930.00 63.4625 125.715 1036 717 961.00 63.5189 125.715 1066 737 992.00 63.5608 125.715 1096 758 1023.00 63.6598 125.715 1122 778 1054.00 63.7217 125.715 1149 797 1085.00 63.8112 125.715 1160 804 1097.00 63.8213 125.715 *** Max. allowable time exceeded *** Results with Antigone ------------------------------------------------------------------------------- Time (s) Nodes explored Nodes remaining Best possible Best found Relative Gap ------------------------------------------------------------------------------- Searching for feasible solutions with 4 starting points at tree level 0 -- 25 1 1 +9.689e+01 +1.258e+02 +2.298e-01 Adding 0 total cutting planes at tree level 0 ---------------------------- Adding 739 total cutting planes at tree level 0 -------------------------- Strong Branching at tree level 1 ----------------------------------------- Strong Branching at tree level 1 ----------------------------------------- 561 1 2 +9.893e+01 +1.257e+02 +2.131e-01 Adding 0 total cutting planes at tree level 1 ---------------------------- Generating Edge-Concave Cuts; 0 generated so far ------------------------- Strong Branching at tree level 2 ----------------------------------------- Strong Branching at tree level 2 ----------------------------------------- 684 2 3 +9.893e+01 +1.257e+02 +2.131e-01 Adding 0 total cutting planes at tree level 1 ---------------------------- Strong Branching at tree level 2 ----------------------------------------- Strong Branching at tree level 2 ----------------------------------------- Strong Branching at tree level 2 ----------------------------------------- 804 3 4 +9.894e+01 +1.257e+02 +2.130e-01 Adding 0 total cutting planes at tree level 2 ---------------------------- Strong Branching at tree level 3 ----------------------------------------- Strong Branching at tree level 3 ----------------------------------------- 905 4 5 +9.894e+01 +1.257e+02 +2.130e-01 921 5 6 +9.897e+01 +1.257e+02 +2.128e-01 Adding 0 total cutting planes at tree level 3 ---------------------------- 962 6 7 +9.897e+01 +1.257e+02 +2.128e-01 970 7 8 +9.897e+01 +1.257e+02 +2.128e-01 979 8 9 +9.897e+01 +1.257e+02 +2.128e-01 986 9 10 +9.897e+01 +1.257e+02 +2.128e-01 993 10 11 +9.897e+01 +1.257e+02 +2.128e-01 Reached time limit of 1000 CPU s 1000 10 11 +9.897e+01 +1.257e+02 +2.128e-01
Both solvers have troubles closing the gap. Antigone seems to do better initially, but rest assured: it will slow down considerably and bounds are just not getting closer reasonably fast. But also: both solvers are able to improve the solution from 125.803 to 125.715. Interesting results.
We can make things less non-linear by considering absolute deviations:\[\begin{align}\min & \sum_{i,j} \left| \sum_k a_{i,k} b_{k,j} - x_{i,j}\right |\\ & 0 \le a_{i,k},b_{k,j} \le 1 \end{align}\] or \[\begin{align} \min\> & \sum_{i,j} y_{i,j} \\ & -y_{i,j} \le \sum_k a_{i,k} b_{k,j} - x_{i,j} \le y_{i,j} \\ & 0 \le a_{i,k},b_{k,j} \le 1 \\ & y_{i,j} \ge 0 \end{align}\] Note: when implementing this we should make sure not to duplicate the expression \(\sum_k a_{i,k} b_{k,j} - x_{i,j}\). That means: introduce another set of variables. An alternative least absolute deviations formulation uses variable splitting and can look like: \[\begin{align} \min\> & \sum_{i,j} \left ( y^+_{i,j}+y^-_{i,j}\right) \\ & y^+_{i,j}- y^-_{i,j} = \sum_k a_{i,k} b_{k,j} - x_{i,j}\\ & 0 \le a_{i,k},b_{k,j} \le 1 \\ & y^+_{i,j},y^-_{i,j}\ge 0\end{align} \] These least absolute deviations models do not buy us much w.r.t. performance: we are stuck with the bilinear forms \(a_{i,k} b_{k,j}\).
References
- Matrix factorization with constraints, https://stackoverflow.com/questions/51924180/matrix-factorization-with-constraints-r-matlab
No comments:
Post a Comment