Wednesday, August 5, 2020

Concatenating numbers is not easy in a MIP model

From [1]: 

Here is the problem: 
  • There are 4 integer variables, each chosen from the limited set {1, 10, 20, 40, 100, 200}.
  • When all 4 are concatenated (say 1 || 1 || 10 || 20 = 111020) the result must be divisible by 13. 
I tried Python's pyomo library but cannot find a solution, as there is no modulus operation to decide whether the number is divisible by 13.

There is no objective in this problem, so let's invent a simple one: find the smallest concatenated number that is divisible by 13. 

Discussion


There are two issues here:
  • Requiring that a number is divisible by 13. This is easy. Just use an integer variable \(\mathit{mul13}\in \{0,1,2,3,\dots\}\) and impose the constraint \[ x = 13 \cdot \mathit{mul13}\] This ensures that \(x\) is a multiple of 13 (a small Pyomo sketch follows this list).
  • Concatenating numbers as if they were strings. This is not so easy. MIP solvers do not know anything about strings, so we have to find a work-around. 
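For instance, in Pyomo (the library the original poster tried), the divisibility trick could look like the following sketch. The variable names and the upper bound are hypothetical:

import pyomo.environ as pyo

m = pyo.ConcreteModel()
m.x = pyo.Var(domain=pyo.NonNegativeIntegers, bounds=(0, 10**12))  # the concatenated number
m.mul13 = pyo.Var(domain=pyo.NonNegativeIntegers)                  # auxiliary multiplier
m.div13 = pyo.Constraint(expr=m.x == 13 * m.mul13)                 # forces x to be a multiple of 13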

Let's work from the back. 10 || 20 can be interpreted as: \[20 + 10^{\mathit{numDigits}} \cdot 10\] where \(\mathit{numDigits}\) is the number of digits of 20 (i.e., \(\mathit{numDigits}=2\)). So, the 10 is actually worth 1000. With this idea, we can set up the recursion. Let's call each substring a "piece", indexed by \(p\). So, we have: \[\begin{align} & \mathit{cumDigits}_p = \begin{cases} \mathit{cumDigits}_{p+1} + \mathit{numDigits}_p & p \in \{1,2,3,4\} \\ 0 & p \gt 4 \end{cases}\\ & \mathit{cumValue}_p = \begin{cases} \mathit{cumValue}_{p+1} + 10^{\mathit{cumDigits}_{p+1}} \cdot \mathit{pieceValue}_p  &p \in \{1,2,3,4\} \\ 0 &p \gt 4 \end{cases} \end{align}\]  where \[\begin{align} & \mathit{numDigits}_p  &&\text{number of digits in $p$-th piece} \\  & \mathit{cumDigits}_p && \text{cumulative number of digits in pieces $p,p+1,\dots$}\\ &\mathit{pieceValue}_p && \text{value of piece $p$}\\ & \mathit{cumValue}_p && \text{cumulative value of pieces $p,p+1,\dots$} \end{align}\] 
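Before building an optimization model, we can sanity-check this recursion with a few lines of plain Python:

from math import floor, log10

def concat_value(pieces):
    # walk over the pieces from the last one back to the first
    cum_digits, cum_value = 0, 0
    for v in reversed(pieces):
        cum_value += 10 ** cum_digits * v
        cum_digits += floor(1 + log10(v))
    return cum_value

print(concat_value([10, 20]))        # 1020
print(concat_value([1, 1, 10, 20]))  # 111020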

The values of \( \mathit{numDigits}_p\) and \(\mathit{pieceValue}_p \) depend on what value we have chosen for piece \(p\). 

Obviously, this is quite convoluted. Furthermore, the recursion for the cumulative value calculation is non-linear. Let's see if we can put this together in an MINLP model.

Data


The data is organized as follows: 

----     14 SET p  pieces

piece1,    piece2,    piece3,    piece4


----     14 SET s  selection from values

select1,    select2,    select3,    select4,    select5,    select6


----     14 PARAMETER value  selectable values

select1   1,    select2  10,    select3  20,    select4  40,    select5 100,    select6 200


----     14 PARAMETER digit  number of digits

select1 1,    select2 2,    select3 2,    select4 2,    select5 3,    select6 3


The number of digits for each selectable value is calculated as: \[\mathit{digit}_s = \Bigl \lfloor{ 1 + \log_{10} \mathit{value}_s }\Bigr \rfloor \] where the funny brackets indicate the floor function. 
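The same formula in Python reproduces this table:

from math import floor, log10

values = [1, 10, 20, 40, 100, 200]
print([floor(1 + log10(v)) for v in values])  # [1, 2, 2, 2, 3, 3]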
  

Mathematical Model


A complete model can look like:

MINLP Model
\[ \begin{align} \min\>& \color{darkred} z \\ & \sum_s \color{darkred}{\mathit{select}}_{p,s} = 1 && \forall p && \text{select exactly one value for each piece $p$}&& \text{(linear)}\\ & \color{darkred}{\mathit{pieceValue}}_p = \sum_s \color{darkblue}{\mathit{value}}_s \cdot \color{darkred}{\mathit{select}}_{p,s} && \forall p && \text{selected value} && \text{(linear)}\\ & \color{darkred}{\mathit{numDigits}}_p = \sum_s \color{darkblue}{\mathit{digit}}_s \cdot \color{darkred}{\mathit{select}}_{p,s} && \forall p && \text{associated number of digits}&& \text{(linear)}\\ & \color{darkred}{\mathit{cumDigits}}_p = \color{darkred}{\mathit{cumDigits}}_{p+1} + \color{darkred}{\mathit{numDigits}}_p && \forall p && \text{cumulative number of digits}&& \text{(linear)}\\ & \color{darkred}{\mathit{cumValue}}_p = \color{darkred}{\mathit{cumValue}}_{p+1} + 10^{\color{darkred}{\mathit{cumDigits}}_{p+1}} \cdot \color{darkred}{\mathit{pieceValue}}_p && \forall p &&\text{cumulative value} && \text{(non-linear)} \\ & \color{darkred} z = \color{darkred}{\mathit{cumValue}}_{\color{darkgreen}{\mathit{piece1}}} && && \text{objective variable} && \text{(linear)}\\ & \color{darkred} z = 13 \cdot \color{darkred}{\mathit{mul13}} && && \text{must be multiple of 13} && \text{(linear)}\\ & \color{darkred}{\mathit{select}}_{p,s} \in \{0,1\} \\ & \color{darkred}{\mathit{pieceValue}}_p \in \{0,1,2,\dots\} \\ & \color{darkred}{\mathit{numDigits}}_p \in \{0,1,2,\dots\} \\ & \color{darkred}{\mathit{cumDigits}}_p \in \{0,1,2,\dots\} \\ & \color{darkred}{\mathit{cumValue}}_p \in \{0,1,2,\dots\} \\ & \color{darkred}{\mathit{mul13}} \in \{0,1,2,\dots\} \end{align}\]


In the above, we assume that \({\mathit{cumValue}}_{p+1}\) and \({\mathit{cumDigits}}_{p+1}\) are zero when the index \(p+1\) goes beyond the last element. This is automatic in GAMS, so the GAMS implementation can follow the above model quite closely.
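For readers who, like the original poster, work in Python, here is a minimal Pyomo sketch of the same MINLP model. This is only a sketch under a few assumptions: a global MINLP solver that accepts \(10^y\) with a variable exponent (such as Couenne or BARON) is available, and the out-of-range references \(p+1\) are handled explicitly, since Pyomo does not zero them automatically the way GAMS does:

import pyomo.environ as pyo
from math import floor, log10

values = [1, 10, 20, 40, 100, 200]
digits = [floor(1 + log10(v)) for v in values]
P = list(range(4))            # pieces (0 = leftmost piece)
S = list(range(len(values)))  # selectable values
UB = 200200200200             # largest possible concatenation: 200 || 200 || 200 || 200

m = pyo.ConcreteModel()
m.select = pyo.Var(P, S, domain=pyo.Binary)
m.pieceValue = pyo.Var(P, domain=pyo.NonNegativeIntegers, bounds=(1, 200))
m.numDigits = pyo.Var(P, domain=pyo.NonNegativeIntegers, bounds=(1, 3))
m.cumDigits = pyo.Var(P, domain=pyo.NonNegativeIntegers, bounds=(0, 12))
m.cumValue = pyo.Var(P, domain=pyo.NonNegativeIntegers, bounds=(0, UB))
m.mul13 = pyo.Var(domain=pyo.NonNegativeIntegers, bounds=(0, UB // 13))

def nxt(var, p):  # var[p+1], or 0 when p+1 is beyond the last piece
    return var[p + 1] if p + 1 in P else 0

m.oneValue = pyo.Constraint(P, rule=lambda m, p: sum(m.select[p, s] for s in S) == 1)
m.defValue = pyo.Constraint(P, rule=lambda m, p: m.pieceValue[p] == sum(values[s] * m.select[p, s] for s in S))
m.defDigits = pyo.Constraint(P, rule=lambda m, p: m.numDigits[p] == sum(digits[s] * m.select[p, s] for s in S))
m.defCumDig = pyo.Constraint(P, rule=lambda m, p: m.cumDigits[p] == nxt(m.cumDigits, p) + m.numDigits[p])
m.defCumVal = pyo.Constraint(P, rule=lambda m, p: m.cumValue[p] == nxt(m.cumValue, p) + 10 ** nxt(m.cumDigits, p) * m.pieceValue[p])
m.div13 = pyo.Constraint(expr=m.cumValue[0] == 13 * m.mul13)
m.obj = pyo.Objective(expr=m.cumValue[0], sense=pyo.minimize)
# pyo.SolverFactory("couenne").solve(m, tee=True)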

Results 


Lo and behold, Baron solves the model very quickly. The only things I added are reasonable upper bounds on the integer variables and the setting OPTCR=0, so the solver searches for a proven globally optimal solution. The results look like:


----     61 VARIABLE select.L  selection of value

           select1     select2

piece1           1
piece2                       1
piece3           1
piece4           1


----     61 VARIABLE pieceValue.L  value of single piece

piece1  1,    piece2 10,    piece3  1,    piece4  1


----     61 VARIABLE numDigits.L  number of digits of single piece

piece1 1,    piece2 2,    piece3 1,    piece4 1


----     61 VARIABLE cumValue.L  cumulative value

piece1 11011,    piece2  1011,    piece3    11,    piece4     1


----     61 VARIABLE cumDigits.L  cumulative number of digits

piece1 5,    piece2 4,    piece3 2,    piece4 1


----     61 VARIABLE Mul13.L               =          847  multiple of 13
            VARIABLE z.L                   =        11011  objective
 

Complete enumeration


To verify the solution we can use a complete enumeration approach. The total number of cases is \(6^4=1296\). This is not too large. Below is a table that illustrates the enumeration. Only the first 50 records are shown. The column div13 has a one where the concatenated number is a multiple of 13. The column smallest marks the best row.


----     66 PARAMETER data  complete enumeration

           piece1      piece2      piece3      piece4      concat       div13    smallest

k1              1           1           1           1        1111
k2              1           1           1          10       11110
k3              1           1           1          20       11120
k4              1           1           1          40       11140
k5              1           1           1         100      111100
k6              1           1           1         200      111200
k7              1           1          10           1       11101
k8              1           1          10          10      111010
k9              1           1          10          20      111020           1
k10             1           1          10          40      111040
k11             1           1          10         100     1110100
k12             1           1          10         200     1110200           1
k13             1           1          20           1       11201
k14             1           1          20          10      112010
k15             1           1          20          20      112020
k16             1           1          20          40      112040
k17             1           1          20         100     1120100
k18             1           1          20         200     1120200
k19             1           1          40           1       11401           1
k20             1           1          40          10      114010           1
k21             1           1          40          20      114020
k22             1           1          40          40      114040
k23             1           1          40         100     1140100           1
k24             1           1          40         200     1140200
k25             1           1         100           1      111001
k26             1           1         100          10     1110010
k27             1           1         100          20     1110020
k28             1           1         100          40     1110040
k29             1           1         100         100    11100100
k30             1           1         100         200    11100200
k31             1           1         200           1      112001
k32             1           1         200          10     1120010
k33             1           1         200          20     1120020
k34             1           1         200          40     1120040
k35             1           1         200         100    11200100
k36             1           1         200         200    11200200
k37             1          10           1           1       11011           1           1
k38             1          10           1          10      110110           1
k39             1          10           1          20      110120
k40             1          10           1          40      110140
k41             1          10           1         100     1101100           1
k42             1          10           1         200     1101200
k43             1          10          10           1      110101
k44             1          10          10          10     1101010
k45             1          10          10          20     1101020
k46             1          10          10          40     1101040
k47             1          10          10         100    11010100
k48             1          10          10         200    11010200
k49             1          10          20           1      110201           1
k50             1          10          20          10     1102010           1
 

This confirms the solution we found using Baron.
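The same enumeration takes only a few lines of Python:

from itertools import product

values = [1, 10, 20, 40, 100, 200]
concats = (int("".join(str(v) for v in combo))
           for combo in product(values, repeat=4))
print(min(c for c in concats if c % 13 == 0))  # 11011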


Conclusion


The "string concatenation" problem in the original problem statement is not very easy to model in an MINLP model. We made a heroic effort, and it seems to work for this example. Baron was able to find the optimal solution. However, due to the large numbers involved, I doubt this approach can be used for larger problems. MINLP and MIP algorithms are not very suited for large integer variables. 
 

References


  1. Could this problem be solved using mixed integer non-linear programming or any other optimization technique?, https://stackoverflow.com/questions/63136355/could-this-problem-be-solved-using-mixed-integer-non-linear-programming-or-any-o

Thursday, July 30, 2020

Seemingly simple but tricky NLP

People, mostly students, send me a lot of models to debug. Usually, I put them aside: a day has only 24 hours, and besides, students should discuss their problems with their teacher or supervisor instead of with me. But here is one that is a bit more interesting. I reduced the model to its essence:



NLP Model
\[\begin{align}\min &\sum_{i,j} \color{darkblue} c_{i,j} \sqrt{\color{darkred}x_{i,j}}\\ & \sum_j \color{darkred}x_{i,j} = 1 && \forall i \\& \sum_i \color{darkred}x_{i,j} = 1 && \forall j \\ & \color{darkred}x_{i,j} \in [0,1] \end{align} \]

There are quite a few complications associated with this model:
  • \(f(x)=\sqrt{x}\) is not defined for \(x\lt 0\),
  • the derivative \(f'(x)=\frac{1}{2\sqrt{x}}\) can only be evaluated for \(x\gt 0\), and it becomes very large for small \(x\) (illustrated numerically after this list),
  • if any \(c_{i,j} \gt 0\), the objective function is non-convex (each term \(c_{i,j}\sqrt{x_{i,j}}\) with \(c_{i,j}\gt 0\) is concave).
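A quick numeric illustration of the second point:

from math import sqrt

for x in [1e-2, 1e-6, 1e-10]:
    print(x, 1 / (2 * sqrt(x)))  # 5.0, then 500.0, then 50000.0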

Let's see what kind of problems and results we can encounter in practice. 

Data


The random data set I used looks like:

----     19 PARAMETER c  cost coefficients

             j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1        1.717       8.433       5.504       3.011       2.922       2.241       3.498       8.563       0.671       5.002
i2        9.981       5.787       9.911       7.623       1.307       6.397       1.595       2.501       6.689       4.354
i3        3.597       3.514       1.315       1.501       5.891       8.309       2.308       6.657       7.759       3.037
i4        1.105       5.024       1.602       8.725       2.651       2.858       5.940       7.227       6.282       4.638
i5        4.133       1.177       3.142       0.466       3.386       1.821       6.457       5.607       7.700       2.978
i6        6.611       7.558       6.274       2.839       0.864       1.025       6.413       5.453       0.315       7.924
i7        0.728       1.757       5.256       7.502       1.781       0.341       5.851       6.212       3.894       3.587
i8        2.430       2.464       1.305       9.334       3.799       7.834       3.000       1.255       7.489       0.692
i9        2.020       0.051       2.696       4.999       1.513       1.742       3.306       3.169       3.221       9.640
i10       9.936       3.699       3.729       7.720       3.967       9.131       1.196       7.355       0.554       5.763


Global optimal solution


We can derive the optimal solution of the original NLP model directly. Comparing \(f(x)=\sqrt{x}\) against \(g(x)=x\) on the interval \([0,1]\), we have \(\sqrt{x}\ge x\), with equality only at \(x=0\) and \(x=1\), so there is no good reason for a solution to be strictly between 0 and 1. More formally: we are minimizing a concave function over the assignment polytope, so an optimal solution is attained at a vertex, and at an integer-valued vertex \(\sqrt{x_{i,j}}=x_{i,j}\).


So actually we can solve the model as a linear assignment problem:


LP Model
\[\begin{align}\min &\sum_{i,j} \color{darkblue} c_{i,j} \color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} = 1 && \forall i \\& \sum_i \color{darkred}x_{i,j} = 1 && \forall j \\ & \color{darkred}x_{i,j} \in [0,1] \end{align} \]
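As an aside, this LP form can also be solved directly with specialized assignment codes. A small sketch with SciPy, using a random stand-in for the cost table above:

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(123)
c = rng.uniform(0, 10, size=(10, 10))  # stand-in for the cost data above
rows, cols = linear_sum_assignment(c)  # optimal assignment
print(c[rows, cols].sum())             # optimal objective value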

When we solve this problem with our data set we see:

----     32 **** LP solution ****
            VARIABLE z.L                   =        9.202  objective

----     32 VARIABLE x.L  

             j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1                                                                                                        1.000
i2                                                                                            1.000
i3                                1.000
i4        1.000
i5                                            1.000
i6                                                        1.000
i7                                                                    1.000
i8                                                                                                                    1.000
i9                    1.000
i10                                                                               1.000


As is to be expected when solving a pure linear assignment problem, the solution is automatically integer-valued (the constraint matrix is totally unimodular). 

A conclusion is that, by adding the square roots, we converted a very simple LP model into a very difficult NLP problem with all kinds of complications.

Reformulation


Many NLP algorithms employ tolerances such that \(x\) values may be slightly outside their bounds. Also, values \(x_{i,j}=0\) may lead to problems when trying to form gradients. Indeed, when we feed the model as is to IPOPT we see:


               S O L V E      S U M M A R Y

     MODEL   m                   OBJECTIVE  z
     TYPE    NLP                 DIRECTION  MINIMIZE
     SOLVER  IPOPT               FROM LINE  28

**** SOLVER STATUS     4 Terminated By Solver      
**** MODEL STATUS      7 Feasible Solution         
**** OBJECTIVE VALUE              127.9104

 RESOURCE USAGE, LIMIT          1.046      1000.000
 ITERATION COUNT, LIMIT       162    2000000000
 EVALUATION ERRORS             95             0

COIN-OR Ipopt    30.3.0 rc5da09e Released Mar 06, 2020 WEI x86 64bit/MS Window

**** ERRORS/WARNINGS IN EQUATION obj
     50 error(s): sqrt: FUNC DOMAIN: x < 0
     1 warning(s): sqrt: GRAD SINGULAR: x tiny

In addition, the solver log is full of really scary messages:

Warning: Cutting back alpha due to evaluation error
WARNING: Problem in step computation; switching to emergency mode.
Restoration phase is called at point that is almost feasible,
  with constraint violation 4.626299e-11. Abort.

As IPOPT is an interior point solver, one may think that it only evaluates points strictly inside \[ 0 \lt x_{i,j} \lt 1\] In that case, we should not see the above domain errors. However, IPOPT widens the bounds first, so the feasible region formed by the bounds becomes something like \[ -\delta \lt x_{i,j} \lt 1+\delta\] (In IPOPT this widening is controlled by the bound_relax_factor option; setting it to zero disables the bound relaxation.)

I actually expected IPOPT to terminate earlier, as the default GAMS limit for domain errors is 0. Also, I don't understand the bookkeeping: the summary reports 95 evaluation errors, but the listing shows just 50 evaluation errors in the sqrt function. This is probably a bug in the GAMS/IPOPT link. 

One approach is to use a small non-zero lower bound on all variables: \[x_{i,j} \in [\varepsilon,1]\] The main disadvantage is that we exclude solutions with \(x_{i,j}=0\). A different way to handle this is to change the square root function a bit: \(\sqrt{\varepsilon+x_{i,j}}\). In the experiments below I use a slightly improved version \[\sqrt{\varepsilon+x_{i,j}}-\sqrt{\varepsilon}\] with \(\varepsilon=0.001\) (the \(-\sqrt{\varepsilon}\) term makes each term zero at \(x_{i,j}=0\)).  
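A minimal Pyomo sketch of this perturbed model, assuming a random 10x10 cost matrix as a stand-in for the data above and IPOPT as the solver:

import math
import numpy as np
import pyomo.environ as pyo

rng = np.random.default_rng(42)
c = rng.uniform(0, 10, size=(10, 10))  # stand-in cost data
eps = 0.001

m = pyo.ConcreteModel()
m.I = pyo.RangeSet(0, 9)
m.J = pyo.RangeSet(0, 9)
m.x = pyo.Var(m.I, m.J, bounds=(0, 1))
m.rows = pyo.Constraint(m.I, rule=lambda m, i: sum(m.x[i, j] for j in m.J) == 1)
m.cols = pyo.Constraint(m.J, rule=lambda m, j: sum(m.x[i, j] for i in m.I) == 1)
m.obj = pyo.Objective(expr=sum(
    c[i, j] * (pyo.sqrt(eps + m.x[i, j]) - math.sqrt(eps))
    for i in m.I for j in m.J))
# pyo.SolverFactory("ipopt").solve(m, tee=True)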

When we solve our reformulated NLP model with a global solver we see:

----     37 **** Global NLP solution (Couenne, Antigone) ****
            VARIABLE z.L                   =        8.915  objective

----     37 VARIABLE x.L  

             j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1                                                                                                        1.000
i2                                                                                            1.000
i3                                1.000
i4        1.000
i5                                            1.000
i6                                                        1.000
i7                                                                    1.000
i8                                                                                                                    1.000
i9                    1.000
i10                                                                               1.000


Our objective is a little bit off (remember: we perturbed the objective a bit), but the solution is the same as we expected from the linear model.

Strangely, the global solver Baron produces a slightly worse solution:


----     37 **** Global NLP solution (Baron) ****
            VARIABLE z.L                   =       13.205  objective

----     37 VARIABLE x.L  

             j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1                                            1.000
i2                                                        1.000
i3                                1.000
i4        1.000
i5                    1.000
i6                                                                                                        1.000
i7                                                                    1.000
i8                                                                                                                    1.000
i9                                                                                            1.000
i10                                                                               1.000

It is always good to try different solvers!

Local solvers


As the square root functions make the problem non-convex, we can expect local solutions. The local NLP solver MINOS produces an interesting solution:


----     37 **** Local NLP solution (Minos) ****
            VARIABLE z.L                   =       44.343  objective

----     37 VARIABLE x.L  

             j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1                                                                                                                    1.000
i2                                                                                                        1.000
i3                                                                                            1.000
i4                                                                                1.000
i5                                                                    1.000
i6                                                        1.000
i7                                            1.000
i8                                1.000
i9                    1.000
i10       1.000


The solver IPOPT still has problems with this model:


----     37 **** Local NLP solution (Ipopt) ****
            VARIABLE z.L                   =       52.233  objective

----     37 VARIABLE x.L  

             j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1        0.170                               0.208       0.086       0.147       0.120                   0.243       0.025
i2                    0.003                               0.228                   0.266       0.323                   0.179
i3        0.084       0.106       0.208       0.262                               0.176                               0.164
i4        0.238       0.059       0.242                   0.154       0.171                                           0.135
i5                    0.190       0.119       0.281       0.028       0.151                   0.082                   0.149
i6                                            0.222       0.184       0.197                   0.137       0.259
i7        0.206       0.180                               0.142       0.214                   0.010       0.115       0.132
i8        0.108       0.120       0.176                                           0.112       0.267                   0.216
i9        0.108       0.197       0.100       0.026       0.106       0.119       0.069       0.181       0.094
i10       0.086       0.144       0.155                   0.071                   0.257                   0.289

EXIT: Restoration Failed!
Final point is feasible: scaled constraint violation (3.33067e-16) is below tol (1e-08) and unscaled constraint violation (3.33067e-16) is below constr_viol_tol (0.0001).

I could fix this by making the tolerance \(\varepsilon\) larger, from 0.001 to 0.01:


----     37 **** Local NLP solution (Ipopt, e=0.01) ****
            VARIABLE z.L                   =        8.327  objective

----     37 VARIABLE x.L  

             j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1                                                                                                        1.000
i2                                                                                            1.000
i3                                1.000
i4        1.000
i5                                            1.000
i6                                                        1.000
i7                                                                    1.000
i8                                                                                                                    1.000
i9                    1.000
i10                                                                               1.000


Some solvers are very careful not to evaluate functions outside the variable bounds. With \(\varepsilon=0\), Conopt solves the model without issues (although it finds a local optimum), but it mentions:

    C O N O P T 3   version 3.17K
    Copyright (C)   ARKI Consulting and Development A/S
                    Bagsvaerdvej 246 A
                    DK-2880 Bagsvaerd, Denmark
 
 
    The model has 101 variables and 21 constraints
    with 301 Jacobian elements, 100 of which are nonlinear.
    The Hessian of the Lagrangian has 100 elements on the diagonal,
    0 elements below the diagonal, and 100 nonlinear variables.
 
 ** Warning **  The variance of the derivatives in the initial
                point is large (= 14. ). A better initial
                point, a better scaling, or better bounds on the
                variables will probably help the optimization.

Interestingly, Conopt warns us about the variance of the derivatives; I would have expected it to complain about the size of the gradients. Note that the default initial point in GAMS is zero. To inspect the initial gradients, we can look at the GAMS equation listing:


---- obj1  =E=  

obj1..  - (17174713200)*x(i1,j1) - (84326670800)*x(i1,j2) - (55037535600)*x(i1,j3) - (30113790400)*x(i1,j4)
     
      - (29221211700)*x(i1,j5) - (22405286700)*x(i1,j6) - (34983050400)*x(i1,j7) - (85627034700)*x(i1,j8) - (6711372300)*x(i1,j9)
     
      - (50021066900)*x(i1,j10) - (99811762700)*x(i2,j1) - (57873337800)*x(i2,j2) - (99113303900)*x(i2,j3)
     
      - (76225046700)*x(i2,j4) - (13069248300)*x(i2,j5) - (63971875900)*x(i2,j6) - (15951786400)*x(i2,j7) - (25008053300)*x(i2,j8)
     
      - (66892860900)*x(i2,j9) - (43535638100)*x(i2,j10) - (35970026600)*x(i3,j1) - (35144136800)*x(i3,j2)
     
      - (13149159000)*x(i3,j3) - (15010178800)*x(i3,j4) - (58911365000)*x(i3,j5) - (83089281200)*x(i3,j6) - (23081573800)*x(i3,j7)
     
      - (66573446000)*x(i3,j8) - (77585760600)*x(i3,j9) - (30365847700)*x(i3,j10) - (11049229100)*x(i4,j1)
     
      - (50238486600)*x(i4,j2) - (16017276200)*x(i4,j3) - (87246231100)*x(i4,j4) - (26511454500)*x(i4,j5) - (28581432200)*x(i4,j6)
     
      - (59395592200)*x(i4,j7) - (72271907100)*x(i4,j8) - (62824867700)*x(i4,j9) - (46379786500)*x(i4,j10)
     
      - (41330699400)*x(i5,j1) - (11769535700)*x(i5,j2) - (31421226700)*x(i5,j3) - (4655151400)*x(i5,j4) - (33855027200)*x(i5,j5)
     
      - (18209959300)*x(i5,j6) - (64572712700)*x(i5,j7) - (56074554700)*x(i5,j8) - (76996172000)*x(i5,j9)
     
      - (29780586400)*x(i5,j10) - (66110626100)*x(i6,j1) - (75582167400)*x(i6,j2) - (62744749900)*x(i6,j3)
     
      - (28386419800)*x(i6,j4) - (8642462400)*x(i6,j5) - (10251466900)*x(i6,j6) - (64125115100)*x(i6,j7) - (54530949800)*x(i6,j8)
     
      - (3152485200)*x(i6,j9) - (79236064200)*x(i6,j10) - (7276699800)*x(i7,j1) - (17566104900)*x(i7,j2) - (52563261300)*x(i7,j3)
     
      - (75020766900)*x(i7,j4) - (17812371400)*x(i7,j5) - (3414098600)*x(i7,j6) - (58513117300)*x(i7,j7) - (62122998400)*x(i7,j8)
     
      - (38936190000)*x(i7,j9) - (35871415300)*x(i7,j10) - (24303461700)*x(i8,j1) - (24642153900)*x(i8,j2)
     
      - (13050280300)*x(i8,j3) - (93344972000)*x(i8,j4) - (37993790600)*x(i8,j5) - (78340046100)*x(i8,j6) - (30003425800)*x(i8,j7)
     
      - (12548322200)*x(i8,j8) - (74887410500)*x(i8,j9) - (6923246300)*x(i8,j10) - (20201555700)*x(i9,j1) - (506585800)*x(i9,j2)
     
      - (26961305200)*x(i9,j3) - (49985147500)*x(i9,j4) - (15128586900)*x(i9,j5) - (17416945500)*x(i9,j6) - (33063773400)*x(i9,j7)
     
      - (31690605400)*x(i9,j8) - (32208695500)*x(i9,j9) - (96397664100)*x(i9,j10) - (99360220500)*x(i10,j1)
     
      - (36990305500)*x(i10,j2) - (37288856700)*x(i10,j3) - (77197833000)*x(i10,j4) - (39668414200)*x(i10,j5)
     
      - (91309632500)*x(i10,j6) - (11957773000)*x(i10,j7) - (73547888900)*x(i10,j8) - (5541847500)*x(i10,j9)
     
      - (57629980500)*x(i10,j10) + z =E= 0 ; (LHS = 0)
  

This shows the linearized objective. The numbers in parentheses are gradients at the initial point. We see that they are very large: this is how GAMS returns the gradient at zero (instead of returning infinity, a large number is returned). It also illustrates the variance Conopt is talking about.
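This behavior is easy to explain. Each objective term contributes a gradient \[\frac{\partial}{\partial x_{i,j}} \left( c_{i,j}\sqrt{x_{i,j}} \right) = \frac{c_{i,j}}{2\sqrt{x_{i,j}}} \rightarrow \infty \quad \text{for } x_{i,j}\downarrow 0\] Comparing with the cost table, each coefficient in the listing appears to be \(c_{i,j}\cdot 10^{10}\) (e.g., \(c_{i1,j1}=1.717\) versus 17174713200), suggesting GAMS returns \(10^{10}\) as the derivative of \(\sqrt{x}\) at zero.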

Conclusion


Just by making the assignment problem a little bit nonlinear, we can get into major problems.