NLP Model |
---|

\[\begin{align}\min &\sum_{i,j} \color{darkblue} c_{i,j} \sqrt{\color{darkred}x_{i,j}}\\ & \sum_j \color{darkred}x_{i,j} = 1 && \forall i \\& \sum_i \color{darkred}x_{i,j} = 1 && \forall j \\ & \color{darkred}x_{i,j} \in [0,1] \end{align} \] |

There are quite a few complications associated with this model:

- \(f(x)=\sqrt{x}\) is not defined for \(x\lt 0\),
- the derivative \(f'\) can only be evaluated for \(x\gt 0\) and it is very large for small \(x\),
- if there is a positive \(c_{i,j} \gt 0\), the objective function is non-convex.

Let's see what kind of problems and results we can encounter in practice.

#### Data

---- 19 PARAMETER c cost coefficients j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 i1 1.717 8.433 5.504 3.011 2.922 2.241 3.498 8.563 0.671 5.002 i2 9.981 5.787 9.911 7.623 1.307 6.397 1.595 2.501 6.689 4.354 i3 3.597 3.514 1.315 1.501 5.891 8.309 2.308 6.657 7.759 3.037 i4 1.105 5.024 1.602 8.725 2.651 2.858 5.940 7.227 6.282 4.638 i5 4.133 1.177 3.142 0.466 3.386 1.821 6.457 5.607 7.700 2.978 i6 6.611 7.558 6.274 2.839 0.864 1.025 6.413 5.453 0.315 7.924 i7 0.728 1.757 5.256 7.502 1.781 0.341 5.851 6.212 3.894 3.587 i8 2.430 2.464 1.305 9.334 3.799 7.834 3.000 1.255 7.489 0.692 i9 2.020 0.051 2.696 4.999 1.513 1.742 3.306 3.169 3.221 9.640 i10 9.936 3.699 3.729 7.720 3.967 9.131 1.196 7.355 0.554 5.763

#### Global optimal solution

We can derive the optimal solution of the original problem NLP. When we look at the plot of \(f(x)=\sqrt{x}\) against \(g(x)=x\), we see that there is no good reason to be between 0 and 1:

So actually we can solve the model as a linear assignment problem:

LP Model |
---|

\[\begin{align}\min &\sum_{i,j} \color{darkblue} c_{i,j} \color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} = 1 && \forall i \\& \sum_i \color{darkred}x_{i,j} = 1 && \forall j \\ & \color{darkred}x_{i,j} \in [0,1] \end{align} \] |

When we solve this problem with our data set we see:

---- 32 **** LP solution **** VARIABLE z.L = 9.202 objective ---- 32 VARIABLE x.L j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 i1 1.000 i2 1.000 i3 1.000 i4 1.000 i5 1.000 i6 1.000 i7 1.000 i8 1.000 i9 1.000 i10 1.000

As is to be expected when solving a pure linear assignment problem, the solution is automatically integer-valued.

A conclusion is that we just converted a very simple LP model into a very difficult NLP problem with all kinds of complications.

#### Reformulation

Many NLP algorithms employ tolerances such that \(x\) values may be slightly outside their bounds. Also, values \(x_{i,j}=0\) may lead to problems when trying to form gradients. Indeed, when we feed the model as is to IPOPT we see:

S O L V E S U M M A R Y MODEL m OBJECTIVE z TYPE NLP DIRECTION MINIMIZE SOLVER IPOPT FROM LINE 28 **** SOLVER STATUS 4 Terminated By Solver **** MODEL STATUS 7 Feasible Solution **** OBJECTIVE VALUE 127.9104 RESOURCE USAGE, LIMIT 1.046 1000.000 ITERATION COUNT, LIMIT 162 2000000000 EVALUATION ERRORS 95 0 COIN-OR Ipopt 30.3.0 rc5da09e Released Mar 06, 2020 WEI x86 64bit/MS Window **** ERRORS/WARNINGS IN EQUATION obj 50 error(s): sqrt: FUNC DOMAIN: x < 0 1 warning(s): sqrt: GRAD SINGULAR: x tiny

In addition, the solver log is full of really scary messages:

Warning: Cutting back alpha due to evaluation error WARNING: Problem in step computation; switching to emergency mode. Restoration phase is called at point that is almost feasible, with constraint violation 4.626299e-11. Abort.

As IPOPT is an interior point solver, one may think that it only looks at points strictly inside \[ 0 \lt \ x_{i,j} \lt 1\] In this case, we should not see the above domain errors. However, IPOPT will widen the bounds first, so the feasible region formed by the bounds becomes something like \[ 0 -\delta \lt \ x_{i,j} \lt 1+\delta\]

I actually expected IPOPT to terminate earlier, as the default GAMS limit for domain errors is 0. Also, I don't understand the bookkeeping. We see that there are 95 evaluation errors. But when looking where they appear, we just have 50 evaluation errors in the sqrt function. This is probably a bug in the IPOPT GAMS link.

When we solve our reformulated NLP model with a global solver we see:

---- 37 **** Global NLP solution (Couenne, Antigone) **** VARIABLE z.L = 8.915 objective ---- 37 VARIABLE x.L j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 i1 1.000 i2 1.000 i3 1.000 i4 1.000 i5 1.000 i6 1.000 i7 1.000 i8 1.000 i9 1.000 i10 1.000

Our objective a little bit off (remember: we perturbed the objective a bit) but the solution is the same as we expected from the linear model.

Strangely the global solver Baron is producing a slightly worse solution:

---- 37 **** Global NLP solution (Baron) **** VARIABLE z.L = 13.205 objective ---- 37 VARIABLE x.L j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 i1 1.000 i2 1.000 i3 1.000 i4 1.000 i5 1.000 i6 1.000 i7 1.000 i8 1.000 i9 1.000 i10 1.000

It is always good to try different solvers!

#### Local solvers

As the square root functions make the problem non-convex, we can expect local solutions. The local NLP solver MINOS produces an interesting solution:

---- 37 **** Local NLP solution (Minos) **** VARIABLE z.L = 44.343 objective ---- 37 VARIABLE x.L j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 i1 1.000 i2 1.000 i3 1.000 i4 1.000 i5 1.000 i6 1.000 i7 1.000 i8 1.000 i9 1.000 i10 1.000

The solver IPOPT still has problems with this model:

---- 37 **** Local NLP solution (Ipopt) **** VARIABLE z.L = 52.233 objective ---- 37 VARIABLE x.L j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 i1 0.170 0.208 0.086 0.147 0.120 0.243 0.025 i2 0.003 0.228 0.266 0.323 0.179 i3 0.084 0.106 0.208 0.262 0.176 0.164 i4 0.238 0.059 0.242 0.154 0.171 0.135 i5 0.190 0.119 0.281 0.028 0.151 0.082 0.149 i6 0.222 0.184 0.197 0.137 0.259 i7 0.206 0.180 0.142 0.214 0.010 0.115 0.132 i8 0.108 0.120 0.176 0.112 0.267 0.216 i9 0.108 0.197 0.100 0.026 0.106 0.119 0.069 0.181 0.094 i10 0.086 0.144 0.155 0.071 0.257 0.289 EXIT: Restoration Failed! Final point is feasible: scaled constraint violation (3.33067e-16) is below tol (1e-08) and unscaled constraint violation (3.33067e-16) is below constr_viol_tol (0.0001).

I could fix this by making the tolerance \(\varepsilon\) larger, from 0.001 to 0.01:

---- 37 **** Local NLP solution (Ipopt, e=0.01) **** VARIABLE z.L = 8.327 objective ---- 37 VARIABLE x.L j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 i1 1.000 i2 1.000 i3 1.000 i4 1.000 i5 1.000 i6 1.000 i7 1.000 i8 1.000 i9 1.000 i10 1.000

Some solvers are very careful not to try to evaluate functions outside the bounds. With \(\varepsilon=0\), Conopt solves the model without issue (but to a location optimum), but mentions:

C O N O P T 3 version 3.17K Copyright (C) ARKI Consulting and Development A/S Bagsvaerdvej 246 A DK-2880 Bagsvaerd, Denmark The model has 101 variables and 21 constraints with 301 Jacobian elements, 100 of which are nonlinear. The Hessian of the Lagrangian has 100 elements on the diagonal, 0 elements below the diagonal, and 100 nonlinear variables. ** Warning ** The variance of the derivatives in the initial point is large (= 14. ). A better initial point, a better scaling, or better bounds on the variables will probably help the optimization.

Interestingly Conopt is warning us about the variance of the derivatives. I would have expected it to complain about the size of the gradients. It is noted that the default initial point in GAMS is zero. To inspect the initial gradients we can look at the GAMS equation listing;

---- obj1 =E= obj1.. - (17174713200)*x(i1,j1) - (84326670800)*x(i1,j2) - (55037535600)*x(i1,j3) - (30113790400)*x(i1,j4) - (29221211700)*x(i1,j5) - (22405286700)*x(i1,j6) - (34983050400)*x(i1,j7) - (85627034700)*x(i1,j8) - (6711372300)*x(i1,j9) - (50021066900)*x(i1,j10) - (99811762700)*x(i2,j1) - (57873337800)*x(i2,j2) - (99113303900)*x(i2,j3) - (76225046700)*x(i2,j4) - (13069248300)*x(i2,j5) - (63971875900)*x(i2,j6) - (15951786400)*x(i2,j7) - (25008053300)*x(i2,j8) - (66892860900)*x(i2,j9) - (43535638100)*x(i2,j10) - (35970026600)*x(i3,j1) - (35144136800)*x(i3,j2) - (13149159000)*x(i3,j3) - (15010178800)*x(i3,j4) - (58911365000)*x(i3,j5) - (83089281200)*x(i3,j6) - (23081573800)*x(i3,j7) - (66573446000)*x(i3,j8) - (77585760600)*x(i3,j9) - (30365847700)*x(i3,j10) - (11049229100)*x(i4,j1) - (50238486600)*x(i4,j2) - (16017276200)*x(i4,j3) - (87246231100)*x(i4,j4) - (26511454500)*x(i4,j5) - (28581432200)*x(i4,j6) - (59395592200)*x(i4,j7) - (72271907100)*x(i4,j8) - (62824867700)*x(i4,j9) - (46379786500)*x(i4,j10) - (41330699400)*x(i5,j1) - (11769535700)*x(i5,j2) - (31421226700)*x(i5,j3) - (4655151400)*x(i5,j4) - (33855027200)*x(i5,j5) - (18209959300)*x(i5,j6) - (64572712700)*x(i5,j7) - (56074554700)*x(i5,j8) - (76996172000)*x(i5,j9) - (29780586400)*x(i5,j10) - (66110626100)*x(i6,j1) - (75582167400)*x(i6,j2) - (62744749900)*x(i6,j3) - (28386419800)*x(i6,j4) - (8642462400)*x(i6,j5) - (10251466900)*x(i6,j6) - (64125115100)*x(i6,j7) - (54530949800)*x(i6,j8) - (3152485200)*x(i6,j9) - (79236064200)*x(i6,j10) - (7276699800)*x(i7,j1) - (17566104900)*x(i7,j2) - (52563261300)*x(i7,j3) - (75020766900)*x(i7,j4) - (17812371400)*x(i7,j5) - (3414098600)*x(i7,j6) - (58513117300)*x(i7,j7) - (62122998400)*x(i7,j8) - (38936190000)*x(i7,j9) - (35871415300)*x(i7,j10) - (24303461700)*x(i8,j1) - (24642153900)*x(i8,j2) - (13050280300)*x(i8,j3) - (93344972000)*x(i8,j4) - (37993790600)*x(i8,j5) - (78340046100)*x(i8,j6) - (30003425800)*x(i8,j7) - (12548322200)*x(i8,j8) - (74887410500)*x(i8,j9) - (6923246300)*x(i8,j10) - (20201555700)*x(i9,j1) - (506585800)*x(i9,j2) - (26961305200)*x(i9,j3) - (49985147500)*x(i9,j4) - (15128586900)*x(i9,j5) - (17416945500)*x(i9,j6) - (33063773400)*x(i9,j7) - (31690605400)*x(i9,j8) - (32208695500)*x(i9,j9) - (96397664100)*x(i9,j10) - (99360220500)*x(i10,j1) - (36990305500)*x(i10,j2) - (37288856700)*x(i10,j3) - (77197833000)*x(i10,j4) - (39668414200)*x(i10,j5) - (91309632500)*x(i10,j6) - (11957773000)*x(i10,j7) - (73547888900)*x(i10,j8) - (5541847500)*x(i10,j9) - (57629980500)*x(i10,j10) + z =E= 0 ; (LHS = 0)

**linearized objective**. The numbers in parentheses are gradients. We see that they are very large (this is how GAMS returns the gradient at zero; instead of returning infinity a large number is returned). It also shows the variance Conopt is talking about.

#### Conclusion

Just by making the assignment problem a little bit nonlinear, we can get into major problems.