One way to categorize (local) nonlinear programming (NLP) solvers is into active set methods and interior point methods. Some representative large-scale sparse solvers are:
- Active set: CONOPT, SNOPT. These solvers use SQP-type algorithms.
- Interior point: IPOPT, Knitro. Note: Knitro also contains an active set algorithm.
These are two very different types of algorithms, and they work well on different types of problems. If you do a lot of nonlinear optimization, it actually makes sense to have both types of solvers available. With modeling tools like AMPL and GAMS, it is very easy to switch between solvers. Comparing the performance of two solvers is then also a snap.
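In GAMS, for instance, switching solvers is just an option statement before the solve. A minimal sketch, where the model name m and the objective variable z are placeholders:

* sketch: m is a placeholder model, z its objective variable
* pick the active set solver
option nlp = conopt;
solve m using nlp minimizing z;
* or switch to the interior point solver and solve the same model again
option nlp = ipopt;
solve m using nlp minimizing z;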
Here I want to discuss some factors that can help in deciding whether to use an active set solver or an interior point solver.
Clean solutions
Interior point codes will return an interior solution, i.e., a solution with variable levels that are strictly between their bounds. Sometimes this makes solutions a bit "ugly". Here is an example of a solution with IPOPT:
----    128 VARIABLE x.L  shipments along arcs

              dst1         dst2         dst3         dst4         dst5

src1   -9.9637E-11  -9.9679E-11      379.812       33.417  -9.9767E-11
src2       995.348  -9.9769E-11     1033.580  -9.9854E-11  -9.8755E-11
src3   -9.9768E-11  -9.9454E-11  -9.9647E-11     1324.222  -9.9648E-11
src4   -9.9852E-11  -9.9671E-11  -9.9831E-11  -9.9573E-11  -9.7041E-11
src5       632.127  -9.9319E-11  -9.9749E-11  -9.8960E-11  -9.6832E-11
src6   -9.9744E-11  -9.9772E-11  -9.9566E-11  -9.9901E-11  -9.9484E-11
src7   -8.8389E-11      497.790        0.009  -9.7318E-11  -9.6306E-11
src8   -9.9849E-11  -9.8903E-11      351.900  -9.9735E-11      232.776
src9       150.265       11.213  -9.5633E-11  -9.9748E-11  -9.9800E-11
src10  -9.9584E-11      521.776  -9.9354E-11  -9.9200E-11  -9.3240E-11

    +         dst6         dst7         dst8         dst9        dst10

src1   -9.9838E-11  -9.9714E-11  -9.0291E-11  -9.9895E-11  7.94230E-11
src2   -9.8043E-11  -9.9853E-11  -9.9796E-11  -9.9871E-11  -9.9235E-11
src3   -9.8739E-11  -9.9884E-11  -9.9783E-11  -9.9905E-11  -9.8972E-11
src4       724.548  -9.9883E-11  -9.9775E-11  -9.9685E-11  -9.9829E-11
src5        70.945  -9.9859E-11  -9.9766E-11  -9.9827E-11  -9.8438E-11
src6   -9.9876E-11  -9.9846E-11  -9.9369E-11  -9.9914E-11      539.079
src7       343.906  -9.9603E-11  -9.9322E-11  -9.9813E-11  -9.9854E-11
src8   -9.9783E-11      284.116  -9.9657E-11     1191.424  -9.6824E-11
src9   -9.8005E-11  -9.9816E-11  -9.9057E-11  -9.9726E-11  -9.9848E-11
src10  -9.9843E-11  -9.8713E-11      445.417  -9.9378E-11      236.332
For comparison, here is the same solution in cleaned-up form, with the tiny interior values gone:

----    130 VARIABLE x.L  shipments along arcs

            dst1      dst2      dst3      dst4      dst5      dst6      dst7      dst8      dst9     dst10

src1                         379.812    33.417
src2     995.348            1033.580
src3                                  1324.222
src4                                                       724.548
src5     632.127                                            70.945
src6                                                                                                539.079
src7               497.790     0.009                       343.906
src8                         351.900             232.776             284.116            1191.424
src9     150.265    11.213
src10              521.776                                                     445.417             236.332
Besides the primal levels, the duals are also somewhat different.
Because of this behavior, I sometimes use CONOPT to "clean up" a solution that was found by IPOPT.
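If all you want is a cleaner report, an alternative is to simply round the tiny interior values away after the IPOPT solve. A minimal sketch of that idea, assuming the shipment variable x(i,j) from the display above and an arbitrary tolerance:

* x(i,j) is the shipment variable from the display above; the tolerance is arbitrary
x.l(i,j)$(abs(x.l(i,j)) < 1e-6) = 0;
display x.l;
* note: this only cleans the primal levels; the duals are still interior point duals

A clean-up solve with CONOPT fixes both the levels and the duals, which is why I usually prefer that route.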
Exploiting a good initial point
One major difference between the behavior of active set and interior point NLP solvers is that active set solvers are often much better at restarting from a good solution. Let's dig into this a bit.
In this experiment, I solve a fairly large convex, linearly constrained NLP from scratch and then resolve it using the optimal point as the starting point. This gives an idea of how much time we can save by starting from an optimal (or near-optimal) point. Here we try this with different solver combinations:
----    126 PARAMETER report  solve report

INDEX 1 = A
                        obj        time  total time        iter  status
baserun.CONOPT   467317.782      11.781                1750.000  localopt
restart.CONOPT   467317.782       0.031      11.812       4.000  localopt

INDEX 1 = B
                        obj        time  total time        iter  status
baserun.IPOPT    467317.782      12.926                  43.000  localopt
restart.IPOPT    467317.782       5.627      18.553      15.000  localopt

INDEX 1 = C
                        obj        time  total time        iter  status
baserun.IPOPTH   467317.782       1.693                  44.000  localopt
restart.IPOPTH   467317.782       0.684       2.377      14.000  localopt

INDEX 1 = D
                        obj        time  total time        iter  status
baserun.IPOPTH   467317.782       1.725                  44.000  localopt
restart.CONOPT   467317.782       0.047       1.772      11.000  localopt
Let's discuss this a bit:
- Case A. Here we see that CONOPT is doing a great job in the restart. It basically needs very little time to recover the optimal solution. This is typical behavior for simplex-based LP solvers and active set NLP solvers.
- Case B. The restart for IPOPT is not as great: even when we pass on the optimal solution as the starting point, we still need almost half of the original solution time. This is normal for interior point algorithms.
- Case C illustrates that IPOPTH is just a much faster version of IPOPT: it uses a better linear algebra library (Harwell's MA27) than the standard IPOPT (which uses MUMPS). This makes the base run very fast. The restart is still not great.
- Case D. Given these results, it seems to make sense to use IPOPTH for the base run and CONOPT for restarting from (near) optimal points. Indeed, this is the fastest combination. Note that CONOPT does a great job here even though IPOPTH hands it an interior point: effectively, in this experiment CONOPT is performing a crossover step.
For just one restart, this may not be so exciting. But if you have to solve a larger number of related scenarios, this approach is quite appealing: use an interior point code to solve the base run, and an active set method to solve the restarts.
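A sketch of this scheme in GAMS. The model m, objective z, scenario set s, and the data assignment are placeholders for whatever changes between scenarios:

* base run from scratch with the interior point solver
option nlp = ipopth;
solve m using nlp minimizing z;
* scenario solves: restart with the active set solver
* (GAMS passes the current levels as the starting point of each solve)
option nlp = conopt;
loop(s,
   demand(j) = scenDemand(s,j);
   solve m using nlp minimizing z;
);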
Large number of superbasics
Interior point algorithms (for LPs and NLPs) often perform well on very large and sparse problems. Another reason to prefer an interior point method over an active set method is the slowdown caused by a large number of superbasic variables. In linear programming, we are familiar with splitting the variables (including slacks) into two sets: basic variables and non-basic variables. Non-basic variables are temporarily fixed at one of their bounds. An active set NLP solver uses a similar idea, but adds the concept of superbasic variables: variables that are non-basic but strictly between their bounds. Active set solvers usually like it when the number of superbasics is not too large. Interior point algorithms don't really care; you could even say that for an interior point algorithm all variables are superbasic, as they live in the strict interior of the bounds.
In [1], a large convex, linearly constrained problem is solved with different solvers. This type of model has a large number of superbasics (both during the optimization and in the final solution).
Here we see a small piece of the solver log using CONOPT:
 Iter Phase Ninf     Objective          RGmax    NSB   Step InItr MX OK
   41   3        4.1213459188E+05    2.3E+03495694 5.2E-01     15 F  T
   42   3        4.1078726901E+05    1.7E+03495694 4.1E-01     14 F  T
   43   3        4.1000483390E+05    1.5E+03495694 1.5E-07        T  T
   44   3        4.0958894025E+05    1.9E+03495693 6.0E-08        F  T
   45   3        4.0922672671E+05    1.5E+03495693 7.7E-08        F  T
   46   4        4.0888845650E+05    1.6E+03495693 5.9E-08        F  T
   47   4        4.0032167464E+05    1.4E+03495693 1.0E+00      2 T  T
   48   4        3.9256449033E+05    2.1E+03495692 1.0E+00      8 T  T
   49   4        3.8241056114E+05    1.5E+03495690 1.0E+00     13 T  T
   50   4        3.6630578855E+05    1.9E+03495687 1.0E+00     19 T  T
The column NSB (number of superbasics) is so large that it merges with the previous column (RGmax).
For these models, it is a good idea to try out an interior point algorithm. When we compare CONOPT with IPOPT/IPOPTH, we see:
----    222 PARAMETER report  timings of different solvers

               obj        iter        time
CONOPT   -7720.083     124.000     618.656
IPOPT    -7720.083      15.000     171.602
IPOPTH   -7720.083      15.000      25.644
IPOPTH with its better linear algebra routines really does a good job here.
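This kind of comparison is easy to script. A minimal sketch in GAMS, where the model m and objective z are placeholders (for a fair comparison, the variable levels should be reset to the same starting point before each solve):

parameter report(*,*) 'timings of different solvers';

option nlp = conopt;
solve m using nlp minimizing z;
report('CONOPT','obj')  = z.l;
report('CONOPT','iter') = m.iterusd;
report('CONOPT','time') = m.resusd;

option nlp = ipopth;
solve m using nlp minimizing z;
report('IPOPTH','obj')  = z.l;
report('IPOPTH','iter') = m.iterusd;
report('IPOPTH','time') = m.resusd;

display report;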
Expensive function and gradient evaluations?
In [2] the following is stated:
Sequential quadratic programming methods are mainly useful for problems with expensive evaluations.
I have not seen this argument before, and I am not sure it fits my experience: my models in general have very cheap functions and gradients. In any case, we can look at the number of function and gradient evaluations to see whether there is a difference between the SQP solver and the interior point solver. For our first model, CONOPT (the SQP solver) reports:
Timing for Function Evaluations:
  Statistics for FDEval-Fnc  Calls:  460.  Time: 0.127  T/C: 2.7609E-04
  Statistics for FDEval-Drv  Calls:  413.  Time: 0.196  T/C: 4.7458E-04
  Statistics for 2DDirLag    Calls:   21.  Time: 0.014  T/C: 6.6667E-04
  Statistics for 2DLagr      Calls:  216.  Time: 0.710  T/C: 3.2870E-03
That is roughly 500 function and gradient evaluations. IPOPT reports:
Number of objective function evaluations             = 44
Number of objective gradient evaluations             = 44
At least for this model, IPOPT uses very few evaluations. If evaluations were expensive, IPOPT would be hurt less than CONOPT.
Expensive function evaluations are the rule when calling a function that runs some simulation. In that case, it surely is a bad idea to use finite differences: estimating a gradient this way requires an extra function evaluation per variable, per iteration. Rather, we want to use some DFO (Derivative-Free Optimization) method [3].
Conclusions
Here I tried to illustrate some of the arguments we can use when choosing between an active set solver and an interior point solver.
For almost all of my models, expensive function and gradient evaluations are not a major concern.
As the performance can differ quite a lot between these two types of solvers, it is a good idea to do some performance evaluation by testing multiple solvers.
References
1. Some Matrix Balancing Experiments, http://yetanothermathprogrammingconsultant.blogspot.com/2022/08/some-matrix-balancing-experiments.html
2. Non linear programming, https://or.stackexchange.com/questions/10020/non-linear-programming
3. Rios, L.M., Sahinidis, N.V., Derivative-free optimization: a review of algorithms and comparison of software implementations, Journal of Global Optimization 56, 1247-1293 (2013).
Comments

You could modify the problem a bit in the second experiment before resolving, to make the comparison more relevant. For example, change a bound of a variable with a non-integral value, to simulate what happens in a branch-and-bound algorithm.

And out of curiosity, could you share the exact model you used for the experiments?

"Expensive function evaluations are the rule when calling a function that will run some simulation. In this case, it surely is a bad idea to use finite differences. That would create a large number of extra function evaluations. Rather we want to use some DFO (Derivative-Free Optimization) method [3]."

It happens that an expensive black-box function still provides the derivatives without additional cost. In this case, one can expect derivative-based algorithms to work better than DFO algorithms.

Black-box optimization is really a synonym for DFO.

Consider for example the Lagrangian relaxation of a combinatorial problem. One optimizes a black-box function which consists of solving an optimization subproblem. Yet, the subgradient is directly available without any additional cost.