Tuesday, March 14, 2023

Choosing between NLP solvers: interior point or active set.

One way to categorize (local) nonlinear programming (NLP) solvers is to distinguish between active set methods and interior point methods. Some representative large-scale sparse solvers are:

  • Active set: CONOPT, SNOPT. These use SQP-type algorithms.
  • Interior point: IPOPT, Knitro. Note: Knitro also contains an active set algorithm.
These are two very different types of algorithms, and they work well on different types of problems. If you do a lot of nonlinear optimization, it actually makes sense to have both types of solvers available. With modeling tools like AMPL and GAMS, it is very easy to switch between solvers, so comparing their performance is also a snap.
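
For example, in GAMS, switching between solvers is a one-line change. A minimal sketch (the model name mymodel and objective variable obj are placeholders):

* pick the NLP solver: conopt, snopt, ipopt, ipopth, knitro, ...
option nlp = conopt;
solve mymodel using nlp minimizing obj;

The solver can also be selected on the command line, e.g., gams mymodel.gms nlp=ipopt, which makes running the same model with different solvers for a performance comparison trivial.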

Here I want to discuss some factors that can help in deciding whether to use an active set solver or an interior point solver. 

Clean solutions


Interior point codes will return an interior solution, i.e., a solution with variable levels that are strictly between their bounds. Sometimes this makes solutions a bit "ugly". Here is an example of a solution with IPOPT:


----    128 VARIABLE x.L  shipments along arcs

             dst1        dst2        dst3        dst4        dst5        dst6        dst7        dst8        dst9       dst10

src1  -9.9637E-11 -9.9679E-11     379.812      33.417 -9.9767E-11 -9.9838E-11 -9.9714E-11 -9.0291E-11 -9.9895E-11 7.94230E-11
src2      995.348 -9.9769E-11    1033.580 -9.9854E-11 -9.8755E-11 -9.8043E-11 -9.9853E-11 -9.9796E-11 -9.9871E-11 -9.9235E-11
src3  -9.9768E-11 -9.9454E-11 -9.9647E-11    1324.222 -9.9648E-11 -9.8739E-11 -9.9884E-11 -9.9783E-11 -9.9905E-11 -9.8972E-11
src4  -9.9852E-11 -9.9671E-11 -9.9831E-11 -9.9573E-11 -9.7041E-11     724.548 -9.9883E-11 -9.9775E-11 -9.9685E-11 -9.9829E-11
src5      632.127 -9.9319E-11 -9.9749E-11 -9.8960E-11 -9.6832E-11      70.945 -9.9859E-11 -9.9766E-11 -9.9827E-11 -9.8438E-11
src6  -9.9744E-11 -9.9772E-11 -9.9566E-11 -9.9901E-11 -9.9484E-11 -9.9876E-11 -9.9846E-11 -9.9369E-11 -9.9914E-11     539.079
src7  -8.8389E-11     497.790       0.009 -9.7318E-11 -9.6306E-11     343.906 -9.9603E-11 -9.9322E-11 -9.9813E-11 -9.9854E-11
src8  -9.9849E-11 -9.8903E-11     351.900 -9.9735E-11     232.776 -9.9783E-11     284.116 -9.9657E-11    1191.424 -9.6824E-11
src9      150.265      11.213 -9.5633E-11 -9.9748E-11 -9.9800E-11 -9.8005E-11 -9.9816E-11 -9.9057E-11 -9.9726E-11 -9.9848E-11
src10 -9.9584E-11     521.776 -9.9354E-11 -9.9200E-11 -9.3240E-11 -9.9843E-11 -9.8713E-11     445.417 -9.9378E-11     236.332


Obviously, this is a bit more difficult to read than a solution returned by CONOPT:


----    130 VARIABLE x.L  shipments along arcs

             dst1        dst2        dst3        dst4        dst5        dst6        dst7        dst8        dst9       dst10

src1                              379.812      33.417
src2      995.348                1033.580
src3                                         1324.222
src4                                                                  724.548
src5      632.127                                                      70.945
src6                                                                                                                  539.079
src7                  497.790       0.009                             343.906
src8                              351.900                 232.776                 284.116                1191.424
src9      150.265      11.213
src10                 521.776                                                                 445.417                 236.332

You could apply some rounding to get a similar result. In linear programming, interior point solvers are often followed by a so-called crossover step, which moves the interior point to a basic solution. IPOPT does not have such a step (Knitro does contain a crossover algorithm).
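
A simple cleanup is to snap near-zero levels back to their bound before reporting. A minimal GAMS sketch, assuming the shipment variable is x(src,dst) with a lower bound of zero (the set names src and dst are placeholders):

scalar tol 'cleanup tolerance' /1e-6/;
x.l(src,dst)$(abs(x.l(src,dst)) < tol) = 0;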

Besides this, the duals returned by an interior point solver also differ somewhat from those of an active set solver.

Because of this behavior, I sometimes use CONOPT to "clean up" a solution that was found by IPOPT.
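
In GAMS, this cleanup is simply a second solve statement with a different solver, starting from the point IPOPT left behind (again, mymodel and obj are placeholders):

option nlp = ipopt;
solve mymodel using nlp minimizing obj;

* resolve from the IPOPT point; CONOPT in effect performs the crossover
option nlp = conopt;
solve mymodel using nlp minimizing obj;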

Exploiting a good initial point


One major difference between active set and interior point NLP solvers is that active set solvers are often much better at restarting from a good solution. Let's look at this in a bit more detail.

In this experiment, I solve a fairly large convex, linearly constrained NLP from scratch and then resolve it using the optimal point as the starting point. This gives an idea of how much time we can save by providing an optimal (or near-optimal) initial point. Here we try this with different solver combinations:

----    126 PARAMETER report  solve report

INDEX 1 = A

                       obj        time  total time        iter      status

baserun.CONOPT  467317.782      11.781                1750.000    localopt
restart.CONOPT  467317.782       0.031      11.812       4.000    localopt

INDEX 1 = B

                      obj        time  total time        iter      status

baserun.IPOPT  467317.782      12.926                  43.000    localopt
restart.IPOPT  467317.782       5.627      18.553      15.000    localopt

INDEX 1 = C

                       obj        time  total time        iter      status

baserun.IPOPTH  467317.782       1.693                  44.000    localopt
restart.IPOPTH  467317.782       0.684       2.377      14.000    localopt

INDEX 1 = D

                       obj        time  total time        iter      status

baserun.IPOPTH  467317.782       1.725                  44.000    localopt
restart.CONOPT  467317.782       0.047       1.772      11.000    localopt

Let's discuss this a bit:
  • Case A. Here we see that CONOPT is doing a great job in the restart. It basically needs very little time to recover the optimal solution. This is typical behavior for simplex-based LP solvers and active set NLP solvers.
  • Case B. The restart for IPOPT is not as great. We need almost half the solution time when we pass on the optimal solution as starting point. This is normal for interior point algorithms.
  • Case C illustrates that IPOPTH is just a much faster version of IPOPT. IPOPTH enjoys a better linear algebra library (it uses Harwell's MA27) than the standard IPOPT (which uses MUMPS). This solves the base run very fast. The restart is still not great.
  • Case D. With these results, it seems to make sense to use IPOPTH to do the base run and CONOPT for restarting from (near) optimal points. Indeed this is the fastest combination. Note that CONOPT is doing a great job here, as IPOPTH returns an interior point. Effectively in this experiment, CONOPT is performing a crossover step. 
For just one restart, this may not be so exciting. But if you have to solve a larger number of related scenarios, this approach is quite appealing: use an interior point code to solve the base run, and an active set method to solve the restarts.
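
Here is a minimal GAMS sketch of this scheme; the model m, objective z, scenario set s, and the data update are hypothetical placeholders:

* base run with an interior point solver
option nlp = ipopth;
solve m using nlp minimizing z;

* scenario restarts with an active set solver,
* each starting from the previous (near optimal) point
option nlp = conopt;
loop(s,
   demand(j) = scenariodemand(s,j);
   solve m using nlp minimizing z;
);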


Large number of superbasics


Interior point algorithms (for LPs and NLPs) often perform well on very large and sparse problems. Another reason to use an interior point method instead of an active set method is the slowdown caused by a large number of superbasic variables. In linear programming, we are familiar with splitting the variables (including slacks) into two sets: basic variables and non-basic variables. Non-basic variables are temporarily fixed at one of their bounds. An active set NLP solver uses a similar idea, but it adds the concept of superbasic variables: variables that are non-basic but strictly between their bounds. Active set solvers usually like the number of superbasics to be not too large. Interior point algorithms don't really care: you could even say that for an interior point algorithm all variables are superbasic, as they live in the strict interior of the bounds.

In [1], some large convex, linearly constrained problems are solved using different solvers. These models have a large number of superbasics (both during the optimization and in the final solution).

Here we see a small piece of the solver log using CONOPT:

   Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
     41   3        4.1213459188E+05 2.3E+03495694 5.2E-01   15 F  T
     42   3        4.1078726901E+05 1.7E+03495694 4.1E-01   14 F  T
     43   3        4.1000483390E+05 1.5E+03495694 1.5E-07      T  T
     44   3        4.0958894025E+05 1.9E+03495693 6.0E-08      F  T
     45   3        4.0922672671E+05 1.5E+03495693 7.7E-08      F  T
     46   4        4.0888845650E+05 1.6E+03495693 5.9E-08      F  T
     47   4        4.0032167464E+05 1.4E+03495693 1.0E+00    2 T  T
     48   4        3.9256449033E+05 2.1E+03495692 1.0E+00    8 T  T
     49   4        3.8241056114E+05 1.5E+03495690 1.0E+00   13 T  T
     50   4        3.6630578855E+05 1.9E+03495687 1.0E+00   19 T  T

The column NSB (number of superbasics) is very large: around 495,694, so large that it merges with the previous column (RGmax).

For these models, it is a good idea to try out an interior point algorithm. When we compare CONOPT with IPOPT/IPOPTH, we see:

----    222 PARAMETER report  timings of different solvers

                obj        iter        time

CONOPT    -7720.083     124.000     618.656
IPOPT     -7720.083      15.000     171.602
IPOPTH    -7720.083      15.000      25.644

IPOPTH with its better linear algebra routines really does a good job here.

Expensive function and gradient evaluations?


In [2] the following is stated:

Sequential quadratic programming methods are mainly useful for problems with expensive evaluations.

I have not seen this argument before, and I am not sure it fits with my experience: my models in general have very cheap functions and gradients. In any case, we can look at the number of function and gradient evaluations to see if there is a difference between the SQP and the interior point solver. For our first model, CONOPT (the SQP solver) reports:

 Timing for Function Evaluations:
 
 Statistics for FDEval-Fnc Calls:       460. Time:      0.127 T/C:    2.7609E-04
 Statistics for FDEval-Drv Calls:       413. Time:      0.196 T/C:    4.7458E-04
 Statistics for 2DDirLag   Calls:        21. Time:      0.014 T/C:    6.6667E-04
 Statistics for 2DLagr     Calls:       216. Time:      0.710 T/C:    3.2870E-03


That is roughly 500 function and gradient evaluations. IPOPT reports:

Number of objective function evaluations             = 44
Number of objective gradient evaluations             = 44

At least for this model, IPOPT uses very few evaluations. If evaluations are expensive, IPOPT would be hurt less than CONOPT.

Expensive function evaluations are the rule when calling a function that runs some simulation. In this case, it surely is a bad idea to use finite differences: estimating a gradient by finite differences requires an extra function evaluation for every variable. Rather, we want to use some DFO (Derivative-Free Optimization) method [3].

Conclusions


Here I tried to illustrate some of the arguments we can use when choosing between an active set algorithm and an interior point solver.

For almost all of my models, expensive function and gradient evaluations are not a major concern. 

As the performance can differ quite a lot between these two types of solvers, it is a good idea to do some performance evaluation by testing multiple solvers.

References


  1. Some Matrix Balancing Experiments, http://yetanothermathprogrammingconsultant.blogspot.com/2022/08/some-matrix-balancing-experiments.html
  2. Non linear programming, https://or.stackexchange.com/questions/10020/non-linear-programming
  3. Rios, L.M., Sahinidis, N.V. Derivative-free optimization: a review of algorithms and comparison of software implementations. J Glob Optim 56, 1247–1293 (2013).

4 comments:

  1. You could modify the problem a bit in the second experiment before resolving to make the comparison more relevant. For example, changing a bound of a variable with a non-integral value to simulate what happens in a branch-and-bound algorithm.

  2. And out of curiosity, could you share the exact model you used for the experiments?

    "Expensive function evaluations are the rule when calling a function that will run some simulation. In this case, it surely is a bad idea to use finite differences. That would create a large number of extra function evaluations. Rather we want to use some DFO (Derivative-Free Optimization) method [3]."
    It happens that an expensive black-box function still provides its derivatives at no additional cost. In this case, one can expect derivative-based algorithms to work better than DFO algorithms.

    Replies
    1. Black-box optimization is really a synonym for DFO.

    2. Consider for example the Lagrangian relaxation of a combinatorial problem. One optimizes a black-box function which consists of solving an optimization subproblem. Yet, the subgradient is directly available without any additional cost.
