Friday, August 5, 2016

Cplex bug or not?

I was present at a GAMS demonstration yesterday, and I saw a strange thing. The model is a very small transportation model (model 1 in the GAMS model library):

\[\boxed{\begin{align}\min\>&\sum_{i,j} c_{i,j} x_{i,j}& \\ &\sum_j x_{i,j} \le a_i & \forall i \\ &\sum_i x_{i,j} \ge b_j & \forall j\\ &x_{i,j}\ge 0 \end{align}}\]

The twist is to make \(x_{i,j}\) semi-continuous: \(x_{i,j}=0\) or between \([100,1000]\). This can be modeled as:

\[\boxed{\begin{align} & 100 y_{i,j}\le x_{i,j} \le 1000 y_{i,j} \\ & y_{i,j} \in \{0,1\} \end{align}}\]

When looking at the log we saw:

Solution satisfies tolerances.

MIP Solution:          159.525000    (3 iterations, 0 nodes)
Final Solve:           156.375000    (0 iterations)

Best possible:         153.675000
Absolute gap:            5.850000
Relative gap:            0.036671

I expect the highlighted numbers to be the same

Some background : by default GAMS solves a MIP with relative gap tolerance of 10%, so the MIP solution of  159.525 is not globally optimal but rather within 10% of the best possible solution. As GAMS wants to report marginals (duals and reduced costs) for a MIP, all integer variables are fixed and then the problem is resolved as an LP. This is the “Final Solve”. One would expect the same objective as the MIP solution, but here this is not the case. What is happening? Is this a bug?

Let’s compare the solutions.

MIP Solution (MIP solution found within gap tolerance)

----     78 VARIABLE y.L 

             new-york     chicago      topeka

seattle         1.000
san-diego                   1.000       1.000


----     78 VARIABLE x.L  shipment quantities in cases

             new-york     chicago      topeka

seattle       325.000
san-diego                 300.000     300.000


----     78 VARIABLE z.L                   =      159.525  total transportation costs in thousands of dollars

Final Solve (LP solution after fixing integer variables)

----     78 VARIABLE y.L 

             new-york     chicago      topeka

seattle         1.000
san-diego                   1.000       1.000


----     78 VARIABLE x.L  shipment quantities in cases

             new-york     chicago      topeka

seattle       325.000
san-diego                 300.000     275.000


----     78 VARIABLE z.L                   =      156.375  total transportation costs in thousands of dollars

We see the discrete variables (\(y\)) are the same but one of the continuous variables (\(x\)) has changed.

Hypothesis: my guess is that the solution with \(z=159.525\) was found using a heuristic instead of by solving this node using the standard dual simplex method. As a result this node may not be LP optimal. The log gives some hints:

Iteration      Dual Objective            In Variable           Out Variable
     1  I            0.153000    x(seattle.new-york) demand(new-york) artif
     2  I            0.000000     x(seattle.chicago)  demand(chicago) artif
     3             153.675000  x(san-diego.new-york)  supply(seattle) slack
Initializing dual steep norms . . .

Iteration      Dual Objective            In Variable           Out Variable
     1             153.675000e1(san-diego.new- slack  y(san-diego.new-york)
Root relaxation solution time = 0.00 sec. (0.02 ticks)

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

*     0+    0                          159.5250       12.6000            92.10%
Found incumbent of value 159.525000 after 0.11 sec. (0.11 ticks)

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

      0     0      153.6750     1      159.5250      153.6750        3    3.67%

Indeed it seems it took zero LP iterations to find the solution of 159.525 possibly by applying some heuristics on the root node solution. Somehow Cplex never solved that node to optimality after applying the heuristic, so we don’t see the 156.375 solution.

Now is this a bug? At first sight it is not. However an integer solution that is not LP optimal is actually often nonsensical. In the above example, the MIP solutions just ships too much to Topeka:

----     86 PARAMETER sol 

             new-york     chicago      topeka    capacity     shipped

seattle       325.000                             350.000     325.000
san-diego                 300.000     300.000     600.000     600.000
demand        325.000     300.000     275.000
received      325.000     300.000     300.000

Obviously this solution does not pass the laugh test. I can see presenting such a solution to your boss may get you into trouble. As a result, I think reported MIP solutions should be LP optimal to prevent such meaningless solutions.

So this GAMS strategy of doing this final solve actually has a good side effect: cleaning up integer solutions. If you don’t use GAMS and use a gap tolerance: recheck your model solutions, because there may be issues with the continuous variables being non-optimal.

Note: this experiment was done with GAMS 24.7.3 and Cplex 12.6.3.0.

Conclusion: when using a gap tolerance Cplex can return solutions that are non-optimal with respect to the continuous variables. This is probably a bug. Although these solutions are (integer) feasible they can have undesirable characteristics. The way to repair these solutions is to fix the integer variables and resolve the model as an LP.