Yet Another Math Programming Consultant: Scheduling: easy MIP

Many scheduling problems cause headaches for MIP solvers. Here is one that is easy. The problem is from [1]:

We need to schedule tasks. Each task must be assigned to a single time slot. Not every time slot is allowed: each task has a set of time slots where it can be executed. An example small data set is:

task_availability_map = {
    "T1" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T2" : [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T3" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T4" : [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T5" : [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T6" : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    "T7" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "T8" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "T9" : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "T10": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
}

In addition we have a maximum of tasks that can be assigned to a single time slot (i,e, a capacity). The goal is to minimize the number of slots we need, such that all tasks are scheduled. If we can handle 3 tasks per time slot, an optimal schedule can look like:

                t6          t8          t9         t12

task1            1
task2            1
task3                        1
task4                        1
task5                        1
task6                                    1
task7                                                1
task8                                                1
task9                                    1
task10                                   1

This solution is by no means unique.

MIP Model

To model this as a MIP model we can use two binary variables:\[ x_{i,t} = \begin{cases} 1 & \text{ if job $i$ is assigned to time slot $t$}\\ 0 & \text{ otherwise}\end{cases} \] and \[y_t = \begin{cases} 1 & \text{if time slot $t$ has at least one job assigned to it}\\ 0 & \text{otherwise}\end{cases}\]

A MIP model can look like: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min& \sum_t y_t\\ & \sum_t \mathit{ok}_{i,t} x_{i,t} = 1 && \forall i \\ & \sum_i \mathit{ok}_{i,t} x_{i,t} \le N && \forall t \\ & y_t \ge x_{i,t} &&\forall i,t|\mathit{ok}_{i,t}=1\\ & x_{i,t}, y_t \in \{0,1\}\end{align}}\]

We can re-interpret $\mathit{ok}_{i,t}$ as $\mathit{ok}_{i,t}=\mathit{true}$ if and only if $(i,t)$ is an allowed assignment. Then the model can be: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min& \sum_t y_t\\ & \sum_{t| \mathit{ok}_{i,t}} x_{i,t} = 1 && \forall i \\ & \sum_{i| \mathit{ok}_{i,t}} x_{i,t} \le N && \forall t \\ & y_t \ge x_{i,t} &&\forall i,t|\mathit{ok}_{i,t}\\ & x_{i,t}, y_t \in \{0,1\}\end{align}}\]

There are a number of interesting issues. The first one is that we have many $(i,t)$ combinations that we know are not allowed. Instead of fixing those $x_{i,t}=0$ we can try to not even generate those variables. For this example we would reduce the number of variables from $10 \times 17 = 170$ to $50$ (the number of ones in the data table).

A GAMS model can look like:

sets

i 'task' /task1*task10/
t 'time slot' /t1*t17/
;

*
* data
*

table allowed(i,t)
            t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17
    task1    0   0   0   1   1   1   1   1   0   0   0   0   0   0   0   0   0
    task2    0   0   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0
    task3    0   0   0   1   1   1   1   1   0   0   0   0   0   0   0   0   0
    task4    0   0   0   0   0   1   1   1   0   0   0   0   0   0   0   0   0
    task5    0   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0   0
    task6    1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0
    task7    0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   0   0
    task8    0   0   0   0   0   0   0   0   0   0   1   1   1   1   0   0   0
    task9    0   0   0   0   0   0   1   1   1   1   0   0   0   0   0   0   0
    task10   0   0   0   0   0   0   1   1   1   1   0   0   0   0   0   0   0
;

scalar N 'capacity of a slot' /3/;

*
* derived data
*
set ok(i,t) 'set version of allowed parameter';
ok(i,t) = allowed(i,t);

scalar nok 'number of allowed assignments';
nok = sum(ok(i,t),1);
display nok;

*
* model
*
binary variables
x(i,t) 'assignment'
y(t) 'slot is used'
;
variable z 'objective variable';

equations
   assign1(i) 'assignment constraint'
   assign2(t) 'assignment constraint'
   ybound(i,t) 'x(i,t)=1 => y(t)=1'
   obj         'objective'
;

assign1(i).. sum(ok(i,t), x(i,t)) =e= 1;
assign2(t).. sum(ok(i,t), x(i,t)) =l= N;
ybound(ok(i,t)).. y(t) =g= x(i,t);
obj.. z =e= sum(t,y(t));

model m /all/;

*
* solve model to optimality
*
option optcr=0;
solve m minimizing z using mip;

*
* display solution
*
option x:0,y:0,z:0;
display x.l, y.l, z.l;

I used the set ok to indicate if a combination $(i,t)$ is allowed. This makes the equations a bit simpler. Equations are the most difficult part in a model to debug, so we should go the extra mile to simplify them.

Using a time slot capacity of 3, the results can look like:

----     64 VARIABLE x.L  assignment

                t6          t8          t9         t12

task1            1
task2            1
task3                        1
task4                        1
task5                        1
task6                                    1
task7                                                1
task8                                                1
task9                                    1
task10                                   1


----     64 VARIABLE y.L  slot is used

t6  1,    t8  1,    t9  1,    t12 1


----     64 VARIABLE z.L                   =            4  objective variable

We can see we really only have 50 $x$ variables by looking at the GAMS column listing:

---- x  assignment

x(task1,t4)
                (.LO, .L, .UP, .M = 0, 0, 1, 0)
        1       assign1(task1)
        1       assign2(t4)
       -1       ybound(task1,t4)

x(task1,t5)
                (.LO, .L, .UP, .M = 0, 0, 1, 0)
        1       assign1(task1)
        1       assign2(t5)
       -1       ybound(task1,t5)

x(task1,t6)
                (.LO, .L, .UP, .M = 0, 0, 1, 0)
        1       assign1(task1)
        1       assign2(t6)
       -1       ybound(task1,t6)

REMAINING 47 ENTRIES SKIPPED

There is one more issue. Some of the columns in the data matrix have only zero entries. This means some of the assign2 constraints are empty. To be more precise they are of the form $0 \le 3$. These are dropped by GAMS (unless they are infeasible, e.g. $0\le -1$, in which case GAMS would complain). You can see this in the equation listing (after increasing the limrow option):

---- assign2  =L=  assignment constraint

assign2(t1)..  x(task6,t1) =L= 3 ; (LHS = 0)
     
assign2(t2)..  x(task6,t2) =L= 3 ; (LHS = 0)
     
assign2(t3)..  x(task6,t3) =L= 3 ; (LHS = 0)
     
assign2(t4)..  x(task1,t4) + x(task3,t4) + x(task6,t4) =L= 3 ; (LHS = 0)
     
assign2(t5)..  x(task1,t5) + x(task3,t5) + x(task5,t5) + x(task6,t5) =L= 3 ; (LHS = 0)
     
assign2(t6)..  x(task1,t6) + x(task2,t6) + x(task3,t6) + x(task4,t6) + x(task5,t6) + x(task6,t6) =L= 3 ; (LHS = 0)
     
assign2(t7)..  x(task1,t7) + x(task2,t7) + x(task3,t7) + x(task4,t7) + x(task5,t7) + x(task6,t7) + x(task9,t7)
     
      + x(task10,t7) =L= 3 ; (LHS = 0)
     
assign2(t8)..  x(task1,t8) + x(task3,t8) + x(task4,t8) + x(task5,t8) + x(task6,t8) + x(task9,t8) + x(task10,t8) =L= 3 ;
     
      (LHS = 0)
     
assign2(t9)..  x(task6,t9) + x(task9,t9) + x(task10,t9) =L= 3 ; (LHS = 0)
     
assign2(t10)..  x(task6,t10) + x(task9,t10) + x(task10,t10) =L= 3 ; (LHS = 0)
     
assign2(t11)..  x(task6,t11) + x(task7,t11) + x(task8,t11) =L= 3 ; (LHS = 0)
     
assign2(t12)..  x(task6,t12) + x(task7,t12) + x(task8,t12) =L= 3 ; (LHS = 0)
     
assign2(t13)..  x(task6,t13) + x(task7,t13) + x(task8,t13) =L= 3 ; (LHS = 0)
     
assign2(t14)..  x(task6,t14) + x(task7,t14) + x(task8,t14) =L= 3 ; (LHS = 0)
     
assign2(t15)..  x(task7,t15) =L= 3 ; (LHS = 0)

The equations assign2(t16) and assign(t17) are dropped.

Of course, if we want, we can make this more explicit by saying something like:

assign2(t)$sum(ok(i,t),1).. sum(ok(i,t), x(i,t)) =l= N;

Here the equation is not generated if the column sum is zero. As this does not add much value, it may be better to just let GAMS take care of this.

We can generalize this a little bit. The constraint assign2 is not needed if the number of entries in column $t$ of the parameter $\mathit{allowed}_{i,t}$ is less or equal to $N$. We can do something like:

parameter cnt(t) 'count nonzero entries in column t';
cnt(t) = sum(i, allowed(i,t));
assign2(t)$(cnt(t)>N).. sum(ok(i,t), x(i,t)) =l= N;

This saves us relatively few constraints. It is much better to focus on $x$ as that is 2-dimensional so we have more bang for the buck. I'll leave it to the solver's presolver to remove the constraints.

Binary or continuous?

If we want we can relax the binary variable $y_t$ to a continuous variable between 0 and 1. Intuitively that is a good idea (fewer integer variables). However note that some solvers prefer them to be binary, and they change them back:

Tried aggregator 2 times.
MIP Presolve eliminated 16 rows and 9 columns.
Aggregator did 1 substitutions.
Reduced MIP has 59 rows, 58 columns, and 163 nonzeros.
Reduced MIP has 47 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (0.16 ticks)
Probing time = 0.00 sec. (0.05 ticks)
Tried aggregator 1 time.
Reduced MIP has 59 rows, 58 columns, and 163 nonzeros.
Reduced MIP has 58 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.00 sec. (0.15 ticks)

MIP or CP (Mixed Integer Programming vs Constraint Programming)?

In the comments it was argued that a constraint programming approach is more suited for this model. There are 2 dimensions to this argument: (1) is the problem easier to formulate as a CP problem or (2) is the problem easier to solve.

I think these two are linked. My rule of thumb is: if the problem is easy to formulate as a MIP keep it a MIP. If we need to use a lot of tricks and extra variables, try a CP formulation. An example where CP is easier is [2]. If the problem is easier to model, there is a good chance that the model also solves faster.

The poster mentioned the real problem has a few hundred tasks and time slots. Let's create some random data for this: 200 tasks, 200 time slots, time slot capacity = 5. Each task has two random values: a start value $\sim U(1,200)$ and a length $\sim U(3,10)$. Here $U$ indicates we draw from the uniform distribution. This means: we can assign a task to time slots $\mathit{start},\dots,\mathit{start}+\mathit{length}-1$. This gives a problem with 1,071 $x$ variables. It solves very quickly: less than a second using 1,022 simplex iterations and all work done in node 0. The GAMS model execution (and generation) time was about 0.1 seconds.

I don't see a good reason to consider a CP approach: the MIP model is simple and the model solves quickly.

Pulp?

In the post it was reported that the implementation in Pulp took a long time. I believe that when we have a properly formulated mathematical model, it should not take too much time to develop Python/Pulp code to implement this model. It can look like:

from pulp import *

#
# data
#
task_availability_map = {
    "T1" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T2" : [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T3" : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T4" : [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T5" : [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T6" : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    "T7" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "T8" : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "T9" : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "T10": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
}

tasks = ["T%i" % i for i in range(1,11)]
slots = range(17)

N = 3 # max number of tasks in a slot

#
# model
#

x = LpVariable.dicts(
    "assign",
    [(i, t) for i in tasks for t in slots if task_availability_map[i][t]==1],
    lowBound=0,upBound=1,cat="Integer")

y = LpVariable.dicts(
    "use",slots,lowBound=0,upBound=1,cat="Integer")

prob = LpProblem("Easy MIP",LpMinimize)

prob += lpSum([y[t] for t in slots]), "Number of slots in use"

for i in tasks:
    prob += lpSum([x[(i,t)] for t in slots if task_availability_map[i][t]==1]) == 1, "Assign1(%s)" % i

for t in slots:
    prob += lpSum([x[(i,t)] for i in tasks if task_availability_map[i][t]==1]) <= N, "Assign2(%i)" % t

for i in tasks:
    for t in slots:
        if task_availability_map[i][t]==1:
            prob += y[t] >= x[(i,t)], "Bound(%s,%i)" % (i,t)

#
# for debugging
#
prob.writeLP("d:\\tmp\\easy.lp")

#prob.solve()
prob.solve(PULP_CBC_CMD(msg=1))
# no solve log when running under Jupyter

print "Status: %s" % LpStatus[prob.status]
print "Number of slots needed: %.0f" % value(prob.objective)
print "Assignment:"
for i in tasks:
    for t in slots:
        if task_availability_map[i][t]==1:
              if x[(i,t)].value() > 0.5:
                    print "Task %s -> slot %i" % (i,t)

The print out looks like:

Status: Optimal
Number of slots needed: 4
Assignment:
Task T1 -> slot 4
Task T2 -> slot 5
Task T3 -> slot 4
Task T4 -> slot 6
Task T5 -> slot 4
Task T6 -> slot 13
Task T7 -> slot 13
Task T8 -> slot 13
Task T9 -> slot 6
Task T10 -> slot 6

Some notes.

The Pulp model is more unwieldy than the GAMS model due to all the explicit looping.
To debug a Pulp model it is a good idea to print the LP file. This essentially replaces the GAMS equation and column listing we have shown earlier. We will look at this below.
We make multiple passes over the data. If this is a real large data set we need to develop additional data structures to quickly run through the task_availability_map both row-wise and column-wise, while skipping the zero values.
Pulp does not show the solver log, especially when running under Jupyter. This makes it important to always print the status. It would be nice if the solver log is captured and echoed to the Jupyter output cell.
There are no built-in tools to print solution tables in an easy to read format. You need to develop your own printing solutions.
When printing the solution we should not compare values of binary variables to exactly 1.0. For instance in [2] we see:
```
print "The choosen tables are out of a total of %s:"%len(possible_tables)
for table in possible_tables:
    if x[table].value() == 1.0:
        print table
```
This is not safe. A binary variable can have values like 0.999999 or 1.000001 due to tolerances. My code is a bit wild:
```
if x[(i,t)].value() > 0.5:
```
but it is as safe as can be.
When we plug in the data set with 200 tasks by 200 slots we generated with GAMS earlier, and use a capacity of 5 tasks per slot, we can generate and solve the problem in 4 seconds.
The performance of Pulp/CBC is somewhat erratic. I'll report on this below.
In GAMS a variable declaration does not allocate memory for variables. Hence we can just do binary variable x(i,t). In Pulp we want to allocate only variables that are used, so the LpVariable.dicts statement needs to know which variables are actually being used.

It is interesting to see how Pulp deals with the empty columns in the data. They lead to constraints like $0 \le 3$. When looking at the LP file, we see:

Assign2(0): assign_('T6',_0) <= 3
Assign2(1): assign_('T6',_1) <= 3
Assign2(10): assign_('T6',_10) + assign_('T7',_10) + assign_('T8',_10) <= 3
Assign2(11): assign_('T6',_11) + assign_('T7',_11) + assign_('T8',_11) <= 3
Assign2(12): assign_('T6',_12) + assign_('T7',_12) + assign_('T8',_12) <= 3
Assign2(13): assign_('T6',_13) + assign_('T7',_13) + assign_('T8',_13) <= 3
Assign2(14): assign_('T7',_14) <= 3
_dummy: __dummy = 0
Assign2(15): __dummy <= 3
Assign2(16): __dummy <= 3
Assign2(2): assign_('T6',_2) <= 3
Assign2(3): assign_('T1',_3) + assign_('T3',_3) + assign_('T6',_3) <= 3
Assign2(4): assign_('T1',_4) + assign_('T3',_4) + assign_('T5',_4)
 + assign_('T6',_4) <= 3
Assign2(5): assign_('T1',_5) + assign_('T2',_5) + assign_('T3',_5)
 + assign_('T4',_5) + assign_('T5',_5) + assign_('T6',_5) <= 3
Assign2(6): assign_('T1',_6) + assign_('T10',_6) + assign_('T2',_6)
 + assign_('T3',_6) + assign_('T4',_6) + assign_('T5',_6) + assign_('T6',_6)
 + assign_('T9',_6) <= 3
Assign2(7): assign_('T1',_7) + assign_('T10',_7) + assign_('T3',_7)
 + assign_('T4',_7) + assign_('T5',_7) + assign_('T6',_7) + assign_('T9',_7)
 <= 3
Assign2(8): assign_('T10',_8) + assign_('T6',_8) + assign_('T9',_8) <= 3
Assign2(9): assign_('T10',_9) + assign_('T6',_9) + assign_('T9',_9) <= 3


Bounds
__dummy = 0

We see these two $0 \le 3$ constraints are not dropped, but rather included with a dummy variable __dummy. I am not sure why an empty constraint like $0 \le 3$ is added to the model and passed on to the solver. I am also surprised about __dummy being to zero in two different ways: both bounds are used to fix __dummy to zero and additionally a constraint is included to achieve the same thing. For most solvers one of these measures should be enough. In practice, of course, this is not really a big problem: the presolver will take care of this, and will get rid of the equation _dummy: __dummy = 0.

We can prevent this business with the dummy variable and equation by making sure Pulp does not generate an equation when the left-hand side is empty:

tused = [False for t in slots]  
for i in tasks:
    for t in slots:
        if task_availability_map[i][t]==1:
            tused[t] = True


for t in slots:
    if tused[t]:
        prob += lpSum([x[(i,t)] for i in tasks if task_availability_map[i][t]==1]) <= N, "Assign2(%i)" % t

Printing

One thing I miss when doing modeling in programming languages like Python is an easy way to print results in a nice readable format.

GAMS can print any multi-dimensional parameter or variable using the display statement. Usually the results looks quite reasonable. Here is the output of the statement display allowed, x.l;

----     70 PARAMETER allowed  

                t1          t2          t3          t4          t5          t6          t7          t8          t9

task1                                            1.000       1.000       1.000       1.000       1.000
task2                                                                    1.000       1.000
task3                                            1.000       1.000       1.000       1.000       1.000
task4                                                                    1.000       1.000       1.000
task5                                                        1.000       1.000       1.000       1.000
task6        1.000       1.000       1.000       1.000       1.000       1.000       1.000       1.000       1.000
task9                                                                                1.000       1.000       1.000
task10                                                                               1.000       1.000       1.000

     +         t10         t11         t12         t13         t14         t15

task6        1.000       1.000       1.000       1.000       1.000
task7                    1.000       1.000       1.000       1.000       1.000
task8                    1.000       1.000       1.000       1.000
task9        1.000
task10       1.000


----     70 VARIABLE x.L  assignment

                t5          t6          t7         t13

task1                    1.000
task2                                1.000
task3                    1.000
task4                    1.000
task5        1.000
task6                                            1.000
task7                                            1.000
task8                                            1.000
task9                                1.000
task10                               1.000

I have heard someone saying that the display statement is the most important statement in the GAMS language. Especially during model development when I use small data sets (my favorite), I use the display statement a lot. When the data sets are large I usually use the pivot table facility in the GDX viewer, When the model is beyond its early development stage, I actually pay quite some time and effort to produce meaningful reports, often in Excel. But even when debugging problems in later stages, the display statement remains a very important tool.

Python/Pulp has no good standard printing tool for multidimensional data. We can do something simple, but the output is less clear:

View results in Jupyter

A slightly easier way is to use: [(i,t) for (i,t) in x.keys() if x[(i,t)].value()>0.5].

Things like Panda data frames look quite nice in Jupyter, but typically we need to populate them first.

When developing and debugging models, often under pressure, life without easy data viewing is just more complicated than it should be.

Solver Performance

The performance of Pulp/CBC was not very predictable. Here are some results using the 200 tasks, 200 time slots random data set.

Capacity Time Slot	Modeling Tool	Solver	Threads	Objective	Iterations	Nodes	Seconds
N=3	GAMS	Cplex	1	68	2788	30	1
	GAMS	Cplex	4	68	2937	42	1.9
	GAMS	CBC	1	68	604336	18911	146
	GAMS	CBC	4	68	230016	13323	72
	Pulp	CBC					> 2 hours
N=4	GAMS	Cplex	1	54	2148	0	1
	GAMS	Cplex	4	54	2139	0	2.3
	GAMS	CBC	1	54	273966	11500	97
	GAMS	CBC	4	54	100467	2745	27
	Pulp	CBC		54	374219	13002	192
N=5	GAMS	Cplex	1	49	1022	0	0.4
	GAMS	Cplex	4	49	1022	0	1
	GAMS	CBC	1	49	543	1	2.1
	GAMS	CBC	4	49	543	1	2.5
	Pulp	CBC		49	?	0	4.5

The $N=3$ problem with Pulp went nowhere. The last few lines of the CBC log look like:

Cbc0010I After 690000 nodes, 17543 on tree, 69 best solution, best possible 68 (7140.39 seconds)
Cbc0010I After 691000 nodes, 17542 on tree, 69 best solution, best possible 68 (7149.65 seconds)
Cbc0010I After 692000 nodes, 17543 on tree, 69 best solution, best possible 68 (7159.53 seconds)
Cbc0010I After 693000 nodes, 17535 on tree, 69 best solution, best possible 68 (7167.80 seconds)
Cbc0010I After 694000 nodes, 17510 on tree, 69 best solution, best possible 68 (7176.48 seconds)
Cbc0010I After 695000 nodes, 17494 on tree, 69 best solution, best possible 68 (7183.40 seconds)
Cbc0010I After 696000 nodes, 17592 on tree, 69 best solution, best possible 68 (7194.70 seconds)
Cbc0010I After 697000 nodes, 17579 on tree, 69 best solution, best possible 68 (7203.96 seconds)
Cbc0010I After 698000 nodes, 17571 on tree, 69 best solution, best possible 68 (7209.88 seconds)
Cbc0010I After 699000 nodes, 17566 on tree, 69 best solution, best possible 68 (7215.16 seconds)
Cbc0027I Exiting on user event
Cbc0005I Partial search - best objective 69 (best possible 68), took 9050076 iterations and 699207 nodes (7216.40 seconds)
Cbc0032I Strong branching done 993710 times (8577199 iterations), fathomed 197962 nodes and fixed 223209 variables
Cbc0035I Maximum depth 101, 124125 variables fixed on reduced cost
Cuts at root node changed objective from 67 to 68
Probing was tried 10 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.103 seconds)
Gomory was tried 117378 times and created 50461 cuts of which 95 were active after adding rounds of cuts (286.044 seconds)
Knapsack was tried 10 times and created 2 cuts of which 0 were active after adding rounds of cuts (0.122 seconds)
Clique was tried 10 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.013 seconds)
MixedIntegerRounding2 was tried 10 times and created 3 cuts of which 0 were active after adding rounds of cuts (0.052 seconds)
FlowCover was tried 10 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.006 seconds)
TwoMirCuts was tried 117378 times and created 612 cuts of which 0 were active after adding rounds of cuts (234.192 seconds)

Result - User ctrl-cuser ctrl-c

Objective value:                69.00000000
Lower bound:                    68.000
Gap:                            0.01
Enumerated nodes:               699207
Total iterations:               9050076
Time (CPU seconds):             7216.48
Time (Wallclock seconds):       7216.48

There remain some questions about what Pulp/CBC is doing on some versions of this model.
Notes:

Both the GAMS version and the Pulp version of CBC were based on version 2.9 of the CBC library. The Pulp version says: Build Date: Feb 12 2015. Looks like we deal with a somewhat older executable.
Pulp uses ascii LP files to communicate the model with the solver. GAMS uses binary files. This will cause the problem to be slightly different.
CBC needs to be properly compiled to support multiple threads. Not all CBC versions on Windows have been built with multi-thread support.
CBC failing on this model is somewhat scary.
With slightly different options: prob.solve(PULP_CBC_CMD(threads=1, mip=1, options=['sec','3600'],msg=1)), I saw CBC actually just hang in the middle of the run.

Conclusions

Even small, simple and easy models have interesting angles. In this case we have a decision variable $x_{i,t}$ of which we only want to use a subset, and we may encounter some empty constraints.
Pulp/CBC has some issues on some data sets.
Under the hood Pulp and GAMS are different in how empty constraints are handled.

This model was more interesting than I initially thought.

References

Scheduling optimization to minimize the number of timeslots (with constraints), https://stackoverflow.com/questions/52022575/scheduling-optimization-to-minimize-the-number-of-timeslots-with-constraints
https://www.coin-or.org/PuLP/CaseStudies/a_set_partitioning_problem.html

Yet Another Math Programming Consultant

Wednesday, August 29, 2018

Scheduling: easy MIP