It is always difficult to form a model from an informal description. For me, it often helps to develop some example data. So here we go:I have a job scheduling problem with a twist- a minimization constraint. The task is- I have many jobs, each with various dependencies on other jobs, without cycles. These jobs have categories as well, and can be ran together in parallel for free if they belong to the same category. So, I want to order the jobs so that each job comes after its dependencies, but arranged in such a way that they are grouped by category (to run many in parallel) to minimize the number of serial jobs I run. That is, adjacent jobs of the same category count as a single serial job[1].

---- 28 SET j jobs job1 , job2 , job3 , job4 , job5 , job6 , job7 , job8 , job9 , job10, job11, job12 job13, job14, job15, job16, job17, job18, job19, job20, job21, job22, job23, job24 job25, job26, job27, job28, job29, job30, job31, job32, job33, job34, job35, job36 job37, job38, job39, job40, job41, job42, job43, job44, job45, job46, job47, job48 job49, job50 ---- 28 SET c category cat1, cat2, cat3, cat4, cat5 ---- 28 SET jc job-category mapping cat1 cat2 cat3 cat4 cat5 job1 YES job2 YES job3 YES job4 YES job5 YES job6 YES job7 YES job8 YES job9 YES job10 YES job11 YES job12 YES job13 YES job14 YES job15 YES job16 YES job17 YES job18 YES job19 YES job20 YES job21 YES job22 YES job23 YES job24 YES job25 YES job26 YES job27 YES job28 YES job29 YES job30 YES job31 YES job32 YES job33 YES job34 YES job35 YES job36 YES job37 YES job38 YES job39 YES job40 YES job41 YES job42 YES job43 YES job44 YES job45 YES job46 YES job47 YES job48 YES job49 YES job50 YES ---- 28 PARAMETER length job duration job1 11.611, job2 12.558, job3 11.274, job4 7.839, job5 5.864, job6 6.025, job7 11.413 job8 10.453, job9 5.315, job10 12.924, job11 5.728, job12 6.757, job13 10.256, job14 12.502 job15 6.781, job16 5.341, job17 10.851, job18 11.212, job19 8.894, job20 8.587, job21 7.430 job22 7.464, job23 6.305, job24 14.334, job25 8.799, job26 12.834, job27 8.000, job28 6.255 job29 12.489, job30 5.692, job31 7.020, job32 5.051, job33 7.696, job34 9.999, job35 6.513 job36 6.742, job37 8.306, job38 8.169, job39 8.221, job40 14.640, job41 14.936, job42 8.699 job43 8.729, job44 12.720, job45 8.967, job46 14.131, job47 6.196, job48 12.355, job49 5.554 job50 10.763 ---- 28 SET before dependencies job3 job9 job13 job21 job23 job27 job32 job41 job42 job1 YES job3 YES job4 YES job8 YES job9 YES YES job12 YES job14 YES job21 YES job26 YES job31 YES + job43 job46 job48 job10 YES YES job11 YES ---- 28 PARAMETER due some jobs have a due date job16 50.756, job19 57.757, job20 58.797, job25 74.443, job29 65.605, job32 55.928, job50 58.012

So we have 50 jobs and 5 categories. The set \(jc_{j,c}\) indicates if job \(j\) corresponds to category \(c\). Jobs have a duration or processing time, and some jobs have a precedence structure: the set \(\mathit{before}_{i,j}\) indicates that job \(i\) has to executed before job \(j\). Besides, some jobs have a due date.

The way I generated the due dates and the processing times suggests a

**continuous time**model. The main variables are: \[\begin{cases} \mathit{start}_j \ge 0& \text{start of job $j$}\\ \mathit{end}_j \ge 0& \text{end of job $j$}\\ \delta_{i,j} \in \{0,1\} & \text{binary variable used to model no-overlap constraints}\end{cases}\]

As the objective function, I use: minimize the total

**makespan**. The idea is that this will automatically try to use as much as possible parallel jobs of the same category. We have the following constraints:

- A job finishes at start time plus
**job length**. - For some jobs, we have a
**due date**. This becomes an upper bound on the**end**variable in the model. In practical models, you may want to allow a job to finish later than the due but at a cost. This would make it easier to generate meaningful results if not all due dates can be met. - The
**precedence constraints**are simple: job \(j\) can not start before job \(i\) has finished. **No-overlap constraints**require some thought. We use a binary variable to indicate that job \(i\) is executed before or after job \(j\). This has to hold for a subset of \((i,j)\) combinations: (a) only if \(i\lt j\): we don't want to check a pair twice, (b) only if there is no precedence constraint already in effect, and (c) only if jobs are of a different category. We can store the results of these three conditions in a set \(\mathit{NoOverlap}_{i,j}\) which indicates which elements \((i,j)\) need the no-overlap constraints. This set will be used in the model below.

With this in mind, we can formalize this as:

Mixed Integer Programming Model |
---|

\[\begin{align} \min\> & \color{darkred}{\mathit{makespan}}\\ & \color{darkred}{\mathit{makespan}} \ge \color{darkred}{\mathit{end}}_j && \forall j\\ & \color{darkred}{\mathit{end}}_j = \color{darkred}{\mathit{start}}_j + \color{darkblue}{\mathit{length}}_j&& \forall j \\ & \color{darkred}{\mathit{end}}_j \le \color{darkblue}{\mathit{due}}_j && \forall j | \color{darkblue}{\mathit{due}}_j \text{ exists}\\ & \color{darkred}{\mathit{end}}_i \le \color{darkred}{\mathit{start}}_j && \forall i,j|\color{darkblue}{\mathit{Precedence}}_{i,j} \text{ exists}\\ & \color{darkred}{\mathit{end}}_i \le \color{darkred}{\mathit{start}}_j +\color{darkblue} M \color{darkred}\delta_{i,j} && \forall i,j|\color{darkblue}{\mathit{NoOverlap}}_{i,j} \\ &\color{darkred}{\mathit{end}}_j \le \color{darkred}{\mathit{start}}_i +\color{darkblue} M (1-\color{darkred}\delta_{i,j}) && \forall i,j|\color{darkblue}{\mathit{NoOverlap}}_{i,j} \\ & \color{darkred}{\mathit{start}}_j, \color{darkred}{\mathit{end}}_j \ge 0 \\ & \color{darkred}\delta_{i,j} \in \{0,1\}\end{align} \] |

Here \(M\) is a large enough constant, e.g., the planning window. There are many variations on this model (Constraint Programming formulations, alternative modeling to prevent the big-M constructs, such as SOS1 variables or indicator constraints). Still, as the first line of attack, this is not too bad. This formulation is similar to the classic job-shop formulation by Alan Manne [1]. The main issue with this model is that some thought has gone into developing the set \(\mathit{NoOverlap}_{i,j}\), which indicates which combinations of jobs \((i,j)\) we need to protect against being executed at the same time.

We can substitute out \(\mathit{start}_j\) or \(\mathit{end}_j\). I would prefer to keep them both in the model to improve readability.

The model solves quickly (30 seconds on my laptop). It is not very large for modern solvers:

MODEL STATISTICS BLOCKS OF EQUATIONS 5 SINGLE EQUATIONS 2,058 BLOCKS OF VARIABLES 4 SINGLE VARIABLES 1,073 NON ZERO ELEMENTS 6,060 DISCRETE VARIABLES 972

When we solve this, we get the following results:

---- 67 VARIABLE start.L start of job job1 36.099, job2 23.541, job3 61.714, job4 2.924, job7 87.323, job8 25.646, job9 72.989 job10 47.710, job11 23.265, job12 47.710, job13 23.265, job14 10.763, job15 36.099, job16 10.763 job17 36.859, job18 87.323, job19 10.763, job20 47.710, job21 87.323, job22 87.323, job23 78.304 job24 72.989, job25 47.710, job26 23.265, job27 94.753, job28 10.763, job29 10.776, job31 36.099 job32 47.710, job33 36.099, job34 23.265, job37 47.710, job38 10.763, job39 10.763, job40 47.710 job41 47.710, job42 36.099, job43 87.323, job44 72.989, job46 73.192, job47 10.763, job48 60.634 job49 10.763 ---- 67 VARIABLE end.L end of job job1 47.710, job2 36.099, job3 72.989, job4 10.763, job5 5.864, job6 6.025 job7 98.736, job8 36.099, job9 78.304, job10 60.634, job11 28.993, job12 54.467 job13 33.521, job14 23.265, job15 42.880, job16 16.104, job17 47.710, job18 98.535 job19 19.657, job20 56.297, job21 94.753, job22 94.787, job23 84.609, job24 87.323 job25 56.510, job26 36.099, job27 102.754, job28 17.018, job29 23.265, job30 5.692 job31 43.119, job32 52.761, job33 43.795, job34 33.264, job35 6.513, job36 6.742 job37 56.017, job38 18.932, job39 18.984, job40 62.350, job41 62.646, job42 44.798 job43 96.052, job44 85.708, job45 8.967, job46 87.323, job47 16.959, job48 72.989 job49 16.317, job50 10.763

(Some jobs have a start time of zero. They are not printed here.)

For scheduling models, this type of output is difficult to interpret by humans. Better is something like a Gantt chart:

Results for example data set |

Indeed, we see that many jobs of the same category are executed in parallel. We have two category 1 and 2 periods. This is due to due dates and precedence constraints. (If we did not have due dates or precedence constraints, we would see just a single region for each category.) Also, we understand that some jobs have a little bit of freedom to float within a window. For instance, job 4 could have started a bit earlier. We can fix this in post-processing or tell the model to move jobs to the left when possible (this would mean an additional term in the objective).

Note that commercial solvers have no problems with this model and data set. For instance Cplex shows:

Root node processing (before b&c):

Real time = 3.16 sec. (583.75 ticks)

Parallel b&c, 8 threads:

Real time = 33.03 sec. (16153.92 ticks)

Sync time (average) = 4.97 sec.

Wait time (average) = 0.01 sec.

------------

Total (root+branch&cut) = 36.19 sec. (16737.66 ticks)

MIP status(101): integer optimal solution

Cplex Time: 36.20sec (det. 16737.67 ticks)

However, I see more problems with open source and academic code. E.g., SCIP on NEOS shows:

```
SCIP Status : problem is solved [optimal solution found]
Solving Time (sec) : 13010.25
Solving Nodes : 5602808
Primal Bound : +1.02753723219890e+02 (78 solutions)
Dual Bound : +1.02753723219890e+02
Gap : 0.00 %
```

This is quite a dramatic difference.

#### References

- Job scheduling with minimization by parallel grouping, https://stackoverflow.com/questions/59722383/job-scheduling-with-minimization-by-parallel-grouping
- Alan Manne,
*On the Job-Shop Scheduling Problem*, Operations Research, 1960, vol. 8, issue, pp. 219-223.

## No comments:

## Post a Comment