## Friday, February 24, 2023

### Another fast MIP model: covering

In [1], the following problem is stated:

• There is a collection of $$n=1,000$$ test questions.
• Each question covers a number of skills.
• Given is a requirement for a number of questions for each required skill (e.g., 4 questions about skill 1, 3 questions about skill 2, etc.).
• Create a test with the minimum number of questions that fulfills the requirements.
Of course, we see some remarks about NP-Hardness. Complexity theory is about bounds, about worst-case and asymptotic behavior. Even if we can't prove that a particular problem with a particular dataset given to a particular solver must be fast, it does not mean the opposite: that it must be slow. It is not at all unusual that we can solve things very quickly, even though the complexity bounds are terrible. Of course, this depends on the model, on the data set, and on the solver. Here, we see an open-source MIP solver can solve this particular problem in less than 0.01 seconds. Many well-educated people misunderstand complexity theory – to their detriment. They miss out on great ways to solve actual problems! Obviously, this is a problem with how complexity theory is taught.

Ok, here we go (ignoring the NP-hard doomsayers). We don't have data, so the first thing to do is to invent some. I assume a library of $$n=1,000$$ questions and some 25 possible skills. So we have the sets $$q$$ and $$s$$. A 2-dimensional set $$\color{darkblue}qs_{q,s}$$ indicates which skills are covered by which questions. Also, we need to invent some requirements.

A partial view of the randomly-generated sparse question-skill data is:

----     11 SET qs  skills covered by questions

skill_1     skill_2     skill_3     skill_4     skill_5     skill_6     skill_7     skill_8     skill_9

question3                                                                                                            YES
question4                                                                                    YES
question5                        YES
question8                                                                        YES                                 YES
question9                                    YES
question11                                   YES
question14                                                                       YES         YES
question20           YES                     YES
. . .


while the requirements are:

----     11 PARAMETER ar  assessment requirements

skill_2   3.000,    skill_6  10.000,    skill_14  6.000,    skill_17  4.000,    skill_21  7.000,    skill_22  2.000
skill_24  3.000


The mathematical model can look like:

Mathematical Model
\begin{align} \text{Sets} \\ & q \text{: questions} \\ & s \text{: skills} \\ & \color{darkblue}qs_{q,s}\text{: skill is covered by question} \\ & \hline \\ \text{Data} \\ & \color{darkblue}{\mathit{ar}}_s \text{: assessment requirements}\\ & \hline \\ \text{Variables} \\ & \color{darkred}{\mathit numq} \text{: number of questions needed} \\ & \color{darkred}{\mathit question}_{s} = \begin{cases}1&\text{if question s is used}\\ 0&\text{otherwise}\end{cases} \\ & \hline \\ \text{Model} \\ \min\>& \color{darkred}{\mathit numq} = \sum_{q} \color{darkred}{\mathit question}_{q} \\ & \sum_{q|\color{darkblue}qs_{q,s}}\color{darkred}{\mathit question}_{q} \ge \color{darkblue}{\mathit{ar}}_s && \forall s|\color{darkblue}{\mathit{ar}}_s\gt 0 \\ & \color{darkred}{\mathit question}_s \in \{0,1\} \end{align}

This small model solves very quickly. Using the open-source solver CBC, I see:

COIN-OR CBC      42.2.0 ef14ea53 Feb 16, 2023          WEI x86 64bit/MS Window

COIN-OR Branch and Cut (CBC Library 2.10.8)
written by J. Forrest

Calling CBC main solution routine...
Integer solution of 19 found by feasibility pump after 0 iterations and 0 nodes (0.00 seconds)
Search completed - best objective 19, took 0 iterations and 0 nodes (0.00 seconds)
Maximum depth 0, 0 variables fixed on reduced cost

Solved to optimality (within gap tolerances optca and optcr).
MIP solution:    1.900000e+01   (0 nodes, 0.006 seconds)

Best possible:   1.900000e+01
Absolute gap:    0.000000e+00   (absolute tolerance optca: 0)
Relative gap:    0.000000e+00   (relative tolerance optcr: 0.0001)

Resolve with fixed discrete variables.
Optimal - objective value 19


The solution time was 0.006 seconds, and the total turn-around time was less than 0.1 seconds. The solution looks like:

----     27 VARIABLE numq.L                =       19.000  number of questions needed

----     27 VARIABLE question.L  1 if question is included in test

question9   1.000,    question44  1.000,    question76  1.000,    question90  1.000,    question149 1.000,    question239 1.000
question347 1.000,    question416 1.000,    question446 1.000,    question484 1.000,    question506 1.000,    question540 1.000
question579 1.000,    question600 1.000,    question622 1.000,    question633 1.000,    question771 1.000,    question873 1.000
question923 1.000


A more complete solution report can look like:

----     61 PARAMETER solution

skill_2     skill_3     skill_5     skill_6     skill_8     skill_9    skill_11    skill_12    skill_13

question9                     1.000
question44                                            1.000                               1.000
question90                                            1.000
question149                                           1.000
question239       1.000                   1.000
question347                                           1.000
question416                                           1.000                   1.000                               1.000
question446       1.000
question540                                           1.000
question579                                           1.000       1.000                   1.000
question600                                           1.000
question622                                           1.000                                           1.000
question633       1.000                               1.000
question873                                                                               1.000
question923                               1.000
total             3.000       1.000       2.000      10.000       1.000       1.000       3.000       1.000       1.000
required          3.000                              10.000

+    skill_14    skill_15    skill_17    skill_19    skill_20    skill_21    skill_22    skill_24

question9                                 1.000                               1.000
question44                                                                                1.000
question76        1.000                                                       1.000
question90                                                                    1.000
question149                               1.000                               1.000
question239                                                                   1.000
question446       1.000       1.000
question484       1.000                                           1.000                               1.000
question506                                                                   1.000                   1.000
question540       1.000
question622                                           1.000
question771                                                                   1.000                   1.000
question873       1.000                   1.000                                           1.000
question923       1.000                   1.000
total             6.000       1.000       4.000       1.000       1.000       7.000       2.000       3.000
required          6.000                   4.000                               7.000       2.000       3.000


These types of covering models often solve very fast. There is no good reason to not consider a MIP solution for a problem like this.

Note: increasing the size of the problem to several thousand questions, does not make the problem much more difficult to solve. An advantage of using random data is that we can trivially change the size of the problem.

### References

1. Dynamic programming solution for assigning least number of questions that satisfy skills, https://stackoverflow.com/questions/75488358/dynamic-programming-solution-for-assigning-least-number-of-questions-that-satisf

### Appendix: GAMS model

 $ontext Design a test with minimum number of questions such that skill coverage is obeyed. Reference: https://stackoverflow.com/questions/75488358/dynamic-programming-solution-for-assigning-least-number-of-questions-that-satisf This is an easy set covering type of problem. They often solve very fast.$offtext  *------------------------------------------------------------------* random data set*------------------------------------------------------------------ set   q 'questions' /question1*question1000/   s 'skills'    /skill_1*skill_25/   qs(q,s) 'skills covered by questions';qs(q,s) = uniform(0,1) < 0.04; Parameter ar(s) 'assessment requirements';ar(s)$(uniform(0,1)<0.2) = uniformInt(1,10);display q,s,qs,ar; *------------------------------------------------------------------* model*------------------------------------------------------------------ binary variable question(q) '1 if question is included in test';variable numq 'number of questions needed'; equations cover(s) 'required number of questions with skill s' count 'number of questions in test'; * skip if ar(s)=0cover(s)$ar(s).. sum(qs(q,s), question(q)) =g= ar(s);count.. numq =e= sum(q,question(q)); model m /all/;option mip=cbc;solve m minimizing numq using mip; display numq.l, question.l;  *------------------------------------------------------------------* solution report*------------------------------------------------------------------ parameter solution(*,*);solution(qs(q,s)) = question.l(q)>0.5;solution('total',s) = sum(qs(q,s)\$(question.l(q)>0.5),1);solution('required',s) = ar(s);display solution;