Wednesday, January 6, 2021

Eight soldiers lining up: a very large permutation problem

In [1] an intriguing problem is posted:


There are 8 soldiers, gathering and lining up every morning for their military service. The commander at the head of these soldiers demands that the morning lineup of these soldiers be arranged differently for every next day according to the following rule:

        Any three soldiers cannot be lined up next to each other in the same order for others days.

For example; If ABCDEFGH is the first arrangement for day 1, on the other days, ABC,BCD, CDE, DEF, EFG and FGH cannot be lined up next to each other in the same order any more, but ACB arrangement is okay for other days until used once since they are not in the same order.


Let's first make the problem a bit smaller. Say we have just 4 soldiers. This gives us  \(4!=24\) permutations or line-ups. Let's have a look at the following sets: 

  • \(p\) is the set of permutations
  • \(t\) is the set of substrings of length 3 that we can form from \(p\)
  • \(\color{darkblue}pt(p,t)\) is the mapping between \(p\) and \(t\)

----     53 SET p  permutations or line ups

ABCD,    ABDC,    ACBD,    ACDB,    ADBC,    ADCB,    BACD,    BADC,    BCAD,    BCDA,    BDAC,    BDCA,    CABD
CADB,    CBAD,    CBDA,    CDAB,    CDBA,    DABC,    DACB,    DBAC,    DBCA,    DCAB,    DCBA


----     53 SET t  substrings of length 3

ABC,    BCD,    ABD,    BDC,    ACB,    CBD,    ACD,    CDB,    ADB,    DBC,    ADC,    DCB,    BAC,    BAD,    BCA
CAD,    CDA,    BDA,    DAC,    DCA,    CAB,    CBA,    DAB,    DBA


----     55 SET pt  mapping (computed in Python)

ABCD.ABC,    ABCD.BCD,    ABDC.ABD,    ABDC.BDC,    ACBD.ACB,    ACBD.CBD,    ACDB.ACD
ACDB.CDB,    ADBC.ADB,    ADBC.DBC,    ADCB.ADC,    ADCB.DCB,    BACD.ACD,    BACD.BAC
BADC.ADC,    BADC.BAD,    BCAD.BCA,    BCAD.CAD,    BCDA.BCD,    BCDA.CDA,    BDAC.BDA
BDAC.DAC,    BDCA.BDC,    BDCA.DCA,    CABD.ABD,    CABD.CAB,    CADB.ADB,    CADB.CAD
CBAD.BAD,    CBAD.CBA,    CBDA.CBD,    CBDA.BDA,    CDAB.CDA,    CDAB.DAB,    CDBA.CDB
CDBA.DBA,    DABC.ABC,    DABC.DAB,    DACB.ACB,    DACB.DAC,    DBAC.BAC,    DBAC.DBA
DBCA.DBC,    DBCA.BCA,    DCAB.DCA,    DCAB.CAB,    DCBA.DCB,    DCBA.CBA


----     66 PARAMETER np                   =           24  card(p)
            PARAMETER nt                   =           24  card(t)
            PARAMETER npt                  =           48  card(pt)


In [1], Rob Pratt proposes a MIP model for this:

Model A
\[\begin{align}\max\> & \color{darkred}z = \sum_p \color{darkred}x_p \\ &\sum_{p|\color{darkblue}{pt}(p,t)} \color{darkred}x_p \le 1&& \forall t \\ & \color{darkred}x_p \in \{0,1\}\end{align} \]

A solution for this, using our tiny data set, can look like:

----     90 VARIABLE x.L  use a lineup

ABCD 1,    ABDC 1,    BACD 1,    BADC 1,    CADB 1,    CBDA 1,    CDAB 1,    CDBA 1,    DACB 1,    DBCA 1,    DCAB 1
DCBA 1


----     90 VARIABLE z.L                   =           12  number of lineups


There is also another way of looking at this. We can see that all pairs of permutations \((p,p')\) that have a substring in common, have the following constraint:\[ \color{darkred}x_p +  \color{darkred}x_{p'}\le 1 \>\>\forall p,p'|\color{darkblue}{\mathit{pairs}}(p,p')\]

The set pairs is as follows:

----    103 PARAMETER npairs               =           24  number of pairs to forbid

----    104 SET pairs  pairs we need to forbid

ABCD.BCDA,    ABCD.DABC,    ABDC.BDCA,    ABDC.CABD,    ACBD.CBDA,    ACBD.DACB,    ACDB.BACD
ACDB.CDBA,    ADBC.CADB,    ADBC.DBCA,    ADCB.BADC,    ADCB.DCBA,    BACD.DBAC,    BADC.CBAD
BCAD.CADB,    BCAD.DBCA,    BCDA.CDAB,    BDAC.CBDA,    BDAC.DACB,    BDCA.DCAB,    CABD.DCAB
CBAD.DCBA,    CDAB.DABC,    CDBA.DBAC


Note that we only list each pair once. So we have ABCD.BCDA but not BCDA.ABCD. So the model becomes:

Model B
\[\begin{align}\max\> & \color{darkred}z = \sum_p \color{darkred}x_p \\ &\color{darkred}x_p+\color{darkred}x_{p'} \le 1&& \forall p,p'|\color{darkblue}{\mathit{pairs}}(p,p')\\ & \color{darkred}x_p \in \{0,1\}\end{align} \]


This is essentially a disaggregated version of model A. The solution is:

----    112 VARIABLE x.L  use a lineup

ABCD 1,    ABDC 1,    BACD 1,    BADC 1,    CADB 1,    CBDA 1,    CDAB 1,    CDBA 1,    DACB 1,    DBCA 1,    DCAB 1
DCBA 1


----    112 VARIABLE z.L                   =           12  number of lineups


This is the same solution. This is by accident: there are 32 different optimal solutions.

The problem with 8 soldiers is actually very large [2]. So, let's do 7 soldiers. This will give us 

----     65 PARAMETER np                   =         5040  card(p)
            PARAMETER nt                   =          210  card(t)
            PARAMETER npt                  =        25200  card(pt)

----    105 -------------- Model A --------------

----    105 VARIABLE x.L  use a lineup

ABCDEFG 1,    ACFDGBE 1,    ADBFCGE 1,    AEBCGDF 1,    AGCBFED 1,    AGEBDCF 1,    AGFBEDC 1,    BAEDGFC 1
BCFAGDE 1,    BDGCFEA 1,    BECDFGA 1,    BEGCADF 1,    BFAEGDC 1,    BGEFDAC 1,    CBDEGFA 1,    CEFABGD 1
CFGDABE 1,    CGAFDEB 1,    DAFGBCE 1,    DCEAGBF 1,    DCGFEBA 1,    DEABFGC 1,    DFACEBG 1,    DFECBAG 1
DGEAFBC 1,    EACBGFD 1,    EBFDCAG 1,    ECFBDAG 1,    EDFBACG 1,    EFCDGAB 1,    FADECGB 1,    FBGCEDA 1
FCBEADG 1,    FCEGBAD 1,    FDBGAEC 1,    FEGADCB 1,    FGECABD 1,    GACDBEF 1,    GBDFCAE 1,    GCDAEFB 1
GDBCAFE 1,    GEDBAFC 1


----    105 VARIABLE z.L                   =           42  number of lineups

----    105 PARAMETER report  results

               Model A

Variables     5041.000
Equations      211.000
Objective       42.000
Time            67.969
Nodes        29299.000
Iterations 1470790.000



----    120 PARAMETER npairs               =      1239840  number of pairs to forbid

----    133 -------------- Model B --------------

----    133 VARIABLE x.L  use a lineup

ABCDEFG 1,    ACDBFEG 1,    AFCGDEB 1,    AGFBCED 1,    BAECDFG 1,    BDGACFE 1,    BEGCDAF 1,    BFCEGDA 1
BGCFAED 1,    CAFGEDB 1,    CAGDFEB 1,    CBAFEDG 1,    CBGEFDA 1,    CDGEAFB 1,    CEBGAFD 1,    CFBADEG 1
DACGBEF 1,    DAEBCGF 1,    DBAGECF 1,    DCAEGBF 1,    DFBECGA 1,    DGFACBE 1,    EABGFCD 1,    EAGBCFD 1
EBDFAGC 1,    EBFADGC 1,    EDABFGC 1,    EDCFGAB 1,    EFCADBG 1,    EGADFCB 1,    EGFDBCA 1,    FBGDCEA 1
FDCGEBA 1,    FDGBACE 1,    FECBDAG 1,    FGBDECA 1,    FGDBEAC 1,    GAEFBDC 1,    GCABEDF 1,    GCBFDEA 1
GCEFABD 1,    GFEADCB 1


----    133 VARIABLE z.L                   =           42  number of lineups

----    133 PARAMETER report  results

               Model A     Model B

Variables     5041.000    5041.000
Equations      211.000 1239841.000
Objective       42.000      42.000
Time            67.969     263.516
Nodes        29299.000  180432.000
Iterations 1470790.000 8709140.000


The second model becomes very large due to the large number of pairs. We see this reflected in the solution times for the two models. In this case, the two models give different solutions. This is no surprise as there are multiple optimal solutions (I don't know how many for this case).

The case for 8 soldiers is difficult. We have a MIP with \(8!=40,320\) binary variables. This is very large. Rob Pratt was able to solve this within an hour using SAS. In [2], Gurobi needed multiple hours to solve this. The remaining question is: are there any better formulations that can solve this faster? Any good ways to exploit the massive symmetry in the model? 

References


  1. 8 soldiers lining up for the morning assembly, https://puzzling.stackexchange.com/questions/106030/8-soldiers-lining-up-for-the-morning-assembly/106047 Note the answer by Rob Pratt.
  2. Optimizing Gurobi model (PuLP) for a symmetric MIP with binary variables, https://stackoverflow.com/questions/65480166/optimizing-gurobi-model-pulp-for-a-symmetric-mip-with-binary-variables

Details of the GAMS code can be found below.


Appendix: GAMS model


The GAMS model has some interesting features. 

The generation of the permutations and handling the substrings is better handled by a programming language like Python than GAMS which has very poor support for strings. Here, I used a little Python fragment to initialize the set \(\color{darkblue}{pt}(p,t)\). The notation set pt(p<,t<) means that when we populate pt, automatically the sets p and t are also initialized. So this is three sets for the price of one. In the Python code, I use the wonderful package itertools, which has direct support for generating all permutations.

For the second model, we need to compute the set \(\color{darkblue}{\mathit{pairs}}(p,p')\). This we can do in a single line of GAMS code.

A macro is used to prevent blocks of identical code.


sets
  p
'permutations or line ups'
  t
'substrings of length 3'
  pt(p<,t<) 
'mapping (computed in Python)'
;

*-----------------------------------------------
* calculate set pt
*-----------------------------------------------

$onEmbeddedCode Python:
from itertools import permutations
M =
3            # length of substring
#S = "ABCDEFGH"  # too difficult
S =
"ABCDEFG"    # easier: use 7 soldiers

pt = []
for p in permutations(S):
   sp =
''.join(p)        # convert tuple to string
  
for k in range(len(S)-M+1):
      st = sp[k:k+M]     
# substring of length M
      pt.append((sp,st)) 
# store element p.t in list pt
gams.set(
"pt",pt)         # convert to GAMS set
$offEmbeddedCode pt


*-----------------------------------------------
* some statistics of the generated data
*-----------------------------------------------

display$(card(p)<1000) p,t;
option pt:0:0:7;
display$(card(pt)<1000) pt;

scalars
  np
"card(p)"
  nt
"card(t)"
  npt
"card(pt)"
;
option np:0,nt:0,npt:0;
np =
card(p);
nt =
card(t);
npt =
card(pt);
display np,nt,npt;

*-----------------------------------------------
* Reporting macro
*-----------------------------------------------

parameter report(*,*) 'results';
$macro results(m,name) \
report(
'Variables',name)  = m.numvar;  \
report(
'Equations',name)  = m.numequ;  \
report(
'Objective',name)  = m.objval;  \
report(
'Time',name)       = m.resusd;  \
report(
'Nodes',name)      = m.nodusd;  \
report(
'Iterations',name) = m.iterusd;

*-----------------------------------------------
* optimization model A
*-----------------------------------------------

binary variable x(p) 'use a lineup';
variable z 'number of lineups';

equations
    obj  
'objective'
    once 
'use triples once'
;


obj.. z =e=
sum(p,x(p));
once(t)..
sum(pt(p,t),x(p)) =l= 1;

model m /all/;
option optcr=0,threads=4,reslim=9999;
solve m maximizing z using mip;

results(m,
"Model A")
option x:0,z:0;

display
 
"-------------- Model A --------------",
  x.l,z.l,report;


*-----------------------------------------------
* optimization model B
*-----------------------------------------------
alias (p,pp);
set pairs(p,pp) 'pairs we need to forbid';
pairs(p,pp)$(
ord(p)<ord(pp)) = sum(t$(pt(p,t) and pt(pp,t)),1);

scalar npairs "number of pairs to forbid";
npairs =
card(pairs);

option npairs:0,pairs:0:0:7;
display npairs;
display$(npairs<1000) pairs;
abort$(npairs>5e6) "Too many constraints";

equation oneInPair(p,pp);
oneInPair(pairs(p,pp))..   x(p) + x(pp) =l= 1;

model m2 /obj,oneInPair/;
solve m2 maximizing z using mip;

results(m2,
"Model B")

display
 
"-------------- Model B --------------",
  x.l,z.l,report;



No comments:

Post a Comment