Monday, December 1, 2025

Sorting: minimize number of swaps

In this post, I want to delve further into sorting. In a question on or.stackexchange.com [1], the subject was minimizing the number of swaps in sorting algorithms. A swap, i.e., an interchange of two items, is a basic operation in sorting. We usually don't pay much attention to this. First, we assume swaps are cheap. If they are not, we can sort not the real data (which can be complex, and thus more complicated and expensive to swap), but rather pointers to the data or keys. But what if we really have to reorder physical things? Then the number of swaps is a more meaningful thing. Can we minimize the work to reorder say \(n\) boxes (or containers to make it more dramatic) by minimizing the number of swaps (assuming interchanging the position of two boxes is the way reordering works).

Number of swaps in Quicksort

Let's say the problem is to sort the following list of 10 items: BDADDDBCCC. Using a simple quicksort algorithm [2,3], this task takes 28 swaps:

BDADDDBCCC -> ABBCCCDDDD #swaps:28
ABBCCCDDDD -> ABBCCCDDDD #swaps:54

A well-known property of many quicksort algorithms is that sorting an already sorted array is expensive. Here we needed 54 swaps for doing that!
 
From my perspective, these swap counts are surprisingly large.

Python code
# measure number of swaps in Quicksort
# ------------------------------------
# simple quicksort from https://www.w3schools.com/dsa/dsa_algo_quicksort.php
# I added a swap counter.
#
# result:
#  BDADDDBCCC -> ABBCCCDDDD #swaps:28
#  ABBCCCDDDD -> ABBCCCDDDD #swaps:54

# string to sort
input = "BDADDDBCCC"

def partition(array, low, high):
    swaps = 0
    pivot = array[high]
    i = low - 1

    for j in range(low, high):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
            swaps += 1

    array[i+1], array[high] = array[high], array[i+1]
    swaps += 1
    return swaps,i+1

def quicksort(array, low=0, high=None): 
    if high is None:
        high = len(array) - 1

    if low < high:
        n1,pivot_index = partition(array, low, high)
        n2 = quicksort(array, low, pivot_index-1)
        n3 = quicksort(array, pivot_index+1, high)
        return n1+n2+n3
    else:
        return 0

def list2str(L):
    return ''.join(L)

# sort characters in string 
# convert list of chars <-> string
def strsort(s):
    L = list(s)
    n = quicksort(L)
    s2 = list2str(L)
    return s2,n

# sort input 
s = input  
t,n=strsort(s)
print(f"{s} -> {t} #swaps:{n}")

# sort again (already sorted)
t2,n=strsort(t)
print(f"{t} -> {t2} #swaps:{n}")


The question comes to mind: can we go from BDADDDBCCC to ABBCCCDDDD using fewer swaps? There are other sorting algorithms that are doing better: selection sort [4] and cycle sort [5]. Here, I look at some formal optimization methods.


Minimum number of swaps: shortest path network approach

We can form a large, sparse network where the nodes are all possible unique configurations. 

To compute all such configurations, we want to avoid generating duplicates. E.g. when considering AAB, the collection of nodes should be AAB, ABA, BAA. The Python function itertools.permutations gives: AAB, ABA, AAB, ABA, BAA, BAA. Instead, we should use sympy.utilities.iterables.multiset_permutations (not in itertools, as predictability is just boring). This will produce:  AAB, ABA, BAA.

For our input BDADDDBCCCwe generate 12,600 nodes.

The arcs are formed by swapping two letters (only if they are different). In total, we have 441,000 arcs. The source node is BDADDDBCCCand the sink node is ABBCCCDDDD. This network is not small. It is quite amazing that our little 10-character string can yield such a large graph.

I used an LP model to solve this [6]. The network creates 12,600 constraints and 441,000 variables. This  is a very easy LP, so it solves very fast. Typical solution times are less than a second. Obviously, this can also be solved with a network code or Dijkstra's algorithm.

An optimal solution is:

    step0.BDADDDBCCC
    step1.BDADDCBCCD
    step2.BDADCCBCDD
    step3.ADBDCCBCDD
    step4.ABBDCCDCDD
    step5.ABBCCCDDDD

The swapped items are highlighted. So we only need 5 swaps! If we had to physically move containers using a forklift truck to put them in order, this may help.

GAMS LP Model

$onText

 

   Shortest Path: minimize number of swaps when sorting

  

$offText

 

*--------------------------------------------------------

* data: from data file

*--------------------------------------------------------

 

set

   nodes 'all possible unique orderings'

   arcs(nodes,nodes) 'formed by a swap'

;

 

$offlisting

$include networkdata.inc

 

sets

   start(nodes)  'initial configuration' /BDADDDBCCC/

   final(nodes)  'sorted configuration'  /ABBCCCDDDD/

;

 

alias (n,n1,n2,nodes);

 

*--------------------------------------------------------

* summary of network

*--------------------------------------------------------

 

parameter counts(*) 'nodes/arcs';

counts('nodes') = card(nodes);

counts('arcs') = card(arcs);

option counts:0;

display counts;

 

*--------------------------------------------------------

* shortest path LP

*--------------------------------------------------------

 

positive variables f(n1,n2) 'flow along arcs';

 

variable 'objective';

 

Equations

    obj       'objective'

    nodbal(n) 'node balance'

;

 

parameter

   supply(n) 'exogenous inflow'

   demand(n) 'exogenous outflow'

;

 

supply(start) = 1;

demand(final) = 1;

 

obj.. z =e= sum(arcs,f(arcs));

 

nodbal(n)..  sum(arcs(n1,n),f(arcs)) + supply(n) =e= sum(arcs(n,n1),f(arcs)) + demand(n);

 

model spath /all/;

option limrow=0,limcol=0,solprint=off;

solve spath minimizing z using lp;

 

display f.l;

 

*--------------------------------------------------------

* reporting: path

*--------------------------------------------------------

 

sets

    step 'for reporting' /step0*step50/

    visit(step,nodes) 'path'

;

singleton set curr(nodes) 'current node';

curr(n) = start(n);

loop(step$card(curr),

   visit(step,curr) = yes;

   curr(n2) = f.l(curr,n2)>0.5;

);

option visit:0:0:1;

display start,final,visit;

 

Note: the data file can be retrieved from [7]. Sometimes I like to generate plain old text files to hold data. I know this is very old-fashioned, but I get a much better sense of the data by just being able to look at it in an editor. 

In the above approach, we assumed we knew the optimal ordering (e.g., by running a quicksort algorithm). In the next model, I'll drop that assumption. 

A MIP Model


We can also try to formulate a mixed-integer programming model. This model works with numeric values instead of labels A,B,... so we relabel: A=1, B=2,... The values are integers here, but the model allows for fractional values. We don't need to know the final ordering: the model finds this for us.

I use the indices \(i,j\) to indicate items (so we have \(i,j\in\{1,2,\dots,n\}\) where \(n=10\) for our test data). The index \(k\) indicates the iteration number (i.e., how many swaps have been performed).

The main variables are:
  • \(\color{darkred}x_{k,i}\): the value of the i-th item at iteration \(k\),
  • \(\color{darkred}s_{k,i}\in\{0,1\}\): indicates if item \(i\) has stayed the same (i.e., was not swapped). In each iteration, 2 items are swapped, and \(n-2\) items remain the same. Detail: we allow for the model to select 2 variables with the same value to swap. That is not very useful, and the objective will try to prevent this from happening. Note that in the shortest-path approach, we explicitly avoided these unnecessary swaps.
  • \(\color{darkred}d_k\in\{0,1\}\) indicates we are done. We found a sorted configuration in the previous iteration. Once we are done, the remaining iterations will keep the same ordering.
The model looks like:

High-level MIP Model
\[\begin{align}\min&\sum_k \left(1-\color{darkred}d_k\right) \\ & \color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i \\ & \color{darkred}d_k=0 \implies \sum_i \color{darkred}s_{k,i}=n-2 \\ & \color{darkred}d_k=1 \implies \sum_i \color{darkred}s_{k,i}=n \\ & \color{darkred}s_{k,i}=1 \implies \color{darkred}x_{k,i} = \color{darkred}x_{k-1,i} \\ & \color{darkred}s_{k,i}=0\ \mathbf{ and }\ \color{darkred}s_{k,j}=0 \implies \color{darkred}x_{k,i} = \color{darkred}x_{k-1,j} && \forall i\ne j \\ & \color{darkred}d_k=1 \implies \color{darkred}x_{k-1,i} \ge \color{darkred}x_{k-1,i-1} &\\ &\color{darkred}s_{k,i},\color{darkred}d_k \in\{0,1\} \end{align}\]

Notes:

  • The objective minimizes the number of iterations \(k\) (i.e., swaps) required to obtain a proper ordering. We can also use \(\max \sum_k \color{darkred}d_k\), but that makes the solver progress log a bit more challenging to interpret.
  • We initialize \(\color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i\). This will allow us to use lags \(\color{darkred}x_{k-1,i}\) in the rest of the model without complications.
  • The swap equation is for all \(i \ne j\). This means it yields an equality for both \((i,j)\) and \((j,i)\). That is exactly what we need for a swap.
  • Most, but not all, solvers and tools support implications (a.k.a. indicator constraints).   
  • The values are floating point numbers here. So A=1.0, B=2.0,... 

The actual MIP model I used was:


MIP Model Implementation
\[\begin{align}\min&\sum_{k\ge 1} \left(1-\color{darkred}d_k\right) \\ & \color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i \\ & \sum_i \color{darkred}s_{k,i}=n-2 \cdot (1-\color{darkred}d_k)  && k \ge 1\\  & \color{darkred}x_{k,i} \le \color{darkred}x_{k-1,i} + \color{darkblue}M \cdot (1-\color{darkred}s_{k,i}) && k \ge 1 \\ & \color{darkred}x_{k,i} \ge \color{darkred}x_{k-1,i} - \color{darkblue}M \cdot (1-\color{darkred}s_{k,i}) && k \ge 1 \\ & \color{darkred}x_{k,i} \le \color{darkred}x_{k-1,j} + \color{darkblue}M\cdot \left(\color{darkred}s_{k,i} +\color{darkred}s_{k,j}\right) && k\ge 1, i\ne j\\ & \color{darkred}x_{k,i} \ge \color{darkred}x_{k-1,j} - \color{darkblue}M\cdot \left(\color{darkred}s_{k,i} +\color{darkred}s_{k,j}\right) && k\ge 1, i\ne j \\ &  \color{darkred}x_{k-1,i} \ge \color{darkred}x_{k-1,i-1} - \color{darkblue}M \cdot \left(1-\color{darkred}d_k\right) && k\ge 1, i \ge 2 \\ & \color{darkred}d_k \ge\color{darkred}d_{k-1} && k\ge 1 \\ & \sum_i \color{darkred}x_{k,i} = \sum_i \color{darkblue}{\mathit{init}}_i  \\ &\color{darkblue}M = \max_i \color{darkblue}{\mathit{init}}_i - \min_i \color{darkblue}{\mathit{init}}_i \\ & \color{darkred}s_{k,i},\color{darkred}d_k \in\{0,1\} \\ & \color{darkred}x_{i,k} \in \left[\min_i \color{darkblue}{\mathit{init}}_i,\max_i \color{darkblue}{\mathit{init}}_i \right] \end{align}\]

    Notes:

    • The indicator constraints have been replaced by big-M constraints. This model can be solved with any MIP solver.
    • The model size is: 2,122 constraints and 222 variables (111 binary variables).
    • I added a few extra, optional constraints. They help a bit: solution times went from 1.22 seconds to 0.76 seconds and node counts from 5,861 to 4,922. (This was with Cplex. For slower solvers the impact is bigger.)
    • We have to choose the dimensionality of set k a bit carefully. If too small, the model will be infeasible. If too large, we waste performance.


    The solution looks like:

    ----     99 PARAMETER trace  
    
                item1       item2       item3       item4       item5       item6       item7       item8       item9      item10
    
    iter0       2.000       4.000       1.000       4.000       4.000       4.000       2.000       3.000       3.000       3.000
    iter1       1.000       4.000       2.000       4.000       4.000       4.000       2.000       3.000       3.000       3.000
    iter2       1.000       2.000       2.000       4.000       4.000       4.000       4.000       3.000       3.000       3.000
    iter3       1.000       2.000       2.000       4.000       3.000       4.000       4.000       4.000       3.000       3.000
    iter4       1.000       2.000       2.000       3.000       3.000       4.000       4.000       4.000       4.000       3.000
    iter5       1.000       2.000       2.000       3.000       3.000       3.000       4.000       4.000       4.000       4.000
    


    As before, we need 5 swaps (although the actual sequencing is different).


    GAMS Model

    $onText

     

      MIP model

     

      Find a way to reorganize the list

          BDADDDBCCC → ABBCCCDDDD

      (sorting) using as few swaps as

      passible. 

     

      We can use a numerical version:

          2414442333 → 1223334444 

     

    $offText

     

     

    *---------------------------------------------------------------

    data

    *---------------------------------------------------------------

     

    Set

      i 'items' /item1*item10/

      'max iterations' /iter0*iter10/

    ;

     

    alias(i,j);

     

    Parameter

       init(i'initial ordering 2414442333' /

          item3 1

          (item1, item7) 2

          (item8*item10) 3

          (item2, item4*item6) 4

       /;

      

    display init;

     

    *---------------------------------------------------------------

    linear MIP model

    *---------------------------------------------------------------

     

    variables

        x(k,i'values'

        itcount 'to minimize'

       

    ;

     

    x.lo(k,i) = smin(j,init(j));

    x.up(k,i) = smax(j,init(j));

    x.fx('iter0',i) = init(i);

     

     

    Scalar 'big-M';

    M = smax(i,init(i))-smin(i,init(i));

     

     

    binary Variables

       s(k,i's(k,i)=1 : x(k,i) stays the same'

       d(k)   'd(k)=1 : we are done. No change needed anymore.'

    ;  

    s.fx('iter0',i) = 0;

     

     

    equations

      count_same(k) 'number of x(k,i) that should stay the same'

      calc_itcount  'calc iteration count (obj)'

      same1(k,i)    'big-M constraint for x(k,i) staying the same'        

      same2(k,i)    'big-M constraint for x(k,i) staying the same'

      swap1(k,i,j)  'big-M constraint for x(k,i) being swapped'

      swap2(k,i,j)  'big-M constraint for x(k,i) being swapped'

      isSorted(k,i'big-M constraint for detecting x(k,i) is sorted'

     

    optional constraints

      order(k)      'optional: d(k) is ordered'

      extra(k)      'optional: x always adds up to the same number'

    ;

     

    counts

    done(k)=0 => sum(i, same(k,i)) = n-2

    done(k)=1 => sum(i, same(k,i)) = n

    count_same(k)$(ord(k)>1).. sum(i, s(k,i)) =e= card(i)-2*(1-d(k));

     

    same

    same(k,i)=1 ==> x(k,i) = x(k-1,i)

    same1(k,i)$(ord(k)>1)..x(k,i) =l= x(k-1,i) + M*(1-s(k,i));

    same2(k,i)$(ord(k)>1)..x(k,i) =g= x(k-1,i) - M*(1-s(k,i));

     

    swap

    same(k,i)=0 & same(k,j)=0 ==> x(k,i) = x(k-1,j), x(k,j) = x(k-1,i)

    we will allow useless swaps

    swap1(k,i,j)$(ord(i)<>ord(j) and ord(k)>1).. x(k,i) =l= x(k-1,j) + M*(s(k,i)+s(k,j));

    swap2(k,i,j)$(ord(i)<>ord(j) and ord(k)>1).. x(k,i) =g= x(k-1,j) - M*(s(k,i)+s(k,j));

     

    done

    isSorted(k,i)$(ord(i)>1 and ord(k)>1).. x(k-1,i) =g= x(k-1,i-1) - M*(1-d(k));

     

    optional constraints (they help performance)

    order(k)$(ord(k)>1).. d(k) =g= d(k-1);

     

    extra(k).. sum(i, x(k,i)) =e= sum(i,init(i));

     

    obj

    calc_itcount.. itcount =e= sum(k$(ord(k)>1), 1-d(k));

     

    model sort /all/;

     

    *---------------------------------------------------------------

    solve

    *---------------------------------------------------------------

     

    solve sort minimizing itcount using mip;

    display x.ls.ld.litcount.l;

     

    make a little report

    parameter trace(k,*);

    trace(k,i)$(d.l(k)<0.5) = x.l(k,i);

    display trace;

     


    Conclusion and extensions


    We have seen that QuickSort is not very efficient in terms of the number of swaps, when we sort: BDADDDBCCC -> ABBCCCDDDD. To find the minimum number of swap operations needed, I formulated two different optimization problem: a shortest path network model and a MIP model. Both agree: we can sort this using just 5 swaps.

    Some open questions:

    • Can we use a version of Cycle Sort (or another sort algorithm) that gives us the optimal sorting strategy for this? Warning: some versions are not using the concept of swaps. I tried my example on some different versions (including AI generated code), but was unable to achieve our optimal solution.
    • Can we find the initial configuration that requires the most swaps? That is like the "longest shortest distance". We should be able to find this with Dijkstra (the longest distance is 6). I am not immediately sure how to formulate a direct MIP model for this. 


    References

    1. Sorting algorithms that optimise for changes/swaps/moves, https://or.stackexchange.com/questions/13387/sorting-algorithms-that-optimise-for-changes-swaps-moves
    2. Quicksort, https://en.wikipedia.org/wiki/Quicksort
    3. DSA Quicksort, https://www.w3schools.com/dsa/dsa_algo_quicksort.php
    4. Selection Sort, https://www.geeksforgeeks.org/dsa/selection-sort-algorithm-2/
    5. Cycle Sort, https://www.geeksforgeeks.org/dsa/cycle-sort/
    6. A network model: Pyomo vs GAMS, https://yetanothermathprogrammingconsultant.blogspot.com/2021/08/a-network-model-pyomo-vs-gams.html
    7. networkdata.inc, https://raw.githubusercontent.com/aopt/OptimizationModels/refs/heads/main/Sorting%20and%20Swaps/networkdata.inc

    No comments:

    Post a Comment