Yet Another Math Programming Consultant: Sorting: minimize number of swaps

In this post, I want to delve further into sorting. In a question on or.stackexchange.com [1], the subject was minimizing the number of swaps in sorting algorithms. A swap, i.e., an interchange of two items, is a basic operation in sorting. We usually don't pay much attention to this. First, we assume swaps are cheap. If they are not, we can sort not the real data (which can be complex, and thus more complicated and expensive to swap), but rather pointers to the data or keys. But what if we really have to reorder bulky physical things? Then the number of swaps is a more meaningful concept. Can we minimize the work to reorder say $n$ boxes (or containers to make it more dramatic) by minimizing the number of swaps (assuming interchanging the position of two boxes is the way reordering works).

Number of swaps in Quicksort

Let's say the problem is to sort the following list of 10 items: BDADDDBCCC. Using a simple quicksort algorithm [2,3], this task takes 28 swaps:

BDADDDBCCC -> ABBCCCDDDD #swaps:28
ABBCCCDDDD -> ABBCCCDDDD #swaps:54

A well-known property of many quicksort algorithms is that sorting an already sorted array is expensive. Here we needed 54 swaps for doing that!

From my perspective, these swap counts are surprisingly large.

Python code

# measure number of swaps in Quicksort
# ------------------------------------
# simple quicksort from https://www.w3schools.com/dsa/dsa_algo_quicksort.php
# I added a swap counter.
#
# result:
#  BDADDDBCCC -> ABBCCCDDDD #swaps:28
#  ABBCCCDDDD -> ABBCCCDDDD #swaps:54

# string to sort
input = "BDADDDBCCC"

def partition(array, low, high):
    swaps = 0
    pivot = array[high]
    i = low - 1

    for j in range(low, high):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
            swaps += 1

    array[i+1], array[high] = array[high], array[i+1]
    swaps += 1
    return swaps,i+1

def quicksort(array, low=0, high=None): 
    if high is None:
        high = len(array) - 1

    if low < high:
        n1,pivot_index = partition(array, low, high)
        n2 = quicksort(array, low, pivot_index-1)
        n3 = quicksort(array, pivot_index+1, high)
        return n1+n2+n3
    else:
        return 0

def list2str(L):
    return ''.join(L)

# sort characters in string 
# convert list of chars <-> string
def strsort(s):
    L = list(s)
    n = quicksort(L)
    s2 = list2str(L)
    return s2,n

# sort input 
s = input  
t,n=strsort(s)
print(f"{s} -> {t} #swaps:{n}")

# sort again (already sorted)
t2,n=strsort(t)
print(f"{t} -> {t2} #swaps:{n}")

The question comes to mind: can we go from BDADDDBCCC to ABBCCCDDDD using fewer swaps? There are other sorting algorithms that are doing better: selection sort [4] and cycle sort [5]. Here, I look at some formal optimization methods.

Minimum number of swaps: shortest path network approach

We can form a large, sparse network where the nodes are all possible unique configurations.

To compute all such configurations, we want to avoid generating duplicates. E.g. when considering AAB, the collection of nodes should be AAB, ABA, BAA. The Python function itertools.permutations gives: AAB, ABA, AAB, ABA, BAA, BAA. Instead, we should use sympy.utilities.iterables.multiset_permutations (not in itertools, as predictability is just boring). This will produce: AAB, ABA, BAA.

For our input BDADDDBCCC, we generate 12,600 nodes.

The arcs are formed by swapping two letters (only if they are different). In total, we have 441,000 arcs. The source node is BDADDDBCCC, and the sink node is ABBCCCDDDD. This network is not small. It is quite amazing that our little 10-character string can yield such a large graph.

I used an LP model to solve this [7]. The network creates 12,600 constraints and 441,000 variables. This is a very easy LP, so it solves very fast. Typical solution times are less than a second. Obviously, this can also be solved with a network code or Dijkstra's algorithm.

An optimal solution is:

step0.BDADDDBCCC
step1.BDADDCBCCD
step2.BDADCCBCDD
step3.ADBDCCBCDD
step4.ABBDCCDCDD
step5.ABBCCCDDDD

The swapped items are highlighted. So we only need 5 swaps! If we had to physically move containers using a forklift truck to put them in order, this may help.

GAMS LP Model

$onText

Shortest Path: minimize number of swaps when sorting

$offText

*--------------------------------------------------------

* data: from data file

*--------------------------------------------------------

set

nodes 'all possible unique orderings'

arcs(nodes,nodes) 'formed by a swap'

;

$offlisting

$include networkdata.inc

sets

start(nodes) 'initial configuration' /BDADDDBCCC/

final(nodes) 'sorted configuration' /ABBCCCDDDD/

;

alias (n,n1,n2,nodes);

*--------------------------------------------------------

* summary of network

*--------------------------------------------------------

parameter counts(*) 'nodes/arcs';

counts('nodes') = card(nodes);

counts('arcs') = card(arcs);

option counts:0;

display counts;

*--------------------------------------------------------

* shortest path LP

*--------------------------------------------------------

positive variables f(n1,n2) 'flow along arcs';

variable z 'objective';

Equations

obj 'objective'

nodbal(n) 'node balance'

;

parameter

supply(n) 'exogenous inflow'

demand(n) 'exogenous outflow'

;

supply(start) = 1;

demand(final) = 1;

obj.. z =e= sum(arcs,f(arcs));

nodbal(n).. sum(arcs(n1,n),f(arcs)) + supply(n) =e= sum(arcs(n,n1),f(arcs)) + demand(n);

model spath /all/;

option limrow=0,limcol=0,solprint=off;

solve spath minimizing z using lp;

display f.l;

*--------------------------------------------------------

* reporting: path

*--------------------------------------------------------

sets

step 'for reporting' /step0*step50/

visit(step,nodes) 'path'

;

singleton set curr(nodes) 'current node';

curr(n) = start(n);

loop(step$card(curr),

visit(step,curr) = yes;

curr(n2) = f.l(curr,n2)>0.5;

);

option visit:0:0:1;

display start,final,visit;

Note: the data file can be retrieved from [8]. Sometimes I like to generate plain old text files to hold data. I know this is very old-fashioned, but I get a much better sense of the data by just being able to look at it in an editor.

In the above approach, we assumed we knew the optimal ordering (e.g., by running a quicksort algorithm). In the next model, I'll drop that assumption.

We can also do this ourselves using a bit of Python code. This is a BFS (Breadth First Search) algorithm, where we calculate the set of neighbors when needed instead of all in advance.

BFS Code

from collections import deque

start = "BDADDDBCCC"
goal  = "ABBCCCDDDD"

def neighbors(s):
    i = 0
    while i < len(s) and s[i] == goal[i]:
        i += 1
    if i == len(s):
        return
    
    for j in range(i+1, len(s)):
        if s[j] == goal[i] and s[j] != goal[j]:
            t = list(s)
            t[i], t[j] = t[j], t[i]
            yield ''.join(t), (i, j)

def min_swaps(start, goal):
    q = deque([(start, [])])
    seen = {start}

    while q:
        s, path = q.popleft()

        if s == goal:
            return path

        for t, swap in neighbors(s):
            if t not in seen:
                seen.add(t)
                q.append((t, path + [swap]))

path = min_swaps(start, goal)

print("Minimum # of swaps:", len(path))
s = start
for step,(i,j) in enumerate(path,1):
    t = list(s)
    t[i],t[j] = t[j],t[i]
    prev = s
    s = ''.join(t)
    print(f"{step}. {prev} -> {s}")

A MIP Model

We can also try to formulate a mixed-integer programming model. This model works with numeric values instead of labels A,B,... so we relabel: A=1, B=2,... The values are integers here, but the model allows for fractional values. We don't need to know the final ordering: the model finds this for us.

I use the indices $i,j$ to indicate items (so we have $i,j\in\{1,2,\dots,n\}$ where $n=10$ for our test data). The index $k$ indicates the iteration number (i.e., how many swaps have been performed).

The main variables are:

$\color{darkred}x_{k,i}$: the value of the i-th item at iteration $k$,
$\color{darkred}s_{k,i}\in\{0,1\}$: indicates if item $i$ has stayed the same (i.e., was not swapped). In each iteration, 2 items are swapped, and $n-2$ items remain the same. Detail: we allow for the model to select 2 variables with the same value to swap. That is not very useful, and the objective will try to prevent this from happening. Note that in the shortest-path approach, we explicitly avoided these unnecessary swaps.
$\color{darkred}d_k\in\{0,1\}$ indicates we are done. We found a sorted configuration in the previous iteration. Once we are done, the remaining iterations will keep the same ordering.

The model looks like:

High-level MIP Model
\[\begin{align}\min&\sum_k \left(1-\color{darkred}d_k\right) \\ & \color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i \\ & \color{darkred}d_k=0 \implies \sum_i \color{darkred}s_{k,i}=n-2 \\ & \color{darkred}d_k=1 \implies \sum_i \color{darkred}s_{k,i}=n \\ & \color{darkred}s_{k,i}=1 \implies \color{darkred}x_{k,i} = \color{darkred}x_{k-1,i} \\ & \color{darkred}s_{k,i}=0\ \mathbf{ and }\ \color{darkred}s_{k,j}=0 \implies \color{darkred}x_{k,i} = \color{darkred}x_{k-1,j} && \forall i\ne j \\ & \color{darkred}d_k=1 \implies \color{darkred}x_{k-1,i} \ge \color{darkred}x_{k-1,i-1} &\\ &\color{darkred}s_{k,i},\color{darkred}d_k \in\{0,1\} \end{align}\]

High-level MIP Model

\[\begin{align}\min&\sum_k \left(1-\color{darkred}d_k\right) \\ & \color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i \\ & \color{darkred}d_k=0 \implies \sum_i \color{darkred}s_{k,i}=n-2 \\ & \color{darkred}d_k=1 \implies \sum_i \color{darkred}s_{k,i}=n \\ & \color{darkred}s_{k,i}=1 \implies \color{darkred}x_{k,i} = \color{darkred}x_{k-1,i} \\ & \color{darkred}s_{k,i}=0\ \mathbf{ and }\ \color{darkred}s_{k,j}=0 \implies \color{darkred}x_{k,i} = \color{darkred}x_{k-1,j} && \forall i\ne j \\ & \color{darkred}d_k=1 \implies \color{darkred}x_{k-1,i} \ge \color{darkred}x_{k-1,i-1} &\\ &\color{darkred}s_{k,i},\color{darkred}d_k \in\{0,1\} \end{align}\]

Notes:

The objective minimizes the number of iterations $k$ (i.e., swaps) required to obtain a proper ordering. We can also use $\max \sum_k \color{darkred}d_k$, but that makes the solver progress log a bit more challenging to interpret.
We initialize $\color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i$. This will allow us to use lags $\color{darkred}x_{k-1,i}$ in the rest of the model without complications.
The swap equation is for all $i \ne j$. This means it yields an equality for both $(i,j)$ and $(j,i)$. That is exactly what we need for a swap.
Most, but not all, solvers and tools support implications (a.k.a. indicator constraints).
The values are floating point numbers here. So A=1.0, B=2.0,...

The actual MIP model I used was:

MIP Model Implementation
\[\begin{align}\min&\sum_{k\ge 1} \left(1-\color{darkred}d_k\right) \\ & \color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i \\ & \sum_i \color{darkred}s_{k,i}=n-2 \cdot (1-\color{darkred}d_k) && k \ge 1\\ & \color{darkred}x_{k,i} \le \color{darkred}x_{k-1,i} + \color{darkblue}M \cdot (1-\color{darkred}s_{k,i}) && k \ge 1 \\ & \color{darkred}x_{k,i} \ge \color{darkred}x_{k-1,i} - \color{darkblue}M \cdot (1-\color{darkred}s_{k,i}) && k \ge 1 \\ & \color{darkred}x_{k,i} \le \color{darkred}x_{k-1,j} + \color{darkblue}M\cdot \left(\color{darkred}s_{k,i} +\color{darkred}s_{k,j}\right) && k\ge 1, i\ne j\\ & \color{darkred}x_{k,i} \ge \color{darkred}x_{k-1,j} - \color{darkblue}M\cdot \left(\color{darkred}s_{k,i} +\color{darkred}s_{k,j}\right) && k\ge 1, i\ne j \\ & \color{darkred}x_{k-1,i} \ge \color{darkred}x_{k-1,i-1} - \color{darkblue}M \cdot \left(1-\color{darkred}d_k\right) && k\ge 1, i \ge 2 \\ & \color{darkred}d_k \ge\color{darkred}d_{k-1} && k\ge 1 \\ & \sum_i \color{darkred}x_{k,i} = \sum_i \color{darkblue}{\mathit{init}}_i \\ &\color{darkblue}M = \max_i \color{darkblue}{\mathit{init}}_i - \min_i \color{darkblue}{\mathit{init}}_i \\ & \color{darkred}s_{k,i},\color{darkred}d_k \in\{0,1\} \\ & \color{darkred}x_{i,k} \in \left[\min_i \color{darkblue}{\mathit{init}}_i,\max_i \color{darkblue}{\mathit{init}}_i \right] \end{align}\]

MIP Model Implementation

\[\begin{align}\min&\sum_{k\ge 1} \left(1-\color{darkred}d_k\right) \\ & \color{darkred}x_{0,i} = \color{darkblue}{\mathit{init}}_i \\ & \sum_i \color{darkred}s_{k,i}=n-2 \cdot (1-\color{darkred}d_k) && k \ge 1\\ & \color{darkred}x_{k,i} \le \color{darkred}x_{k-1,i} + \color{darkblue}M \cdot (1-\color{darkred}s_{k,i}) && k \ge 1 \\ & \color{darkred}x_{k,i} \ge \color{darkred}x_{k-1,i} - \color{darkblue}M \cdot (1-\color{darkred}s_{k,i}) && k \ge 1 \\ & \color{darkred}x_{k,i} \le \color{darkred}x_{k-1,j} + \color{darkblue}M\cdot \left(\color{darkred}s_{k,i} +\color{darkred}s_{k,j}\right) && k\ge 1, i\ne j\\ & \color{darkred}x_{k,i} \ge \color{darkred}x_{k-1,j} - \color{darkblue}M\cdot \left(\color{darkred}s_{k,i} +\color{darkred}s_{k,j}\right) && k\ge 1, i\ne j \\ & \color{darkred}x_{k-1,i} \ge \color{darkred}x_{k-1,i-1} - \color{darkblue}M \cdot \left(1-\color{darkred}d_k\right) && k\ge 1, i \ge 2 \\ & \color{darkred}d_k \ge\color{darkred}d_{k-1} && k\ge 1 \\ & \sum_i \color{darkred}x_{k,i} = \sum_i \color{darkblue}{\mathit{init}}_i \\ &\color{darkblue}M = \max_i \color{darkblue}{\mathit{init}}_i - \min_i \color{darkblue}{\mathit{init}}_i \\ & \color{darkred}s_{k,i},\color{darkred}d_k \in\{0,1\} \\ & \color{darkred}x_{i,k} \in \left[\min_i \color{darkblue}{\mathit{init}}_i,\max_i \color{darkblue}{\mathit{init}}_i \right] \end{align}\]

Notes:

The indicator constraints have been replaced by big-M constraints. This model can be solved with any MIP solver.
The model size is: 2,122 constraints and 222 variables (111 binary variables).
I added a few extra, optional constraints. They help a bit: solution times went from 1.22 seconds to 0.76 seconds and node counts from 5,861 to 4,922. (This was with Cplex. For slower solvers the impact is bigger.)
We have to choose the dimensionality of set k a bit carefully. If too small, the model will be infeasible. If too large, we waste performance.

The solution looks like:

----     99 PARAMETER trace  

            item1       item2       item3       item4       item5       item6       item7       item8       item9      item10

iter0       2.000       4.000       1.000       4.000       4.000       4.000       2.000       3.000       3.000       3.000
iter1       1.000       4.000       2.000       4.000       4.000       4.000       2.000       3.000       3.000       3.000
iter2       1.000       2.000       2.000       4.000       4.000       4.000       4.000       3.000       3.000       3.000
iter3       1.000       2.000       2.000       4.000       3.000       4.000       4.000       4.000       3.000       3.000
iter4       1.000       2.000       2.000       3.000       3.000       4.000       4.000       4.000       4.000       3.000
iter5       1.000       2.000       2.000       3.000       3.000       3.000       4.000       4.000       4.000       4.000

As before, we need 5 swaps (although the actual sequencing is different).

GAMS Model

$onText

MIP model

Find a way to reorganize the list

BDADDDBCCC → ABBCCCDDDD

(sorting) using as few swaps as

passible.

We can use a numerical version:

2414442333 → 1223334444

$offText

*---------------------------------------------------------------

* data

*---------------------------------------------------------------

Set

i 'items' /item1*item10/

k 'max iterations' /iter0*iter10/

;

alias(i,j);

Parameter

init(i) 'initial ordering 2414442333' /

item3 1

(item1, item7) 2

(item8*item10) 3

(item2, item4*item6) 4

display init;

*---------------------------------------------------------------

* linear MIP model

*---------------------------------------------------------------

variables

x(k,i) 'values'

itcount 'to minimize'

;

x.lo(k,i) = smin(j,init(j));

x.up(k,i) = smax(j,init(j));

x.fx('iter0',i) = init(i);

Scalar M 'big-M';

M = smax(i,init(i))-smin(i,init(i));

binary Variables

s(k,i) 's(k,i)=1 : x(k,i) stays the same'

d(k) 'd(k)=1 : we are done. No change needed anymore.'

;

s.fx('iter0',i) = 0;

equations

count_same(k) 'number of x(k,i) that should stay the same'

calc_itcount 'calc iteration count (obj)'

same1(k,i) 'big-M constraint for x(k,i) staying the same'

same2(k,i) 'big-M constraint for x(k,i) staying the same'

swap1(k,i,j) 'big-M constraint for x(k,i) being swapped'

swap2(k,i,j) 'big-M constraint for x(k,i) being swapped'

isSorted(k,i) 'big-M constraint for detecting x(k,i) is sorted'

* optional constraints

order(k) 'optional: d(k) is ordered'

extra(k) 'optional: x always adds up to the same number'

;

* counts

* done(k)=0 => sum(i, same(k,i)) = n-2

* done(k)=1 => sum(i, same(k,i)) = n

count_same(k)$(ord(k)>1).. sum(i, s(k,i)) =e= card(i)-2*(1-d(k));

* same

* same(k,i)=1 ==> x(k,i) = x(k-1,i)

same1(k,i)$(ord(k)>1)..x(k,i) =l= x(k-1,i) + M*(1-s(k,i));

same2(k,i)$(ord(k)>1)..x(k,i) =g= x(k-1,i) - M*(1-s(k,i));

* swap

* same(k,i)=0 & same(k,j)=0 ==> x(k,i) = x(k-1,j), x(k,j) = x(k-1,i)

* we will allow useless swaps

swap1(k,i,j)$(ord(i)<>ord(j) and ord(k)>1).. x(k,i) =l= x(k-1,j) + M*(s(k,i)+s(k,j));

swap2(k,i,j)$(ord(i)<>ord(j) and ord(k)>1).. x(k,i) =g= x(k-1,j) - M*(s(k,i)+s(k,j));

* done

isSorted(k,i)$(ord(i)>1 and ord(k)>1).. x(k-1,i) =g= x(k-1,i-1) - M*(1-d(k));

* optional constraints (they help performance)

order(k)$(ord(k)>1).. d(k) =g= d(k-1);

extra(k).. sum(i, x(k,i)) =e= sum(i,init(i));

* obj

calc_itcount.. itcount =e= sum(k$(ord(k)>1), 1-d(k));

model sort /all/;

*---------------------------------------------------------------

* solve

*---------------------------------------------------------------

solve sort minimizing itcount using mip;

display x.l, s.l, d.l, itcount.l;

* make a little report

parameter trace(k,*);

trace(k,i)$(d.l(k)<0.5) = x.l(k,i);

display trace;

Conclusion and extensions

We have seen that QuickSort is not very efficient in terms of the number of swaps, when we sort: BDADDDBCCC -> ABBCCCDDDD. To find the minimum number of swap operations needed, I formulated two different optimization problems: a shortest path network model and a MIP model. Both agree: we can sort this using just 5 swaps.

Some open questions:

Can we use a version of Cycle Sort (or another sort algorithm) that gives us the optimal sorting strategy for this? Warning: some versions do not use the concept of swaps but rather count "writes". Other versions may not support duplicate values [6]. I tried my example on some different versions (including AI-generated code), but was unable to achieve our optimal solution.
Can we find the initial configuration that requires the most swaps? That is like the "longest shortest distance". We should be able to find this with Dijkstra (the longest shortest distance is 6, e.g. from BCCDDDABCD -> ABBCCCDDDD). I am not immediately sure how to formulate a direct MIP model for this. Would this be a bilevel MIP?
How would we do this with a metaheuristic?
A variant would be to allow only swapping direct neighbors [9]. It is not very difficult to adapt our models for this. Obviously, we expect to need more swaps. An optimal solution for our problem can look like:

0 BDADDDBCCC
1 BDADDBDCCC
2 BDADDBCDCC
3 BDADDBCCDC
4 BDADDBCCCD
5 BDADBDCCCD
6 BDADBCDCCD
7 BADDBCDCCD
8 BADBDCDCCD
9 BADBCDDCCD
10 BADBCDCDCD
11 BADBCDCCDD
12 BADBCCDCDD
13 BABDCCDCDD
14 BABDCCCDDD
15 BABCDCCDDD
16 BABCCDCDDD
17 BABCCCDDDD
18 ABBCCCDDDD

References

Sorting algorithms that optimise for changes/swaps/moves, https://or.stackexchange.com/questions/13387/sorting-algorithms-that-optimise-for-changes-swaps-moves
Quicksort, https://en.wikipedia.org/wiki/Quicksort
DSA Quicksort, https://www.w3schools.com/dsa/dsa_algo_quicksort.php
Selection Sort, https://www.geeksforgeeks.org/dsa/selection-sort-algorithm-2/
Cycle Sort, https://www.geeksforgeeks.org/dsa/cycle-sort/
Minimum swaps to sort an array, https://www.geeksforgeeks.org/dsa/minimum-number-swaps-required-sort-array/
A network model: Pyomo vs GAMS, https://yetanothermathprogrammingconsultant.blogspot.com/2021/08/a-network-model-pyomo-vs-gams.html
networkdata.inc, https://raw.githubusercontent.com/aopt/OptimizationModels/refs/heads/main/Sorting%20and%20Swaps/networkdata.inc
Number of swaps to sort when only adjacent swapping allowed, https://www.geeksforgeeks.org/dsa/number-swaps-sort-adjacent-swapping-allowed/

Yet Another Math Programming Consultant

Monday, December 1, 2025

Sorting: minimize number of swaps

Number of swaps in Quicksort

Minimum number of swaps: shortest path network approach

A MIP Model

Conclusion and extensions

References

No comments:

Post a Comment