Monday, May 16, 2022

Ranking using numpy.argsort

I needed to find a ranking of a large data set. Using Python, it makes sense to look at the numpy library for this. 

Numpy has the function argsort, which returns index positions [1]. One would think these are exactly the ranks we are after. Unfortunately, this is not the case.

>>> import numpy as np
>>> a = [3.0, 1.0, 5.0, 2.0]
>>> indx = np.argsort(a)
>>> indx
array([1, 3, 0, 2], dtype=int64)

I would expect:
     indx = [2, 0, 3, 1]

Actually, the reported indices are for a reversed mapping: from the (unknown) sorted vector to the original unsorted vector.

On the left is what I was after, and on the right is what argsort returns.

It is not very difficult to get the inverse mapping. Here are a few ways to do this:
  1. Using loops. Use the simple fact that for an index \(p\) and its inverse mapping \(p'\), we have: \[p_i=j \iff p'_j=i\]
    >>> rank = np.empty_like(indx)
    >>> for i,j in enumerate(indx):
    ...     rank[j]=i
    >>> rank
    array([2, 0, 3, 1], dtype=int64)
    Looping is in general quite slow in Python. So we may want to look into some alternatives.
  2. Fancy indexing[2]. Here we use the array indx to permute the values [0,1,2,3].
    >>> rank = np.empty_like(indx)
    >>> rank[indx] = np.arange(len(indx))
    >>> rank
    array([2, 0, 3, 1], dtype=int64)
  3. Applying argsort twice[2]. This is a bit of a surprise, but it does exactly what we want.
    >>> rank=np.argsort(indx)
    >>> rank
    array([2, 0, 3, 1], dtype=int64)
    This one is the most intriguing of course: argsort(argsort(a)) gives the ranking.

An alternative is to use scipy.stats.rankdata. The above ranking can be replicated with:

>>> import scipy.stats as stats
>>> rank = stats.rankdata(a,method='ordinal')-1
>>> rank
array([2, 0, 3, 1], dtype=int64)


1 comment:

  1. Hi Erwin,
    Could you please let me know your idea about this ?