How can I check if a Pandas dataframe's index is sorted?

Solution 1

How about:

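df.index.is_monotonic_increasing

In older pandas this was also available as df.index.is_monotonic, an alias that was deprecated in pandas 1.5 and removed in 2.0.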
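
As a small illustration of the monotonicity check, here is a sketch on three made-up frames (the data and variable names are assumptions for the example, not from the question). It uses the explicit is_monotonic_increasing and is_monotonic_decreasing properties:

```python
import pandas as pd

# Three toy frames: ascending, shuffled, and descending index (made-up data).
ascending_df = pd.DataFrame({"x": [10, 20, 30]}, index=[1, 2, 3])
shuffled_df = pd.DataFrame({"x": [10, 20, 30]}, index=[2, 1, 3])
descending_df = pd.DataFrame({"x": [10, 20, 30]}, index=[3, 2, 1])

# Properties, not methods: no parentheses.
asc_ok = ascending_df.index.is_monotonic_increasing
shuffled_ok = shuffled_df.index.is_monotonic_increasing
desc_as_increasing = descending_df.index.is_monotonic_increasing
desc_ok = descending_df.index.is_monotonic_decreasing

print(asc_ok, shuffled_ok, desc_as_increasing, desc_ok)
```

Note that a descending index is not "increasing", so it needs the separate is_monotonic_decreasing check.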

Solution 2

If sorting is allowed, try

all(df.sort_index().index == df.index)

If not, try

all(a <= b for a, b in zip(df.index, df.index[1:]))

The first one is more readable, while the second one avoids the sort and so runs in O(n) rather than O(n log n).

EDIT

Adding another method I've just found. It is similar to the second one, but the comparison is vectorized:

all(df.index[:-1] <= df.index[1:])
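
To see that the three approaches agree, here is a minimal sketch that wraps each one in a helper and runs them on a sorted and an unsorted index. The helper names and the sample data are assumptions made for the example:

```python
import pandas as pd


def sorted_by_resorting(df):
    # Sort a copy and compare element-wise with the original index: O(n log n).
    return bool((df.sort_index().index == df.index).all())


def sorted_by_pairs(df):
    # Compare each label with its successor in a Python loop: O(n).
    return all(a <= b for a, b in zip(df.index, df.index[1:]))


def sorted_vectorized(df):
    # Same pairwise comparison, done as a single array operation.
    return bool((df.index[:-1] <= df.index[1:]).all())


sorted_df = pd.DataFrame({"x": range(5)}, index=[1, 2, 2, 5, 9])
unsorted_df = pd.DataFrame({"x": range(5)}, index=[1, 5, 2, 2, 9])

checks = (sorted_by_resorting, sorted_by_pairs, sorted_vectorized)
sorted_results = [check(sorted_df) for check in checks]
unsorted_results = [check(unsorted_df) for check in checks]
print(sorted_results, unsorted_results)
```

All three treat repeated labels as sorted, because they use <= rather than <.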

Solution 3

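For non-indices (df.sort() was removed in pandas 0.20; sort_values is its replacement, and cols below stands for the column name or list of column names to sort by):

df.equals(df.sort_values(by=cols))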
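
For a single column, the same monotonicity property is available on a Series, which avoids sorting altogether. A sketch with made-up data (the column names "a" and "b" are assumptions for the example):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 1, 2]})

# Series expose the same property as Index objects.
a_sorted = df["a"].is_monotonic_increasing
b_sorted = df["b"].is_monotonic_increasing

# Whole-frame check: does sorting by a column leave the frame unchanged?
unchanged_by_a = df.equals(df.sort_values(by="a"))
unchanged_by_b = df.equals(df.sort_values(by="b"))

print(a_sorted, b_sorted, unchanged_by_a, unchanged_by_b)
```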

Solution 4

Just for the sake of completeness, this is how to check whether the dataframe index is both monotonically increasing and unique and, if it is not, replace it with one that is:

if not (df.index.is_monotonic_increasing and df.index.is_unique):
  df.reset_index(inplace=True, drop=True)

NOTE: df.index.is_monotonic_increasing returns True even if there are repeated index values, so it has to be complemented with df.index.is_unique.
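
A short sketch of that point, using a made-up index with one repeated label:

```python
import pandas as pd

# The label 2 appears twice (toy data for illustration).
df = pd.DataFrame({"x": [0, 1, 2, 3]}, index=[1, 2, 2, 3])

non_decreasing = df.index.is_monotonic_increasing  # repeats are allowed
unique = df.index.is_unique                        # repeats are not allowed
strictly_increasing = non_decreasing and unique

print(non_decreasing, unique, strictly_increasing)
```

Combining the two properties gives a strictly increasing check, which is what the snippet above relies on.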

Author by

Pablojim

Updated on June 06, 2022

Comments

  • Pablojim
    Pablojim almost 2 years

    I have a vanilla pandas dataframe with an index. I need to check if the index is sorted. Preferably without sorting it again.

    e.g. I can test an index to see if it is unique with index.is_unique. Is there a similar way to test whether it is sorted?

  • Wes McKinney
    Wes McKinney almost 11 years
    Strongly recommend using is_monotonic
  • Tim Diels
    Tim Diels over 7 years
    Use is_monotonic_increasing to check for ascending order and is_monotonic_decreasing to check for descending order. is_monotonic has been deprecated; it's a misnomer as it only checks for increasing monotonicity.
  • Joseph Garvin
    Joseph Garvin about 7 years
    Does it do a test on the fly, or is it just telling you that at some earlier point you somehow promised pandas that it would be monotonic?
  • Mithril
    Mithril over 5 years
    Is it better to check is_monotonic before sort_index? Or would sort_index check it automatically?
  • Asclepius
    Asclepius over 3 years
    @timdiels It is incorrect to say that is_monotonic is deprecated. It is not. Where does it say that it is?
  • Tim Diels
    Tim Diels over 3 years
    @Acumenus It was deprecated at the time I posted that comment, but it seems it no longer is. Still, I wouldn't use it, as it only checks whether the index is monotonically increasing, while a monotonically decreasing function should also be considered monotonic. To be clear, when is_monotonic_decreasing is True, is_monotonic may be False; surprise!
  • Ahmed Fasih
    Ahmed Fasih over 2 years
    Do note, is_monotonic_increasing and is_monotonic_decreasing work for non-indexes also.
  • Eli S
    Eli S over 2 years
    I don't think this is what reset_index does. It will change the index to the default, which for many or all cases is just the row index. I think as others pointed out df.sort_index() is the right tool. It can be done inplace if desired.
  • Manu Na Eira
    Manu Na Eira over 2 years
    Well @Eli, the difference lies in whether the index is unique (index.is_unique) or not. If the dataframe index has repeated values, df.sort_index() won't get rid of them. So I think my answer above is still valid: if you want a dataframe index that is monotonically increasing and unique (the desired index format in most cases, MultiIndexes aside), df.reset_index is a quick way to get it.
  • Eli S
    Eli S about 2 years
    This may work if you start with a default (integer index) but it will not work with, say, df = pd.DataFrame(index=[pd.Timestamp(2000,1,1),pd.Timestamp(2000,1,2),pd.Timestamp(2000,1,2),pd.Timestamp(2000,1,3)],data=[0,1,2,3]). reset_index will keep all four values and replace the timestamps with integer indices [0,1,2,3]. There would be few cases where this is the desired behavior, no?
  • Manu Na Eira
    Manu Na Eira about 2 years
    Yes, watch out: reset_index will change your index to increasing integers! I think having the dataframe index monotonically increasing and unique is good, as it makes the dataframe rows uniquely identifiable by the index itself. In the example you provide, @Eli, I would rather move the timestamps to a data column and create a new index with df.reset_index(inplace=True).
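
The behaviour discussed in the last few comments can be sketched as follows. The frame mirrors the timestamp example above, except that the data column is given the name "value", which is an assumption made to keep the output readable:

```python
import pandas as pd

stamps = [
    pd.Timestamp(2000, 1, 1),
    pd.Timestamp(2000, 1, 2),
    pd.Timestamp(2000, 1, 2),  # repeated label
    pd.Timestamp(2000, 1, 3),
]
df = pd.DataFrame(index=stamps, data={"value": [0, 1, 2, 3]})

# drop=True discards the timestamps and leaves a plain 0..n-1 index.
dropped = df.reset_index(drop=True)

# Without drop, the old index is kept as an ordinary column named "index".
kept = df.reset_index()

# sort_index keeps the timestamps (and the duplicate) as the index.
resorted = df.sort_index()

print(dropped.index.tolist())
print(kept.columns.tolist())
print(resorted.index.is_unique)
```

In all three cases the four rows survive; only what ends up in the index differs.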