How to sort a DataFrame by a particular (string) column using Python pandas?

31,321

You can first extract the digits with str.extract and cast them to int with astype. Then call sort_values on the helper column sort, and finally drop that column:

df['sort'] = df['product'].str.extract(r'(\d+)', expand=False).astype(int)
df.sort_values('sort', inplace=True, ascending=False)
df = df.drop('sort', axis=1)
print(df)
  product  values
2     a10      15
5      a6      67
1      a5      20
4      a3      12
3      a2      45
0      a1      10
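
On pandas 1.1.0 or newer, the helper column can be avoided by passing the same extraction as the key argument of sort_values. The following is a minimal sketch under that version assumption, with the frame rebuilt from the data in the question:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'product': ['a1', 'a5', 'a10', 'a2', 'a3', 'a6'],
    'values': [10, 20, 15, 45, 12, 67],
})

# key receives the whole column as a Series and must return a Series
# of the same length holding the values to sort by
result = df.sort_values(
    'product',
    key=lambda s: s.str.extract(r'(\d+)', expand=False).astype(int),
    ascending=False,
)
print(result)
```

This leaves df itself untouched and never creates the temporary sort column.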

This is necessary because sort_values alone compares the strings character by character, so a10 ends up between a2 and a1:

df.sort_values('product', inplace=True, ascending=False)
print(df)
  product  values
5      a6      67
1      a5      20
4      a3      12
3      a2      45
2     a10      15
0      a1      10
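
The same effect can be seen without pandas. This small sketch compares plain string ordering with ordering by the numeric part. The p[1:] slice assumes a single-letter prefix, which holds for the sample data but is only an assumption in general:

```python
products = ['a1', 'a5', 'a10', 'a2', 'a3', 'a6']

# Strings compare character by character: 'a2' > 'a10' because '2' > '1'
lexicographic = sorted(products, reverse=True)

# Sorting by the integer after the one-letter prefix gives the wanted order
numeric = sorted(products, key=lambda p: int(p[1:]), reverse=True)

print(lexicographic)
print(numeric)
```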

Another option is to use the natsort library:

from natsort import index_natsorted, order_by_index

df = df.reindex(index=order_by_index(df.index, index_natsorted(df['product'], reverse=True)))
print(df)
  product  values
2     a10      15
5      a6      67
1      a5      20
4      a3      12
3      a2      45
0      a1      10
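
If installing natsort is not possible, a rough natural-sort key can be written by hand with the standard re module. This is only a sketch: natural_key is a hypothetical helper, not part of natsort or pandas, and it ignores cases such as signs or decimals that natsort handles:

```python
import re
import pandas as pd

def natural_key(text):
    # Split into alternating non-digit and digit runs: 'a10' -> ['a', 10, '']
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', text)]

# Sample data copied from the question
df = pd.DataFrame({
    'product': ['a1', 'a5', 'a10', 'a2', 'a3', 'a6'],
    'values': [10, 20, 15, 45, 12, 67],
})

# Order the index labels by the natural key, then select rows in that order
order = sorted(df.index, key=lambda i: natural_key(df.at[i, 'product']), reverse=True)
result = df.loc[order]
print(result)
```

Unlike the digit-extraction approach, this key also takes the text prefix into account, so products such as b2 and a10 would be grouped by letter first.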
Author: Sai Rajesh

Updated on July 09, 2022

Comments

  • Sai Rajesh
    Sai Rajesh almost 2 years

    My Pandas data frame contains the following data:

    product,values
     a1,     10
     a5,     20
     a10,    15
     a2,     45
     a3,     12
     a6,     67
    

    I have to sort this data frame based on the product column. Thus, I would like to get the following output:

    product,values
     a10,     15
     a6,      67
     a5,      20
     a3,      12
     a2,      45
     a1,      10
    

    Unfortunately, I'm facing the following error:

    ErrorDuringImport(path, sys.exc_info())

    ErrorDuringImport: problem in views - type 'exceptions.Indentation

  • Sai Rajesh
    Sai Rajesh almost 8 years
    I'm using Python 2.7.
  • jezrael
    jezrael almost 8 years
    Please copy the text of the error, click edit under the question's tags, and paste it below the question text. Thanks.
  • Sai Rajesh
    Sai Rajesh almost 8 years
    OK, I will add that now.
  • jezrael
    jezrael almost 8 years
    It looks like your pandas is broken - see link.
  • Sai Rajesh
    Sai Rajesh almost 8 years
    What do I have to do now?
  • jezrael
    jezrael almost 8 years
    Do you use anaconda?
  • Sai Rajesh
    Sai Rajesh almost 8 years
    And Python 2.7.
  • Sai Rajesh
    Sai Rajesh almost 8 years
    Please give me a solution.
  • jezrael
    jezrael almost 8 years
    Maybe updating pandas will help. If you use Windows, run conda update pandas on the command line.
  • Sai Rajesh
    Sai Rajesh almost 8 years
    I'm not able to do that, because Anaconda is built into our product.
  • jezrael
    jezrael almost 8 years
    Phew, so it is a bigger problem. One idea: if you are on an older version of pandas, try removing , expand=False from the first line of the code. What is your version of pandas? print(pd.__version__)
  • Sai Rajesh
    Sai Rajesh almost 8 years
    Hmm, OK, I will try that.
  • jezrael
    jezrael almost 8 years
    Super, I am glad it is working. Thank you for accepting. You can upvote too.
  • Sai Rajesh
    Sai Rajesh almost 8 years
    How do I add d to all rows?
  • Sai Rajesh
    Sai Rajesh almost 8 years
    I mean my column contains 1,2,3,4,5 but I have to change it to d1,d2,d3,d4,d5.
  • jezrael
    jezrael almost 8 years
    Use a list comprehension: df.columns = ['d' + str(col) for col in df.columns]
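
The last exchange can be read two ways, so here is a hedged sketch of both: prefixing the values inside a column, and prefixing the column labels as in the comment above. The column name col and both small frames are made up for illustration:

```python
import pandas as pd

# Case 1: the values 1..5 live inside one column (name 'col' is hypothetical)
df = pd.DataFrame({'col': [1, 2, 3, 4, 5]})
df['col'] = 'd' + df['col'].astype(str)
print(df['col'].tolist())

# Case 2: the column labels themselves are 1, 2, 3
labels = pd.DataFrame([[0, 0, 0]], columns=[1, 2, 3])
labels.columns = ['d' + str(col) for col in labels.columns]
print(labels.columns.tolist())
```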