Sorting by Specific Column data using .csv in python

21,137

Solution 1

How about using pandas?

import pandas as pd
df = pd.read_csv('Full_List.csv')
df = df.sort('red label')
df.to_csv('Full_List_sorted.csv', index=False)

You may need to adjust the options to read_csv and to_csv to match the format of your CSV file.

Solution 2

groupby isn't for sorting, it's for chunking an iterable. For sorting use sorted.

import csv

reader = csv.DictReader(open('Full_List.csv', 'r'))
result = sorted(reader, key=lambda d: float(d['red label']))

writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

Note: I changed your lambda to cast your character data to float for correct numerical sorting.

Solution 3

I found with testing that the following works on csv files that I have. Note that all rows of the column have valid entries.

from optparse import OptionParser
# Create options.statistic using -s
# Open and set up input file
ifile = open(options.filein, 'rb')
reader = cvs.DictReader(ifile)
# Create the sorted list
try:
  print 'Try the float version'
  sortedlist = sorted(reader, key = lambda d: float(d[options.statistic]), reverse=options.high)
except ValueError:
  print 'Need to use the text version'
  ifile.seek(0)
  ifile.next()
  sortedlist = sorted(reader, key=lambda d: d[options.statistic], reverse=options.high)
# Close the input file. This allows the input file to be the same as the output file
ifile.close()
# Open the output file
ofile = open(options.fileout, 'wb')
writer = csv.DictWriter(ofile, fieldnames=outfields, extrasactions='ignore', restval = '')
# Output the header
writer.writerow(dict((fn, fn) for fn in outfields))
# Output the sorted list
writer.writerows(sortedlist)
ofile.close()
Share:
21,137
AzKai
Author by

AzKai

I'm a 3rd year Computer Games Development student in college. I work part time as a DJ and the rest of my time is consumed by programming.

Updated on July 09, 2022

Comments

  • AzKai
    AzKai almost 2 years

    I'm trying to order a .csv file with just over 300 entries and output it all back out ordered by the numerical values in one specific column under a dialect. Here's the code I've written so far but it just seems to output the data as it went in

    import csv
    import itertools
    from itertools import groupby as gb
    
    reader = csv.DictReader(open('Full_List.csv', 'r'))
    
    groups = gb(reader, lambda d: d['red label'])
    result = [max(g, key=lambda d: d['red label']) for k, g in groups]
    
    
    
    writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
    writer.writeheader()
    writer.writerows(result)
    

    There's only 50 rows in the whole file that contain a value under the dialect "red label" and all the others are left blank. It's in the Z column on the .csv(but not that last one) so I'd assume the index of the column is 25(0 being the first). Any help would be greatly appreciated.

  • AzKai
    AzKai about 11 years
    I've tried that and gotten the following error: ValueError: could not convert string to float: I changed the casting from float to str. It compiled but it completely eliminated all values in the column it's sorting
  • Steven Rumbalski
    Steven Rumbalski about 11 years
    From the ValueError it appears that d['red label'] does not always return numeric data. Do you have any empty fields? As regards to "it completely eliminated all values in the column", I think that is not the case. This code does not overwrite any values. It would be helpful to see your actual data.
  • AzKai
    AzKai about 11 years
    Yes. All but 50 of the entries within that column are just blank fields.
  • Steven Rumbalski
    Steven Rumbalski about 11 years
    If those blank fields can be sorted as if they have a value of 0.0 change float(d['red label']) to float(d['red label']) if d['red label']) else 0.0.
  • Steven Rumbalski
    Steven Rumbalski about 11 years
    @AzKai: Post the first ten lines of your file. Something is not quite right here.
  • AzKai
    AzKai about 11 years
    These are the first 2 lines(not enough comment space for all 10: ID tag number genus species locality location name new register number x or series collector's name collection number no. cv donor donor's ref. number infrasp. rank epithet; infraspecific common name German name French name Irish name COUNTRY OF ORIGIN date of collection collection information accession status planting date red label X notes References synonym date of verification verifier's name Champion tree dimensions authority family 6994 4009 Abelia sp. BY Bothy 9 check identity. Caprifoliaceae
  • AzKai
    AzKai about 11 years
    Or if needs be I can send you the file to have a look at it
  • AzKai
    AzKai about 11 years
    I've tried using the pandas method you have told me about but whenever I run the script I get the error "No module pandas exists" even though I've installed it from my python directory using sudo apt-get install python-pandas
  • Adam Obeng
    Adam Obeng about 11 years
    Which version of python and what operating system are you using?
  • AzKai
    AzKai about 11 years
    I'm using python 3.2 on Ubuntu 12.10
  • AzKai
    AzKai about 11 years
    Edit: I've figured out what the problem is in trying to run pandas. When I installed it it installed into my python2.7 folder but when I run my script it's running from the python3.2 folder which is in the same directory as the 2.7 version which is /usr/local/lib and I've no idea how to change my script to run from that directory
  • AzKai
    AzKai about 11 years
    Finally got around the pandas error but the output is still the same as the above method that Steven gave me
  • Adam Obeng
    Adam Obeng about 11 years
    By the same output, do you mean a ValueError?
  • AzKai
    AzKai about 11 years
    No, I mean there is no change in the output. The file that's outputted is the exact same as the one that went in
  • Adam Obeng
    Adam Obeng about 11 years
    Are the fields in the file not separated by commas?
  • AzKai
    AzKai about 11 years
    I eventually got it fixed. Thank you very much for all the help Adam :)
  • Rick Westera
    Rick Westera about 2 years
    sort appears to have been deprecated since this answer was published. Using sort_values instead works.