Sorting by Specific Column data using .csv in python

python sorting csv

21,137

Solution 1

How about using pandas?

import pandas as pd
df = pd.read_csv('Full_List.csv')
df = df.sort('red label')
df.to_csv('Full_List_sorted.csv', index=False)

You may need to adjust the options to read_csv and to_csv to match the format of your CSV file.

Solution 2

groupby isn't for sorting, it's for chunking an iterable. For sorting use sorted.

import csv

reader = csv.DictReader(open('Full_List.csv', 'r'))
result = sorted(reader, key=lambda d: float(d['red label']))

writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

Note: I changed your lambda to cast your character data to float for correct numerical sorting.

Solution 3

I found with testing that the following works on csv files that I have. Note that all rows of the column have valid entries.

from optparse import OptionParser
# Create options.statistic using -s
# Open and set up input file
ifile = open(options.filein, 'rb')
reader = cvs.DictReader(ifile)
# Create the sorted list
try:
  print 'Try the float version'
  sortedlist = sorted(reader, key = lambda d: float(d[options.statistic]), reverse=options.high)
except ValueError:
  print 'Need to use the text version'
  ifile.seek(0)
  ifile.next()
  sortedlist = sorted(reader, key=lambda d: d[options.statistic], reverse=options.high)
# Close the input file. This allows the input file to be the same as the output file
ifile.close()
# Open the output file
ofile = open(options.fileout, 'wb')
writer = csv.DictWriter(ofile, fieldnames=outfields, extrasactions='ignore', restval = '')
# Output the header
writer.writerow(dict((fn, fn) for fn in outfields))
# Output the sorted list
writer.writerows(sortedlist)
ofile.close()

21,137

Author by

AzKai

I'm a 3rd year Computer Games Development student in college. I work part time as a DJ and the rest of my time is consumed by programming.

Updated on July 09, 2022

Comments

AzKai almost 2 years
I'm trying to order a .csv file with just over 300 entries and output it all back out ordered by the numerical values in one specific column under a dialect. Here's the code I've written so far but it just seems to output the data as it went in
```
import csv
import itertools
from itertools import groupby as gb

reader = csv.DictReader(open('Full_List.csv', 'r'))

groups = gb(reader, lambda d: d['red label'])
result = [max(g, key=lambda d: d['red label']) for k, g in groups]



writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)
```
There's only 50 rows in the whole file that contain a value under the dialect "red label" and all the others are left blank. It's in the Z column on the .csv(but not that last one) so I'd assume the index of the column is 25(0 being the first). Any help would be greatly appreciated.
AzKai about 11 years

I've tried that and gotten the following error: ValueError: could not convert string to float: I changed the casting from float to str. It compiled but it completely eliminated all values in the column it's sorting
Steven Rumbalski about 11 years

From the ValueError it appears that d['red label'] does not always return numeric data. Do you have any empty fields? As regards to "it completely eliminated all values in the column", I think that is not the case. This code does not overwrite any values. It would be helpful to see your actual data.
AzKai about 11 years

Yes. All but 50 of the entries within that column are just blank fields.
Steven Rumbalski about 11 years

If those blank fields can be sorted as if they have a value of 0.0 change float(d['red label']) to float(d['red label']) if d['red label']) else 0.0.
Steven Rumbalski about 11 years

@AzKai: Post the first ten lines of your file. Something is not quite right here.
AzKai about 11 years

These are the first 2 lines(not enough comment space for all 10: ID tag number genus species locality location name new register number x or series collector's name collection number no. cv donor donor's ref. number infrasp. rank epithet; infraspecific common name German name French name Irish name COUNTRY OF ORIGIN date of collection collection information accession status planting date red label X notes References synonym date of verification verifier's name Champion tree dimensions authority family 6994 4009 Abelia sp. BY Bothy 9 check identity. Caprifoliaceae
AzKai about 11 years

Or if needs be I can send you the file to have a look at it
AzKai about 11 years

I've tried using the pandas method you have told me about but whenever I run the script I get the error "No module pandas exists" even though I've installed it from my python directory using sudo apt-get install python-pandas
Adam Obeng about 11 years

Which version of python and what operating system are you using?
AzKai about 11 years

I'm using python 3.2 on Ubuntu 12.10
AzKai about 11 years

Edit: I've figured out what the problem is in trying to run pandas. When I installed it it installed into my python2.7 folder but when I run my script it's running from the python3.2 folder which is in the same directory as the 2.7 version which is /usr/local/lib and I've no idea how to change my script to run from that directory
AzKai about 11 years

Finally got around the pandas error but the output is still the same as the above method that Steven gave me
Adam Obeng about 11 years

By the same output, do you mean a ValueError?
AzKai about 11 years

No, I mean there is no change in the output. The file that's outputted is the exact same as the one that went in
Adam Obeng about 11 years

Are the fields in the file not separated by commas?
AzKai about 11 years

I eventually got it fixed. Thank you very much for all the help Adam :)
Rick Westera about 2 years

sort appears to have been deprecated since this answer was published. Using sort_values instead works.