Sorting by Specific Column data using .csv in python
Solution 1
How about using pandas?
import pandas as pd
df = pd.read_csv('Full_List.csv')
df = df.sort('red label')
df.to_csv('Full_List_sorted.csv', index=False)
You may need to adjust the options to read_csv
and to_csv
to match the format of your CSV file.
Solution 2
groupby
isn't for sorting, it's for chunking an iterable. For sorting use sorted
.
import csv
reader = csv.DictReader(open('Full_List.csv', 'r'))
result = sorted(reader, key=lambda d: float(d['red label']))
writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)
Note: I changed your lambda to cast your character data to float for correct numerical sorting.
Solution 3
I found with testing that the following works on csv files that I have. Note that all rows of the column have valid entries.
from optparse import OptionParser
# Create options.statistic using -s
# Open and set up input file
ifile = open(options.filein, 'rb')
reader = cvs.DictReader(ifile)
# Create the sorted list
try:
print 'Try the float version'
sortedlist = sorted(reader, key = lambda d: float(d[options.statistic]), reverse=options.high)
except ValueError:
print 'Need to use the text version'
ifile.seek(0)
ifile.next()
sortedlist = sorted(reader, key=lambda d: d[options.statistic], reverse=options.high)
# Close the input file. This allows the input file to be the same as the output file
ifile.close()
# Open the output file
ofile = open(options.fileout, 'wb')
writer = csv.DictWriter(ofile, fieldnames=outfields, extrasactions='ignore', restval = '')
# Output the header
writer.writerow(dict((fn, fn) for fn in outfields))
# Output the sorted list
writer.writerows(sortedlist)
ofile.close()
AzKai
I'm a 3rd year Computer Games Development student in college. I work part time as a DJ and the rest of my time is consumed by programming.
Updated on July 09, 2022Comments
-
AzKai almost 2 years
I'm trying to order a .csv file with just over 300 entries and output it all back out ordered by the numerical values in one specific column under a dialect. Here's the code I've written so far but it just seems to output the data as it went in
import csv import itertools from itertools import groupby as gb reader = csv.DictReader(open('Full_List.csv', 'r')) groups = gb(reader, lambda d: d['red label']) result = [max(g, key=lambda d: d['red label']) for k, g in groups] writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames) writer.writeheader() writer.writerows(result)
There's only 50 rows in the whole file that contain a value under the dialect "red label" and all the others are left blank. It's in the Z column on the .csv(but not that last one) so I'd assume the index of the column is 25(0 being the first). Any help would be greatly appreciated.
-
AzKai about 11 yearsI've tried that and gotten the following error: ValueError: could not convert string to float: I changed the casting from float to str. It compiled but it completely eliminated all values in the column it's sorting
-
Steven Rumbalski about 11 yearsFrom the
ValueError
it appears thatd['red label']
does not always return numeric data. Do you have any empty fields? As regards to "it completely eliminated all values in the column", I think that is not the case. This code does not overwrite any values. It would be helpful to see your actual data. -
AzKai about 11 yearsYes. All but 50 of the entries within that column are just blank fields.
-
Steven Rumbalski about 11 yearsIf those blank fields can be sorted as if they have a value of
0.0
changefloat(d['red label'])
tofloat(d['red label']) if d['red label']) else 0.0
. -
Steven Rumbalski about 11 years@AzKai: Post the first ten lines of your file. Something is not quite right here.
-
AzKai about 11 yearsThese are the first 2 lines(not enough comment space for all 10: ID tag number genus species locality location name new register number x or series collector's name collection number no. cv donor donor's ref. number infrasp. rank epithet; infraspecific common name German name French name Irish name COUNTRY OF ORIGIN date of collection collection information accession status planting date red label X notes References synonym date of verification verifier's name Champion tree dimensions authority family 6994 4009 Abelia sp. BY Bothy 9 check identity. Caprifoliaceae
-
AzKai about 11 yearsOr if needs be I can send you the file to have a look at it
-
AzKai about 11 yearsI've tried using the pandas method you have told me about but whenever I run the script I get the error "No module pandas exists" even though I've installed it from my python directory using sudo apt-get install python-pandas
-
Adam Obeng about 11 yearsWhich version of python and what operating system are you using?
-
AzKai about 11 yearsI'm using python 3.2 on Ubuntu 12.10
-
AzKai about 11 yearsEdit: I've figured out what the problem is in trying to run pandas. When I installed it it installed into my python2.7 folder but when I run my script it's running from the python3.2 folder which is in the same directory as the 2.7 version which is /usr/local/lib and I've no idea how to change my script to run from that directory
-
AzKai about 11 yearsFinally got around the pandas error but the output is still the same as the above method that Steven gave me
-
Adam Obeng about 11 yearsBy the same output, do you mean a ValueError?
-
AzKai about 11 yearsNo, I mean there is no change in the output. The file that's outputted is the exact same as the one that went in
-
Adam Obeng about 11 yearsAre the fields in the file not separated by commas?
-
AzKai about 11 yearsI eventually got it fixed. Thank you very much for all the help Adam :)
-
Rick Westera about 2 years
sort
appears to have been deprecated since this answer was published. Usingsort_values
instead works.