Sort CSV by column name

10,430

Solution 1

If you need to sort using a column name, it would be best to read your file using a Python csv.DictReader() object as follows:

import csv

with open('input.csv', 'r', newline='') as f_input:
    csv_input = csv.DictReader(f_input)
    data = sorted(csv_input, key=lambda row: (row['Total'], row['X']))

with open('output.csv', 'w', newline='') as f_output:    
    csv_output = csv.DictWriter(f_output, fieldnames=csv_input.fieldnames)
    csv_output.writeheader()
    csv_output.writerows(data)

This would then allow you to sort based on the Total column regardless of where it is. The csv.DictReader() reads each row as a dictionary, using the first row as a header. The header values can then be used as dictionary keys to access items rather than using positional values with a normal csv.reader(). So your first data row would be read as:

{'Total': '1', 'Blue': '0', 'Black': '1', 'Green': '0', 'X': 'Andy', 'Red': '0'}

csv_input can then be passed directly to sorted() to create a sorted list of row dictionaries. The key is the total field and then also the X field for the cases where the total is the same.

This would then write a sorted output.csv as follows:

X,Blue,Black,Red,Green,Total
Andy,0,1,0,0,1
Mark,1,0,0,1,2
Ronda,0,1,0,1,2
Thomas,2,0,0,0,2
Luise,0,0,2,1,3

Solution 2

Something like this,

reader = csv.reader(open('input.csv'))

header = reader.next()
sort_col_idx = header.index('Total')
sorted(reader, key=lambda row: row[sort_col_idx]) # Without header
[header] + sorted(reader, key=lambda row: row[sort_col_idx]) # With headers.

Solution 3

You can define a function to sort according to any column.

import csv
import operator

data = csv.reader(open('input.csv'))

# function to sort according to any column.
# table corresponds to data and col is argument for the row number. here 5
def sort_table(table, col=0):
     return sorted(table, key=operator.itemgetter(col))

print(sort_table(data, 5))

However pandas are better option. Try getting used to that.

Share:
10,430
Admin
Author by

Admin

Updated on June 16, 2022

Comments

  • Admin
    Admin almost 2 years

    I need to sort a CSV table by the column name. The data I get in CSV has various number of columns, but the column name is static. This is the data I have:

    X,Blue,Black,Red,Green,Total
    Thomas,2,0,0,0,2
    Andy,0,1,0,0,1
    Luise,0,0,2,1,3
    Mark,1,0,0,1,2
    Ronda,0,1,0,1,2
    

    Pretty print:

      X     Blue    Black   Red Green  Total
    Thomas   2       0       0    0     2
    Andy     0       1       0    0     1
    Luise    0       0       2    1     3
    Mark     1       0       0    1     2
    Ronda    0       1       0    1     2
    

    Normally I could just sort by row[5] but if I get a table without any Greens, or a table with additional column - Yellow, it will no longer work.

    Therefore the question is, how can I sort this CSV file by the column named Total?

    Can only use core python modules, no pandas. :(