Pandas: split a CSV into multiple CSVs (or DataFrames) by a column


You can generate a dictionary of DataFrames, keyed by the distinct values of the column:

d = {g:x for g,x in df.groupby('The_evil_column')}

In [95]: d.keys()
Out[95]: dict_keys(['something1', 'something2', 'something3'])

In [96]: d['something1']
Out[96]:
    Fruit   Color The_evil_column
0   Apple     Red      something1
1   Apple   Green      something1
2  Orange  Orange      something1
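For a reproducible version of the dictionary approach, here is a minimal sketch that builds `df` from the sample data in the question. The names `csv_text` and `frames_by_value` are illustrative only and are not part of the original answer.

```python
import io

import pandas as pd

# Sample data copied from the question (semicolon-separated).
csv_text = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Map each distinct value of the column to the sub-DataFrame holding its rows.
frames_by_value = {key: group for key, group in df.groupby("The_evil_column")}

# Each value is an ordinary DataFrame that keeps the original row labels.
print(frames_by_value["something2"])
```

A dictionary is convenient when the pieces need to be looked up by the column value rather than by position.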

Or a list of DataFrames:

In [103]: l = [x for _,x in df.groupby('The_evil_column')]

In [104]: l[0]
Out[104]:
    Fruit   Color The_evil_column
0   Apple     Red      something1
1   Apple   Green      something1
2  Orange  Orange      something1

In [105]: l[1]
Out[105]:
    Fruit  Color The_evil_column
3  Orange  Green      something2
4   Apple    Red      something2

In [106]: l[2]
Out[106]:
   Fruit Color The_evil_column
5  Apple   Red      something3
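Note that the pieces above keep their original row labels (3, 4, 5, ...) and come out in sorted key order. If that is not wanted, a possible variation (a sketch, not part of the original answer) passes `sort=False` to keep groups in order of first appearance and resets the index of each piece. The name `parts` is illustrative.

```python
import pandas as pd

# Same sample data as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Orange", "Apple", "Apple"],
    "Color": ["Red", "Green", "Orange", "Green", "Red", "Red"],
    "The_evil_column": [
        "something1", "something1", "something1",
        "something2", "something2", "something3",
    ],
})

# sort=False yields groups in order of first appearance, not sorted key order.
parts = [
    group.reset_index(drop=True)  # renumber rows from 0 within each piece
    for _, group in df.groupby("The_evil_column", sort=False)
]

print(parts[1])
```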

UPDATE: to write each group to its own CSV file:

In [111]: g = pd.read_csv(filename, sep=';').groupby('The_evil_column')

In [112]: g.ngroups   # number of unique values in the `The_evil_column` column
Out[112]: 3

In [113]: g.apply(lambda x: x.to_csv(r'c:\temp\{}.csv'.format(x.name)))
Out[113]:
Empty DataFrame
Columns: []
Index: []

This produces three files, one per group:

In [115]: glob.glob(r'c:\temp\something*.csv')
Out[115]:
['c:\\temp\\something1.csv',
 'c:\\temp\\something2.csv',
 'c:\\temp\\something3.csv']
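The `apply` call above is used only for its side effect of writing files. An explicit loop over the groups does the same job and may be easier to read. The following is a sketch under a few assumptions: the output directory is a temporary one created for the example (`out_dir` is hypothetical, substitute your own path), and `sep=";"` and `index=False` are chosen so the output files match the layout of the input, which the original answer does not do.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question (semicolon-separated).
csv_text = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Hypothetical output directory; a temporary folder keeps the example self-contained.
out_dir = Path(tempfile.mkdtemp())

written = []
for key, group in df.groupby("The_evil_column"):
    path = out_dir / f"{key}.csv"
    # index=False omits the row labels; sep=";" matches the input format.
    group.to_csv(path, sep=";", index=False)
    written.append(path)

print([p.name for p in written])
```

If the column values can contain characters that are not valid in file names, they would need to be sanitized before being used as paths.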
Author: Elias Cort Aguelo

Updated on June 17, 2022

Comments

  • Elias Cort Aguelo, almost 2 years

    I'm stuck on a problem, and any help or tips would be appreciated.

    The problem: I have a CSV file in which one column can take several different values, like this:

    Fruit;Color;The_evil_column
    Apple;Red;something1
    Apple;Green;something1
    Orange;Orange;something1
    Orange;Green;something2
    Apple;Red;something2
    Apple;Red;something3
    

    I've loaded the data into a DataFrame, and I need to split it into multiple DataFrames based on the value of the column "The_evil_column":

    df1
    Fruit;Color;The_evil_column
    Apple;Red;something1
    Apple;Green;something1
    Orange;Orange;something1
    
    df2
    Fruit;Color;The_evil_column
    Orange;Green;something2
    Apple;Red;something2
    
    df3
    Fruit;Color;The_evil_column
    Apple;Red;something3
    

    After reading some posts I'm even more confused, so I'd appreciate some tips on how to do this.