Pandas groupby to to_csv
Solution 1
Try doing this:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')
That'll write the entire dataframe to the file. If you only want those two columns then,
week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
Here's a line by line explanation of the original code:
# This creates a "groupby" object (not a dataframe object)
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')
# This instructs pandas to sum up all the numeric type columns in each
# group. This returns a dataframe where each row is the sum of the
# group's numeric columns. You're not storing this dataframe in your
# example.
week_grouped.sum()
# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method.
# So we should store the previous line's result (a dataframe) into a variable
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')
# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')
# Or with less typing simply
week_grouped.sum().to_csv('...')
Solution 2
Try changing your second line to week_grouped = week_grouped.sum()
and re-running all three lines.
If you run week_grouped.sum()
in its own Jupyter notebook cell, you'll see how the statement returns the output to the cell's output, instead of assigning the result back to week_grouped
. Some pandas methods have an inplace=True
argument (e.g., df.sort_values(by=col_name, inplace=True)
), but sum
does not.
EDIT: does each week number only appear once in your CSV? If so, here's a simpler solution that doesn't use groupby
:
df = pd.read_csv('input.csv')
df[['id', 'count']].to_csv('output.csv')
Solution 3
Group By returns key, value pairs where key is the identifier of the group and the value is the group itself, i.e. a subset of an original df that matched the key.
In your example week_grouped = df.groupby('week')
is set of groups (pandas.core.groupby.DataFrameGroupBy object) which you can explore in detail as follows:
for k, gr in week_grouped:
# do your stuff instead of print
print(k)
print(type(gr)) # This will output <class 'pandas.core.frame.DataFrame'>
print(gr)
# You can save each 'gr' in a csv as follows
gr.to_csv('{}.csv'.format(k))
Or alternatively you can compute aggregation function on your grouped object
result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv')
In your example you need to assign the function result to some variable as by default pandas objects are immutable.
some_variable = week_grouped.sum()
some_variable.to_csv('week_grouped.csv') # This will work
basically result.csv and week_grouped.csv are meant to be same
Solution 4
I feel that there is no need to use a groupby, you can just drop the columns you do not want too.
df = df.drop(['month','year'], axis=1)
df.reset_index()
df.to_csv('Your path')
Solution 5
Pandas groupby generates a lot of information (count, mean, std, ...). If you want to save all of them in a csv file, first you need to convert it to a regular Dataframe:
import pandas as pd
...
...
MyGroupDataFrame = MyDataFrame.groupby('id')
pd.DataFrame(MyGroupDataFrame.describe()).to_csv("myTSVFile.tsv", sep='\t', encoding='utf-8')
kalmdown
Updated on September 25, 2021Comments
-
kalmdown over 2 years
Want to output a Pandas groupby dataframe to CSV. Tried various StackOverflow solutions but they have not worked.
Python 3.6.1, Pandas 0.20.1
groupby result looks like:
id month year count week 0 9066 82 32142 895 1 7679 84 30112 749 2 8368 126 42187 872 3 11038 102 34165 976 4 8815 117 34122 767 5 10979 163 50225 1252 6 8726 142 38159 996 7 5568 63 26143 582
Want a csv that looks like
week count 0 895 1 749 2 872 3 976 4 767 5 1252 6 996 7 582
Current code:
week_grouped = df.groupby('week') week_grouped.sum() #At this point you have the groupby result week_grouped.to_csv('week_grouped.csv') #Can't do this - .to_csv is not a df function.
Read SO solutions:
output groupby to csv file pandas
week_grouped.drop_duplicates().to_csv('week_grouped.csv')
Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method
Python pandas - writing groupby output to file
week_grouped.reset_index().to_csv('week_grouped.csv')
Result: AttributeError: "Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method"
-
cs95 about 5 yearsIf you landed up here wanting to know how to save each individual groupby to its own CSV file, see this answer.
-
-
kalmdown over 6 yearsThanks! - Why does it work when sum() is part of the the to_csv statement but not when sum() is done on its own line?
-
kalmdown over 6 yearsIn the original data the week appears on multiple rows. In this case the groupby is being used to gather the weeks together so a count can be done per week.
-
kalmdown over 6 yearsBTW - Thanks so much for the explanation of why
sum
is an issue. -
kalmdown over 6 yearsThank you for the indepth explanation. Helps to understand the system instead of just the problem.
-
kalmdown over 6 yearsShould be "axis=1"...But yes that will output the rows but not grouped by week or state.
-
Alex Luis Arias about 5 years@kalmdown, if this answered your question, can you please mark it as so? Click the check mark to make it green.
-
Alex Luis Arias almost 4 years@kalmdown, did my reply answer your question? My answer still hasn't been marked as accepted.