Pandas dataframe pivot not fitting in memory


Solution 1

Try to see if this fits in your memory:

df.groupby(['newidx', 'Code'])['val'].max().unstack()

pivot_table is unfortunately very memory-intensive, as it may make multiple copies of the data.
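As a sanity check, here is the groupby/unstack approach on a tiny frame with the same column names as the question (toy values, not the asker's data):

```python
import pandas as pd

# Toy frame with the same columns as in the question.
df = pd.DataFrame({
    'newidx': [1220121127, 1220121030, 1220121127],
    'Code':   [706, 565, 706],
    'val':    [1.0, 2.0, 3.0],
})

# groupby + unstack builds the same max-pivot while making
# fewer intermediate copies than pivot_table.
wide = df.groupby(['newidx', 'Code'])['val'].max().unstack()
print(wide)
```

The result is a newidx-by-Code frame holding the per-group maxima, with NaN where a (newidx, Code) pair never occurs.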


If the groupby does not work, you will have to split your DataFrame into smaller pieces. Also try to avoid intermediate assignments; for example, if reading from CSV:

df = pd.read_csv('file.csv').groupby(['newidx', 'Code'])['val'].max().unstack()

avoids multiple assignments.
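If even a single groupby does not fit, the splitting idea can be sketched with `read_csv`'s `chunksize`: take per-group maxima chunk by chunk, then reduce once more at the end. A toy CSV is written locally here just to make the sketch self-contained:

```python
import pandas as pd

# Toy data written to disk so the chunked read below is self-contained.
pd.DataFrame({
    'newidx': [1, 1, 2, 2],
    'Code':   [706, 706, 565, 565],
    'val':    [1.0, 4.0, 2.0, 3.0],
}).to_csv('file.csv', index=False)

# Read the CSV in pieces and keep only the running per-group maxima,
# so the full raw table never has to be resident at once.
partials = [
    chunk.groupby(['newidx', 'Code'])['val'].max()
    for chunk in pd.read_csv('file.csv', chunksize=2)
]

# Taking the max of the per-chunk maxima gives the global maxima,
# which can then be unstacked into the wide layout.
result = pd.concat(partials).groupby(level=['newidx', 'Code']).max().unstack()
print(result)
```

This works because max is associative: the max of chunk-level maxima equals the max over all rows, so each chunk can be discarded after its small partial result is kept.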

Solution 2

I had a very similar problem recently when carrying out a merge between 4 dataframes.

What worked for me was disabling the index during the groupby, then merging.

If @Kartik's answer doesn't work, try this before chunking the DataFrame:

df.groupby(['newidx', 'Code'], as_index=False)['val'].max().pivot(index='newidx', columns='Code', values='val')
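A minimal sketch of the `as_index=False` variant on toy data. Because `as_index=False` keeps the group keys as ordinary columns rather than a MultiIndex, the flat result is reshaped with `pivot` instead of `unstack`:

```python
import pandas as pd

df = pd.DataFrame({
    'newidx': [1, 1, 2],
    'Code':   [706, 565, 706],
    'val':    [1.0, 2.0, 3.0],
})

# as_index=False returns a flat frame with newidx, Code and val as columns.
flat = df.groupby(['newidx', 'Code'], as_index=False)['val'].max()

# Reshape the flat frame into the newidx-by-Code layout.
wide = flat.pivot(index='newidx', columns='Code', values='val')
print(wide)
```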
Question

Asked by Ulderique Demoitre; updated on June 14, 2022.
    I have a dataframe df with the following structure:

            val          newidx    Code
    Idx                             
    0       1.0      1220121127    706
    1       1.0      1220121030    706
    2       1.0      1620120122    565
    

    It has 1000000 lines. In total we have 600 unique Code values and 200000 unique newidx values.

    If I perform the following operation

    df.pivot_table(values='val', index='newidx', columns='Code', aggfunc='max')
    

    I get a MemoryError, which seems strange, as the size of the resulting dataframe should be manageable: 200000 x 600.

    How much memory does such an operation require? Is there a way to fix this memory error?
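For a rough sense of scale: the 200000 x 600 float64 output frame alone is under 1 GiB, so pivot_table's intermediate copies, not the result itself, are the likely culprit. A back-of-the-envelope check:

```python
# Size of the 200000 x 600 float64 result alone,
# ignoring the intermediate copies pivot_table makes.
rows, cols, bytes_per_float = 200_000, 600, 8
gib = rows * cols * bytes_per_float / 2**30
print(f"{gib:.2f} GiB")  # -> 0.89 GiB for the output frame by itself
```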