Pandas NaN introduced by pivot_table
I think the best way to understand pivoting is to apply it to a small sample:
import pandas as pd
import numpy as np
countryKPI = pd.DataFrame({'germanCName': ['a', 'a', 'b', 'c', 'c'],
                           'indicator.id': ['z', 'x', 'z', 'y', 'm'],
                           'value': [7, 8, 9, 7, 8]})
print(countryKPI)
  germanCName indicator.id  value
0           a            z      7
1           a            x      8
2           b            z      9
3           c            y      7
4           c            m      8
print (pd.pivot_table(countryKPI, index=['germanCName'], columns=['indicator.id']))
             value
indicator.id     m    x    y    z
germanCName
a              NaN  8.0  NaN  7.0
b              NaN  NaN  NaN  9.0
c              8.0  NaN  7.0  NaN
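Each NaN above marks a (germanCName, indicator.id) pair that has no row in the long-format data. As a minimal sketch of how to list those pairs, reusing the sample data (the names full_grid, present and missing are only illustrative and not part of the original answer):

```python
import pandas as pd

countryKPI = pd.DataFrame({'germanCName': ['a', 'a', 'b', 'c', 'c'],
                           'indicator.id': ['z', 'x', 'z', 'y', 'm'],
                           'value': [7, 8, 9, 7, 8]})

# Every (country, indicator) pair a complete table would contain.
full_grid = pd.MultiIndex.from_product(
    [countryKPI['germanCName'].unique(), countryKPI['indicator.id'].unique()],
    names=['germanCName', 'indicator.id'],
)

# Pairs that actually occur in the long-format data.
present = pd.MultiIndex.from_frame(countryKPI[['germanCName', 'indicator.id']])

# Pairs without a row; each one shows up as a NaN cell after pivoting.
missing = full_grid.difference(present)
print(missing.tolist())
```

If this list is empty for your own data, the pivot should contain no NaN cells introduced by missing combinations.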
If you need to replace NaN with 0, add the fill_value parameter:
print(countryKPI.pivot_table(index='germanCName',
                             columns='indicator.id',
                             values='value',
                             fill_value=0))
indicator.id  m  x  y  z
germanCName
a             0  8  0  7
b             0  0  0  9
c             8  0  7  0
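As an alternative sketch (not from the original answer), and assuming every (germanCName, indicator.id) pair occurs at most once so that no aggregation is needed, the same table can probably be built with set_index and unstack, which also accepts fill_value. The name wide is only illustrative:

```python
import pandas as pd

countryKPI = pd.DataFrame({'germanCName': ['a', 'a', 'b', 'c', 'c'],
                           'indicator.id': ['z', 'x', 'z', 'y', 'm'],
                           'value': [7, 8, 9, 7, 8]})

# Move both key columns into the index, then spread 'indicator.id' into columns.
# Pairs without a row are filled with 0 instead of NaN.
# Note: unstack raises an error if a pair appears more than once;
# pivot_table would aggregate such duplicates instead.
wide = (countryKPI
        .set_index(['germanCName', 'indicator.id'])['value']
        .unstack(fill_value=0))
print(wide)
```

With duplicated pairs, pivot_table with an explicit aggfunc remains the safer choice.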
Author by
Georg Heiler
I am a Ph.D. candidate at the Vienna University of Technology and Complexity Science Hub Vienna as well as a data scientist in the industry.
Updated on July 27, 2022
Comments
-
Georg Heiler almost 2 years
I have a table containing some countries and their KPIs from the World Bank API. No NaN values are present in it.
However, I need to pivot this table to bring it into the right shape for analysis. A call to
pd.pivot_table(countryKPI, index=['germanCName'], columns=['indicator.id'])
works just fine for some countries, e.g. TUERKEI. But for most of the countries, strange NaN values are introduced. How can I prevent this?
-
Georg Heiler over 7 years
Indeed, this example is good. But how can I prevent the NaN values?
-
jezrael over 7 years
OK, what do you need? Replace NaN with 0?
-
Georg Heiler over 7 years
I see, so the problem is that in my data not all indicators were reported for some countries ... :(
-
jezrael over 7 years
Yes, exactly. This is why you get NaN.
-
Georg Heiler over 7 years
But a call to api.worldbank.org/countries/eg/indicators/… does return some data, which strangely did not end up in my downloaded pd.DataFrame.
-
Georg Heiler over 7 years
Let us continue this discussion in chat.
-
bernando_vialli almost 6 years
Any idea why adding fill_values=0 doesn't do anything? I want the missing values to count as 0 in my average in the pivot table, but it doesn't take the missing values into account.
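One possible reading of the last comment: fill_value only fills cells that are still empty after aggregation, so it does not add zero-valued rows to the data being averaged. A minimal sketch under that assumption, with hypothetical data and column names (country, indicator, value) that are not from the original post:

```python
import pandas as pd

# Hypothetical long-format data: country 'b' has no row for indicator 'x'.
df = pd.DataFrame({'country': ['a', 'a', 'b'],
                   'indicator': ['x', 'z', 'z'],
                   'value': [8, 6, 9]})

# Averaging the reported rows only: the absent ('b', 'x') pair is simply ignored.
reported_mean = df.groupby('indicator')['value'].mean()

# fill_value puts 0 into the empty ('b', 'x') cell of the finished table.
filled = df.pivot_table(index='country', columns='indicator',
                        values='value', aggfunc='mean', fill_value=0)

# Averaging the filled table afterwards counts that missing cell as 0.
mean_with_zeros = filled.mean(axis=0)

print(reported_mean)
print(mean_with_zeros)
```

Here the average for 'x' is 8.0 over the reported rows but 4.0 once the missing cell is counted as 0, while 'z' is 7.5 in both cases.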