Pandas Plotting from Pivot Table

python python-3.x pandas matplotlib pivot-table

27,984

Solution 1

It's because line and bar plots do not set the xlim the same way. The x-axis is interpreted as categorical data in case of the bar plot, whereas it is interpreted as continuous data for the line plot. The result being that xlim and xticks are not set identically in both situations.

Consider this:

In [4]: temp.plot(kind="line",color="r",)
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x117f555d0>
In [5]: plt.xticks()
Out[5]: (array([ 1.,  2.,  3.,  4.,  5.,  6.]), <a list of 6 Text xticklabel objects>)

where the position of the ticks is an array of float ranging from 1 to 6.

and

In [6]: rain.plot(kind="bar").legend()
Out[6]: <matplotlib.legend.Legend at 0x11c15e950>
In [7]: plt.xticks()
Out[7]: (array([0, 1, 2, 3, 4, 5]), <a list of 6 Text xticklabel objects>)

where the position of the ticks is an array of int ranging from 0 to 5.

So, the easier is to replace this part:

temp.plot(kind="line", color="r",).legend()
rain.plot(kind="bar").legend()

by:

rain.plot(kind="bar").legend()
plt.plot(range(len(temp)), temp, "r", label=temp.name)
plt.legend()

Solution 2

Thanks to jeanrjc's answer and this thread I think I'm finally quite satisfied!

for location in locations:
#print(pivot.xs(location, level=0))

split=pivot.xs(location)
rain=split["rain(mm)"]
temp=split["temp(dC)"]

fig = plt.figure()
ax1 = rain.plot(kind="bar")
ax2 = ax1.twinx()
ax2.plot(ax1.get_xticks(),temp,linestyle='-',color="r")
ax2.set_ylim((-5, 50.))
#ax1.set_ylim((0, 300.))
ax1.set_ylabel('Precipitation (mm)', color='blue')
ax2.set_ylabel('Temperature (°C)', color='red')
ax1.set_xlabel('Months')
plt.title(location)
labels = ['Jan','Feb','Mar','Apr','May','Jun', 'Jul','Aug','Sep','Oct','Nov','Dez']
#plt.xticks(range(12),labels,rotation=45)
ax1.set_xticklabels(labels, rotation=45)

I am receiving the following output, which is very close to what I intend:

27,984

Author by

cir

Updated on August 16, 2020

Comments

cir over 3 years

I am basically trying to reproduce climate diagrams showing mean temperature and precipitation over the year for various locations.

I've generated a pivot table from my csv the following way:

data = pd.read_csv("05_temp_rain_v2.csv")
pivot = data.pivot_table(["rain(mm)","temp(dC)"], ["loc","month"])

sample data in text form:

loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6  
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4  
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7  
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5  
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2  
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6

Pivot Table:

Since I am handling various locations, I am iterating over them:

locations=pivot.index.get_level_values(0).unique()

for location in locations:
    split=pivot.xs(location)

    rain=split["rain(mm)"]
    temp=split["temp(dC)"]

    plt.subplots()
    temp.plot(kind="line",color="r",).legend()
    rain.plot(kind="bar").legend()

An example plot output is shown below:

Why are my temperature values being plotted starting from February (2)?
I assume it is because the temperature values are listed in the second column.

What would be the proper way to handle and plot different data (two columns) from a pivot table?