KeyError in for loop of dataframe in pandas
Solution 1
I've tried the following code and it works on my PC. I use .loc
with the aim to avoid potential key error.
import pandas as pd
import numpy as np
# just following your previous post to simulate your data
np.random.seed(0)
dates = np.random.choice(pd.date_range('2015-01-01 00:00:00', '2015-06-30 00:00:00', freq='1h'), 10000)
company = np.random.choice(['company' + x for x in '1 2 3 4 5'.split()], 10000)
df = pd.DataFrame(dict(recvd_dttm=dates, CompanyName=company)).set_index('recvd_dttm').sort_index()
df['C'] = 1
df.columns = ['CompanyName', '']
result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg({df.columns[1]: sum}).reset_index()
result.columns = ['Month', 'CompanyName', 'counts']
pivot_table = result.pivot(index='CompanyName', columns='Month', values='counts')
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
"#ddb7b1", "#cc7878", "#933b41", "#550b1d" ]
month = []
company = []
color = []
rate = []
for y in pivot_table.index:
for m in pivot_table.columns:
month.append(m)
company.append(y)
num_calls = pivot_table.loc[y, m]
rate.append(num_calls)
color.append(colors[min(int(num_calls)-2, 8)])
Solution 2
Try changing the loop to
for m in pivot_table.columns:
It seems you can achieve the same thing without any loops though. You're looping through the row index and column index to access each entry individually and appending them to a list, so rate
is just a list of all elements in the data frame. You can achieve this by
rate= pivot_table.stack().astype(int).tolist()
color = [colours[min(x - 2, 8)] for x in rate]
Am i missing something here?
jenryb
Updated on June 05, 2022Comments
-
jenryb almost 2 years
I am putting my data into a bokeh layout of a heat map, but am getting a KeyError: '1'. It occurs right at the line
num_calls = pivot_table[m][y]
does anybody know why this would be?The pivot table I am using is below:
pivot_table.head() Out[101]: Month 1 2 3 4 5 6 7 8 9 CompanyName Company 1 182 270 278 314 180 152 110 127 129 Company 2 163 147 192 142 186 231 214 130 112 Company 3 126 88 99 139 97 97 96 37 79 Company 4 84 89 71 95 80 89 83 88 104 Company 5 91 96 94 66 81 77 87 83 68 Month 10 11 12 CompanyName Company 1 117 127 81 Company 2 117 93 101 Company 3 116 111 95 Company 4 93 78 64 Company 5 83 95 65
Below is the section of code leading up to the error:
pivot_table = pivot_table.reset_index() pivot_table['CompanyName'] = [str(x) for x in pivot_table['CompanyName']] Companies = list(pivot_table['CompanyName']) months = ["1","2","3","4","5","6","7","8","9","10","11","12"] pivot_table = pivot_table.set_index('CompanyName') # this is the colormap from the original plot colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d" ] # Set up the data for plotting. We will need to have values for every # pair of year/month names. Map the rate to a color. month = [] company = [] color = [] rate = [] for y in Companies: for m in months: month.append(m) company.append(y) num_calls = pivot_table[m][y] rate.append(num_calls) color.append(colors[min(int(num_calls)-2, 8)])
and upon request:
pivot_table.info() <class 'pandas.core.frame.DataFrame'> Index: 46 entries, Company1 to LastCompany Data columns (total 12 columns): 1.0 46 non-null float64 2.0 46 non-null float64 3.0 46 non-null float64 4.0 46 non-null float64 5.0 46 non-null float64 6.0 46 non-null float64 7.0 46 non-null float64 8.0 46 non-null float64 9.0 46 non-null float64 10.0 46 non-null float64 11.0 46 non-null float64 12.0 46 non-null float64 dtypes: float64(12) memory usage: 4.5+ KB
and
pivot_table.columns Out[103]: Index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], dtype='object')
Also the bokeh code is here: http://docs.bokeh.org/en/latest/docs/gallery/unemployment.html