Solution for SpecificationError: nested renamer is not supported while agg() along with groupby()
Solution 1
change
temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']
to
temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']
reason: in new pandas version named aggregation is the recommended replacement for the deprecated “dict-of-dicts” approach to naming the output of column-specific aggregations (Deprecate groupby.agg() with a dictionary when renaming).
source: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html
Solution 2
This error also happens if a column specified in the aggregation function dict does not exist in the dataframe:
In [190]: group = pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A')
In [195]: group.agg({'B': 'mean'})
Out[195]:
B
A
1 2
In [196]: group.agg({'B': 'mean', 'non-existing-column': 'mean'})
...
SpecificationError: nested renamer is not supported
Solution 3
I found the way: Instead of going like
g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{"maxQ":np.max,"minQ":np.min,"meanQ":np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']
Do as follows:
g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{np.max,np.min,np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']
I had the same error and this is how I resolved it!
Solution 4
Do you get the same error if you change
temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
to
temp['total'] = project_data.groupby(col1)[col2].agg(total=('total','count')).reset_index()['total']
Solution 5
Not a very elegant solution but this one works. As renaming the column is deprecated with the way you are doing. But there is work around. Create a temporary variable 'approved' , store the col2 in it. Because when you apply agg function , the original column values will change with column name. You can preserve the column name but then values in those column will change. So in order to preserve the original dataframe and to have two new columns with desired names, you can use the following code.
approved = temp[col2]
temp = pd.DataFrame(project_data.groupby(col1)[col2].agg([('Avg','mean'),('total','count')]).reset_index())
temp[col2] = approved
P.S : Seems like an assignment of AAIC, I am working on same :)
Admin
Updated on July 09, 2022Comments
-
Admin almost 2 years
def stack_plot(data, xtick, col2='project_is_approved', col3='total'): ind = np.arange(data.shape[0]) plt.figure(figsize=(20,5)) p1 = plt.bar(ind, data[col3].values) p2 = plt.bar(ind, data[col2].values) plt.ylabel('Projects') plt.title('Number of projects aproved vs rejected') plt.xticks(ind, list(data[xtick].values)) plt.legend((p1[0], p2[0]), ('total', 'accepted')) plt.show() def univariate_barplots(data, col1, col2='project_is_approved', top=False): # Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039 temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index() # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039 temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total'] temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg'] temp.sort_values(by=['total'],inplace=True, ascending=False) if top: temp = temp[0:top] stack_plot(temp, xtick=col1, col2=col2, col3='total') print(temp.head(5)) print("="*50) print(temp.tail(5)) univariate_barplots(project_data, 'school_state', 'project_is_approved', False)
Error:
SpecificationError Traceback (most recent call last) <ipython-input-21-2cace8f16608> in <module>() ----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False) <ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top) 4 5 # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039 ----> 6 temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total'] 7 print (temp['total'].head(2)) 8 temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg'] ~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs) 251 # but not the class list / tuple itself. 252 func = _maybe_mangle_lambdas(func) --> 253 ret = self._aggregate_multiple_funcs(func) 254 if relabeling: 255 ret.columns = columns ~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg) 292 # GH 15931 293 if isinstance(self._selected_obj, Series): --> 294 raise SpecificationError("nested renamer is not supported") 295 296 columns = list(arg.keys()) SpecificationError: **nested renamer is not supported**
-
Mark Andersen almost 4 yearsThis answer points to the actual source of the error. The other answer indicating there is another way to specify may be true but does not get to the root cause.
-
JoAnn Alvarez over 3 yearsThis solution has the benefit of having the resulting columns named appropriately.
-
Areza about 3 yearshow would you put multiple function inside the aggregate ? for example adding min and max too
-
jkr about 2 yearsadd them as keyword arguments, like
.agg(avg="mean", total="count")