Solution for SpecificationError: nested renamer is not supported while agg() along with groupby()

64,135

Solution 1

change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

to

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']

reason: in new pandas version named aggregation is the recommended replacement for the deprecated “dict-of-dicts” approach to naming the output of column-specific aggregations (Deprecate groupby.agg() with a dictionary when renaming).

source: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html

Solution 2

This error also happens if a column specified in the aggregation function dict does not exist in the dataframe:

In [190]: group = pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A')
In [195]: group.agg({'B': 'mean'})
Out[195]: 
   B
A   
1  2

In [196]: group.agg({'B': 'mean', 'non-existing-column': 'mean'})
...
SpecificationError: nested renamer is not supported

Solution 3

I found the way: Instead of going like

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{"maxQ":np.max,"minQ":np.min,"meanQ":np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

Do as follows:

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{np.max,np.min,np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

I had the same error and this is how I resolved it!

Solution 4

Do you get the same error if you change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

to

temp['total'] = project_data.groupby(col1)[col2].agg(total=('total','count')).reset_index()['total']

Solution 5

Not a very elegant solution but this one works. As renaming the column is deprecated with the way you are doing. But there is work around. Create a temporary variable 'approved' , store the col2 in it. Because when you apply agg function , the original column values will change with column name. You can preserve the column name but then values in those column will change. So in order to preserve the original dataframe and to have two new columns with desired names, you can use the following code.

approved = temp[col2]
temp = pd.DataFrame(project_data.groupby(col1)[col2].agg([('Avg','mean'),('total','count')]).reset_index())
temp[col2] = approved

P.S : Seems like an assignment of AAIC, I am working on same :)

Share:
64,135
Admin
Author by

Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years
    def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
        ind = np.arange(data.shape[0])
    
        plt.figure(figsize=(20,5))
        p1 = plt.bar(ind, data[col3].values)
        p2 = plt.bar(ind, data[col2].values)
    
        plt.ylabel('Projects')
        plt.title('Number of projects aproved vs rejected')
        plt.xticks(ind, list(data[xtick].values))
        plt.legend((p1[0], p2[0]), ('total', 'accepted'))
        plt.show()
    
    def univariate_barplots(data, col1, col2='project_is_approved', top=False):
        # Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039
        temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index()
    
        # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
        temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
    
        temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']
    
        temp.sort_values(by=['total'],inplace=True, ascending=False)
    
        if top:
            temp = temp[0:top]
    
        stack_plot(temp, xtick=col1, col2=col2, col3='total')
        print(temp.head(5))
        print("="*50)
        print(temp.tail(5))
    
    univariate_barplots(project_data, 'school_state', 'project_is_approved', False)
    

    Error:

    SpecificationError                        Traceback (most recent call last)
    <ipython-input-21-2cace8f16608> in <module>()
    ----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False)
    
    <ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top)
          4 
          5     # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
    ----> 6     temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
          7     print (temp['total'].head(2))
          8     temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']
    
    ~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
        251             # but not the class list / tuple itself.
        252             func = _maybe_mangle_lambdas(func)
    --> 253             ret = self._aggregate_multiple_funcs(func)
        254             if relabeling:
        255                 ret.columns = columns
    
    ~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg)
        292             # GH 15931
        293             if isinstance(self._selected_obj, Series):
    --> 294                 raise SpecificationError("nested renamer is not supported")
        295 
        296             columns = list(arg.keys())
    
    SpecificationError: **nested renamer is not supported**
    
  • Mark Andersen
    Mark Andersen almost 4 years
    This answer points to the actual source of the error. The other answer indicating there is another way to specify may be true but does not get to the root cause.
  • JoAnn Alvarez
    JoAnn Alvarez over 3 years
    This solution has the benefit of having the resulting columns named appropriately.
  • Areza
    Areza about 3 years
    how would you put multiple function inside the aggregate ? for example adding min and max too
  • jkr
    jkr about 2 years
    add them as keyword arguments, like .agg(avg="mean", total="count")