Join two DataFrames on one key column / ERROR: 'columns overlap but no suffix specified'

25,477

I think you want to do a merge rather than a join:

sales.merge(part_table)

Here's an example dataframe:

In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])

In [13]: dfa.join(dfb, on=['A'])
ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')

In [14]: dfa.merge(dfb)
Out[14]:
   A  B  C
0  1  2  a
1  3  4  b
2  3  4  c

It's unclear from the docs if this is intentational (I thought that on would be used as the column) but following the exceptions message if you add suffixs we can see what's going on:

In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
Out[21]:
   A_a  C  A_b   B
0    1  a    3   4
1    3  b  NaN NaN
2    3  c  NaN NaN

In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
Out[22]:
   A_a  C  A_b   B
0    1  a    1   2
1    3  b    3   4
2    3  c  NaN NaN

It's ignoring the on kwarg and just doing the join.

Share:
25,477
Yumi
Author by

Yumi

Updated on September 25, 2020

Comments

  • Yumi
    Yumi over 3 years

    I have two tables: sales table & product table and these two tables share the 'PART NUMBER' column. The 'PART NUMBER' column in the sales table is not unique, but it is unique in the product table. (see image below of a snapshot of the sales table & product table)

    enter image description here

    enter image description here

    I was trying to add the equivalent 'Description' to each 'PART NUMBER' on the sales table, and I followed the examples from the pandas website my code

    sales.join(part_table, on='PART NUMBER')
    

    But I got this error:

    ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object')
    

    Can someone explain what this error means and how to solve it?

    Many thanks!