How to do "(df1 & not df2)" dataframe merge in pandas?
I just upgraded to version 0.17.0 RC1, which was released 10 days ago, and found that pd.merge() has a new argument in this release, indicator=True, that achieves this in an idiomatic pandas way!
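import pandas as pd
df=pd.merge(df1,df2,on=['x','y'],how="outer",indicator=True)  # indicator=True adds a _merge column
df=df[df['_merge']=='left_only']  # keep rows whose (x, y) key appears only in df1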
indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in 'left' DataFrame, right_only for observations whose merge key only appears in 'right' DataFrame, and both if the observation’s merge key is found in both.
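Here is a self-contained sketch of the same idea on small made-up data (the values in df1 and df2 are hypothetical). It differs from the snippet above in two deliberate ways. First, it merges against only the de-duplicated key columns of df2, so df2's other columns are not pulled in and duplicate keys in df2 cannot multiply df1 rows. Second, it uses how="left" instead of "outer", because right_only rows would be thrown away anyway. With "outer", df1's integer columns can also be upcast to float to hold the NaNs of right_only rows. Note that drop(columns=...) needs a pandas release newer than 0.17.

```python
import pandas as pd

# Hypothetical sample data: (x, y) is the merge key in both frames.
df1 = pd.DataFrame({"x": [1, 1, 2, 3], "y": ["a", "b", "a", "c"], "v": [10, 20, 30, 40]})
df2 = pd.DataFrame({"x": [1, 3], "y": ["a", "c"], "w": [100, 200]})

# Use only df2's unique key columns so no extra columns or duplicate rows appear.
keys2 = df2[["x", "y"]].drop_duplicates()

# how="left" suffices: we only care whether each df1 key also exists in df2.
merged = df1.merge(keys2, on=["x", "y"], how="left", indicator=True)

# Keep rows whose key appears only in df1, then drop the helper column.
only_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_df1)
```

Only the rows with keys (1, "b") and (2, "a") survive, since (1, "a") and (3, "c") also occur in df2.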
Comments
-
GeorgeOfTheRF almost 2 years
I have 2 pandas dataframes, df1 & df2, with common key columns (x,y).
I want to do a "(df1 & not df2)" kind of merge on keys (x,y), meaning I want my code to return a dataframe containing only the rows whose (x,y) key is in df1 & not in df2.
SAS has equivalent functionality:
data final; merge df1(in=a) df2(in=b); by x y; if a & not b; run;
How can I replicate the same functionality elegantly in pandas? It would be great if we could specify something like how="left-right" in merge().
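One way to line up the SAS subsetting IF conditions with pandas is to do a single outer merge with indicator=True and read the _merge column much like SAS's in=a / in=b flags. The sketch below uses hypothetical data and assumes (x, y) is unique within each frame. With duplicate keys on both sides, a SAS match-merge and a pandas merge (which forms all key pairs) do not produce the same rows, so the analogy is only approximate there.

```python
import pandas as pd

# Hypothetical sample data keyed on (x, y), unique within each frame.
df1 = pd.DataFrame({"x": [1, 1, 2], "y": ["a", "b", "a"]})
df2 = pd.DataFrame({"x": [1, 3], "y": ["a", "c"]})

# An outer merge keeps every key from either side; _merge records its origin.
m = pd.merge(df1, df2, on=["x", "y"], how="outer", indicator=True)

# SAS condition                                    pandas equivalent
a_not_b = m[m["_merge"] == "left_only"]          # if a & not b;
b_not_a = m[m["_merge"] == "right_only"]         # if b & not a;
a_and_b = m[m["_merge"] == "both"]               # if a & b;
a_any = m[m["_merge"].isin(["left_only", "both"])]  # if a;
print(a_not_b)
```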
-
Nickpick almost 8 years
The correct way to filter a dataframe is as follows; you need to correct the second line of your code: df=df[df['_merge']=='left_only']
-
GeorgeOfTheRF almost 8 years
Thanks for pointing out the typo. I have fixed it as per your comment.
-
Sn3akyP3t3 over 3 years
I found this link to be extremely helpful for this and other scenarios! kanoki.org/2019/07/04/pandas-difference-between-two-dataframes