How to properly understand pandas dataframe merge (how, left_on, right_on)?

15,853

I suggest you to go through the documentation to understand the merging operation properly. It is well documented with examples. Counldn't think of much simpler explanation. Documentation for merging

From documentation

left_on: Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame

right_on: Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame

Why are the role and school columns always between location_y?

After merging columns will be sorted. To check that change a column name that starts with letter earlier than L in the second df parameter of pd.merge.

pd.merge(student_df, staff_df, how='left', left_on='Name', right_on='Name')
            Location_x   Name       School         Location_y            Role
0  1024 Billiard Avenue  James     Business  Washington Avenue          Grader
1  Fraternity House #22   Mike          Law                NaN             NaN
2   512 Wilson Crescent  Sally  Engineering  Washington Avenue  Course liasion

Example if Role is Bole

             Location_x   Name       School            Bole         Location_y
0  1024 Billiard Avenue  James     Business          Grader  Washington Avenue
1  Fraternity House #22   Mike          Law             NaN                NaN
2   512 Wilson Crescent  Sally  Engineering  Course liasion  Washington Avenue

Instead of left_on and right_on two parameters you can use on which will match the keys from both the dataframe. i.e

 pd.merge(student_df, staff_df, how='left', on='Name') 

When is the role column beside the name column and when is the school column beside the name column?

It depends on the priority of df you give. If your specify staff_df first then the columns will be concatenated column wise after the staff_df. So Role will be beside Name column. Similary if you specify student_df Student will be beside Name column.

Share:
15,853

Related videos on Youtube

Kane Chew
Author by

Kane Chew

BY DAY: Business Development Manager at Purdue Research Foundation BY NIGHT: Learning about Data Science and Machine Learning CURRENT LANGUAGE: Python LANGUAGE HISTORY: BASIC, C, C++, VBA, Java, JavaScript, Lisp, Scheme BOOKS RECOMMENDATION: Eloquent Javascript A Course in Machine Learning Data Science at the Command Line

Updated on June 04, 2022

Comments

  • Kane Chew
    Kane Chew almost 2 years

    I have been trying to wrap my head around merge for a while:

    I have the following dataframes:

    staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR', 'Location': 'State Street'},
                             {'Name': 'Sally', 'Role': 'Course liasion', 'Location': 'Washington Avenue'},
                             {'Name': 'James', 'Role': 'Grader', 'Location': 'Washington Avenue'}])
    student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business', 'Location': '1024 Billiard Avenue'},
                               {'Name': 'Mike', 'School': 'Law', 'Location': 'Fraternity House #22'},
                               {'Name': 'Sally', 'School': 'Engineering', 'Location': '512 Wilson Crescent'}])
    

    I understand that I can merge them in more ways than one:

    pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')
    pd.merge(student_df, staff_df, how='left', left_on='Name', right_on='Name')
    pd.merge(staff_df, student_df, how='right', left_on='Name', right_on='Name')
    pd.merge(student_df, staff_df, how='right', left_on='Name', right_on='Name')
    

    Each produces a slightly different output. Can someone guide me on the proper way to understand how each output is constructed?

    Specifically,

    • Why are the role and school columns always between location_y?
    • When is the role column beside the name column and when is the school column beside the name column?

    I would hold off asking about using left_index and right_on in the same merge statement.

    Thanks.

    • Zero
      Zero over 6 years
    • Kane Chew
      Kane Chew over 6 years
      @Zero Thank you for the reference doc. I scrolled through it and I wasn't able to find the right example to help me with my question. If you could kindly give me some guidance on how to approach my question, it would be most appreciated. Thanks.