How to properly understand pandas dataframe merge (how, left_on, right_on)?
I suggest you to go through the documentation to understand the merging operation properly. It is well documented with examples. Counldn't think of much simpler explanation. Documentation for merging
From documentation
left_on: Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame
right_on: Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame
Why are the role and school columns always between location_y?
After merging columns will be sorted. To check that change a column name that starts with letter earlier than L
in the second df
parameter of pd.merge
.
pd.merge(student_df, staff_df, how='left', left_on='Name', right_on='Name')
Location_x Name School Location_y Role 0 1024 Billiard Avenue James Business Washington Avenue Grader 1 Fraternity House #22 Mike Law NaN NaN 2 512 Wilson Crescent Sally Engineering Washington Avenue Course liasion
Example if Role
is Bole
Location_x Name School Bole Location_y 0 1024 Billiard Avenue James Business Grader Washington Avenue 1 Fraternity House #22 Mike Law NaN NaN 2 512 Wilson Crescent Sally Engineering Course liasion Washington Avenue
Instead of left_on
and right_on
two parameters you can use on
which will match the keys from both the dataframe. i.e
pd.merge(student_df, staff_df, how='left', on='Name')
When is the role column beside the name column and when is the school column beside the name column?
It depends on the priority of df you give. If your specify staff_df
first then the columns will be concatenated column wise after the staff_df
. So Role
will be beside Name
column. Similary if you specify student_df
Student will be beside Name column.
Related videos on Youtube
Kane Chew
BY DAY: Business Development Manager at Purdue Research Foundation BY NIGHT: Learning about Data Science and Machine Learning CURRENT LANGUAGE: Python LANGUAGE HISTORY: BASIC, C, C++, VBA, Java, JavaScript, Lisp, Scheme BOOKS RECOMMENDATION: Eloquent Javascript A Course in Machine Learning Data Science at the Command Line
Updated on June 04, 2022Comments
-
Kane Chew almost 2 years
I have been trying to wrap my head around
merge
for a while:I have the following dataframes:
staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR', 'Location': 'State Street'}, {'Name': 'Sally', 'Role': 'Course liasion', 'Location': 'Washington Avenue'}, {'Name': 'James', 'Role': 'Grader', 'Location': 'Washington Avenue'}]) student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business', 'Location': '1024 Billiard Avenue'}, {'Name': 'Mike', 'School': 'Law', 'Location': 'Fraternity House #22'}, {'Name': 'Sally', 'School': 'Engineering', 'Location': '512 Wilson Crescent'}])
I understand that I can merge them in more ways than one:
pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name') pd.merge(student_df, staff_df, how='left', left_on='Name', right_on='Name') pd.merge(staff_df, student_df, how='right', left_on='Name', right_on='Name') pd.merge(student_df, staff_df, how='right', left_on='Name', right_on='Name')
Each produces a slightly different output. Can someone guide me on the proper way to understand how each output is constructed?
Specifically,
- Why are the role and school columns always between location_y?
- When is the role column beside the name column and when is the school column beside the name column?
I would hold off asking about using
left_index
andright_on
in the same merge statement.Thanks.
-
Zero over 6 yearsHave you read pandas.pydata.org/pandas-docs/stable/merging.html?
-
Kane Chew over 6 years@Zero Thank you for the reference doc. I scrolled through it and I wasn't able to find the right example to help me with my question. If you could kindly give me some guidance on how to approach my question, it would be most appreciated. Thanks.