merge multiple dataframes into one using pandas
For your sample data you can achieve what you want by performing concat
twice, this assumes that the last 2 dfs align with the master df. The inner concat
concatenates the 2 supplemnentary dfs into a single df row-wise, the outer concat
concatenates column-wise:
In [56]:
# this concats the 2 supplementary dfs row-wise into a single df
pd.concat([df1,df2], ignore_index=True)
Out[56]:
date price
0 20141001 2.0
1 20141101 2.5
2 20141201 3.0
3 20141001 1.5
4 20141101 2.0
5 20141201 2.0
In [54]:
# now concat column-wise with the main df
pd.concat([df,pd.concat([df1,df2], ignore_index=True)], axis=1)
Out[54]:
fruit date volume date price
0 apple 20141001 2000 20141001 2.0
1 apple 20141101 1800 20141101 2.5
2 apple 20141201 2200 20141201 3.0
3 orange 20141001 1900 20141001 1.5
4 orange 20141101 2000 20141101 2.0
5 orange 20141201 3000 20141201 2.0
However, for your real data what you will need to do is to add the price column for each fruit:
In [55]:
df[df['fruit'] == 'apple'].merge(df1, on='date')
Out[55]:
fruit date volume price
0 apple 20141001 2000 2.0
1 apple 20141101 1800 2.5
2 apple 20141201 2200 3.0
and repeat again for each fruit
An approach to your real data problem would be to add a 'fruit' column to each supplemental df, concatenate all these and then merge back using 'fruit' and 'date' columns as the keys:
In [57]:
df1['fruit'] = 'apple'
df2['fruit'] = 'orange'
fruit_df = pd.concat([df1,df2], ignore_index=True)
fruit_df
Out[57]:
date price fruit
0 20141001 2.0 apple
1 20141101 2.5 apple
2 20141201 3.0 apple
3 20141001 1.5 orange
4 20141101 2.0 orange
5 20141201 2.0 orange
In [58]:
df.merge(fruit_df, on=['fruit', 'date'])
Out[58]:
fruit date volume price
0 apple 20141001 2000 2.0
1 apple 20141101 1800 2.5
2 apple 20141201 2200 3.0
3 orange 20141001 1900 1.5
4 orange 20141101 2000 2.0
5 orange 20141201 3000 2.0
user6396
Updated on December 21, 2020Comments
-
user6396 over 3 years
I have one data frame df:
fruit date volume 0 apple 20141001 2000 1 apple 20141101 1800 2 apple 20141201 2200 3 orange 20141001 1900 4 orange 20141101 2000 5 orange 20141201 3000 ….
and I have following two data frames
apple:
date price 0 20141001 2 1 20141101 2.5 2 20141201 3
orange:
date price 0 20141001 1.5 1 20141101 2 2 20141201 2
how can I merge all these in to the following data frame:
fruit date price volume 0 apple 20141001 2 2000 1 apple 20141101 2.5 1800 2 apple 20141201 3 2200 3 orange 20141001 1.5 1900 4 orange 20141101 2 2000 5 orange 20141201 2 3000 ….
This is just a example, in my real work, I have hundreds of 'fruit' with price data need to be merged into the first data frame.
should I use merge or join? what is the difference between them? Thank you.
-
EdChum about 9 years
join
usesmerge
underneath, as a rule of thumb if you are going to join on indices then usejoin
if not then usemerge
-
-
user6396 about 9 yearsIs it possible to use groupby?
-
EdChum about 9 years@user6396 you could groupby the fruit and date and then set the price, your problem though is that if you have lots of dfs with no identifier in them (i.e. a fruit column) then you can't merge them back without setting all the rows that match the dates with the same price, What you could do is add the fruit column for each supplementary df, concatenate all these supplementary dfs and then you can merge them back using the fruit column and date column, that would be how I would approach this