How to merge multiple data frames based on two columns?
Solution 1
To keep only the rows whose coordinates match in both data frames, an inner_join from dplyr should work:
library(dplyr)
df1 <- data.frame(
  lat = c(-33.9174, -33.9175, -33.9176, -33.9177, -33.9171),
  long = c(151.2263, 151.2264, 151.2265, 151.2266, -140.54),
  PM = c(8, 10, 9, 8, 55)
)
df2 <- data.frame(
  lat = c(-33.9174, -33.9175, -33.9176, -33.9177, -31),
  long = c(151.2263, 151.2264, 151.2265, 151.2266, 134),
  PM = c(12, 15, 11, 3, 18)
)
# Keep only the rows whose lat/long pair appears in both data frames
inner_join(df1, df2, by = c("lat", "long"))
       lat     long PM.x PM.y
1 -33.9174 151.2263    8   12
2 -33.9175 151.2264   10   15
3 -33.9176 151.2265    9   11
4 -33.9177 151.2266    8    3
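With more than two data frames, the .x/.y suffixes stop being enough to tell the PM columns apart. One way around this is to give each PM column its own name first, then fold the list of frames with Reduce() and merge(). This is only a sketch in base R: the named list `frames`, its element names, and the `PM_` prefix are illustrative assumptions, not part of the original post.

```r
# Hypothetical daily frames held in a named list (values are illustrative).
frames <- list(
  feb13 = data.frame(lat = c(-33.9174, -33.9175),
                     long = c(151.2263, 151.2264),
                     PM = c(8, 10)),
  feb14 = data.frame(lat = c(-33.9174, -33.9175),
                     long = c(151.2263, 151.2264),
                     PM = c(12, 15)),
  feb15 = data.frame(lat = c(-33.9174, -33.9175),
                     long = c(151.2263, 151.2264),
                     PM = c(1, 4))
)

# Rename PM to PM_<day> in every frame so the joined columns never collide.
renamed <- Map(function(df, day) {
  names(df)[names(df) == "PM"] <- paste0("PM_", day)
  df
}, frames, names(frames))

# Fold the list into one frame, joining on both key columns at each step.
merged <- Reduce(function(x, y) merge(x, y, by = c("lat", "long")), renamed)
```

Because the join is applied pairwise over the list, the same code should work for any number of frames, as long as each one has lat, long and PM columns.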
Solution 2
Here is one possible answer, though it's a bit verbose and wouldn't scale well to a large number of data frames:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
feb_13 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(8, 10, 9, 8))
feb_14 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(7, 3, 4, 5))
feb_15 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(1, 4, 10, 12))
This is the first technique. It is simple, but computing the mean one row at a time is ugly:
df <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
  left_join(feb_15, by = c("lat", "long")) %>%
  rename(
    pm_feb13 = pm.x,
    pm_feb14 = pm.y,
    pm_feb15 = pm
  ) %>%
  mutate(
    mean = c((pm_feb13[1] + pm_feb14[1] + pm_feb15[1])/3,
             (pm_feb13[2] + pm_feb14[2] + pm_feb15[2])/3,
             (pm_feb13[3] + pm_feb14[3] + pm_feb15[3])/3,
             (pm_feb13[4] + pm_feb14[4] + pm_feb15[4])/3)
  )
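The row-by-row mean can probably be avoided, since arithmetic on columns in R is vectorised. The sketch below rebuilds the joined result as a plain data frame (`wide` is a hypothetical name; the values are copied from the example) and shows two equivalent ways to compute the mean for every row at once. The first expression, `(pm_feb13 + pm_feb14 + pm_feb15) / 3`, should also work unchanged inside mutate().

```r
# Illustrative wide frame with the same column names as the join result above.
wide <- data.frame(
  lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
  long = c(151.2263, 151.2264, 151.2265, 151.2266),
  pm_feb13 = c(8, 10, 9, 8),
  pm_feb14 = c(7, 3, 4, 5),
  pm_feb15 = c(1, 4, 10, 12)
)

# Column arithmetic is vectorised, so one expression covers every row.
wide$mean <- (wide$pm_feb13 + wide$pm_feb14 + wide$pm_feb15) / 3

# rowMeans() does the same for any number of pm_* columns and can skip NAs.
pm_cols <- grep("^pm_", names(wide), value = TRUE)
wide$mean_rm <- rowMeans(wide[pm_cols], na.rm = TRUE)
```

Neither version depends on the number of rows, which matters once each frame has a few hundred of them.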
Here is the second option, which has a lot of pipes but uses summarise to compute the mean:
df_2 <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
  left_join(feb_15, by = c("lat", "long")) %>%
  group_by(lat, long) %>%
  summarise(
    mean = mean(c(pm.x, pm.y, pm), na.rm = TRUE)
  ) %>%
  full_join(feb_13, by = c("lat", "long")) %>%
  full_join(feb_14, by = c("lat", "long")) %>%
  full_join(feb_15, by = c("lat", "long")) %>%
  rename(
    pm_feb13 = pm.x,
    pm_feb14 = pm.y,
    pm_feb15 = pm
  ) %>%
  arrange(long)
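Both options above need one join per data frame. If only the per-location mean is needed, an alternative is to stack all the frames into one long table and aggregate it, which avoids the joins entirely. This is a sketch in base R; the list `days`, the `day` column, and the object names are assumptions made for illustration.

```r
# Hypothetical named list of daily frames (values copied from the example above).
coords <- data.frame(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                     long = c(151.2263, 151.2264, 151.2265, 151.2266))
days <- list(
  feb13 = cbind(coords, pm = c(8, 10, 9, 8)),
  feb14 = cbind(coords, pm = c(7, 3, 4, 5)),
  feb15 = cbind(coords, pm = c(1, 4, 10, 12))
)

# Stack the frames into one long table, tagging each row with its day.
long_df <- do.call(rbind, Map(function(df, day) cbind(df, day = day),
                              days, names(days)))

# One mean per lat/long pair, regardless of how many days were stacked.
means <- aggregate(pm ~ lat + long, data = long_df, FUN = mean)
names(means)[names(means) == "pm"] <- "mean"
```

The long table keeps every daily reading, so it can still be reshaped to one column per day afterwards if that layout is needed.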
Imogen
Updated on June 04, 2022
Comments
-
Imogen almost 2 years
I have multiple data frames for data collected over 4 days. Each of the data frames looks like this (put very simply):
Lat       Long      PM
-33.9174  151.2263  8
-33.9175  151.2264  10
-33.9176  151.2265  9
-33.9177  151.2266  8
I want to merge multiple data frames based on their matching Long and Lat values, to average out all 'PM' values at a particular location. The end result will look something like this (for the 13th - 16th Feb):
Lat       Long      PM.13th Feb  PM.14th Feb  PM.15th Feb  Mean
-33.9174  151.2263  8            9            11           9.33
-33.9175  151.2264  10           11           12           11
-33.9176  151.2265  9            14           13           12
-33.9177  151.2266  8            10           11           9.66
I understand that merging 2 data frames is easy enough:
df = merge(data1, data2, by.x = c("Lat", "Long"), by.y = c("Lat", "Long"))
But how do I merge multiple dataframes based on matching Longitude and Latitude values?
Also, is there a way I can filter the data so that it will match up data which is within 0.001 Lat/Long value of each other? (Currently I am rounding the Lat/Long data to 3 decimal places, but it is duplicating my data).
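One possible way to handle near-matches without duplicated rows is to round the coordinates and then collapse each frame to one row per rounded location before merging. The sketch below assumes that snapping to a 0.001 grid is acceptable; the helper `snap`, the frames `day1` and `day2`, and all values are hypothetical. Grid rounding is not a true distance tolerance: two points less than 0.001 apart can still land in different cells.

```r
# Hypothetical readings; the first two rows of day1 fall in the same rounded cell.
day1 <- data.frame(Lat = c(-33.91741, -33.91744, -33.91752),
                   Long = c(151.22631, 151.22634, 151.22641),
                   PM = c(8, 10, 9))
day2 <- data.frame(Lat = c(-33.91743, -33.91755),
                   Long = c(151.22629, 151.22638),
                   PM = c(12, 15))

# Round the coordinates, then average PM within each rounded cell so that
# every location appears at most once per frame.
snap <- function(df, digits = 3) {
  df$Lat <- round(df$Lat, digits)
  df$Long <- round(df$Long, digits)
  aggregate(PM ~ Lat + Long, data = df, FUN = mean)
}

# With one row per cell in each frame, the merge no longer multiplies rows.
merged <- merge(snap(day1), snap(day2), by = c("Lat", "Long"),
                suffixes = c(".day1", ".day2"))
```

Without the aggregate() step, the two day1 rows that round to the same cell would each match the day2 row, which is one way duplicated rows can appear after rounding.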
-
Lamia over 6 years
Check this question: stackoverflow.com/questions/8091303/…
-
Imogen over 6 years
Hi thanks for the reply! I'm just wondering how I would join multiple frames together? Doing this inner_join means that it categorises each new data frame into either an x or y, so it has troubles joining many dataframes together. Any thoughts? Thanks!
-
Imogen over 6 years
Thanks, I'll give it a go...although I have about 300 rows in each of the data frames! Let you know how I get on :)
-
Nick over 6 years
Hey Imogen, how did it go?