How to merge multiple data frames based on two columns?

Solution 1

For matching rows, maybe an inner_join from dplyr? It keeps only the locations whose lat and long appear in both data frames:

library(dplyr)
df1 <- data.frame(
  lat = c(-33.9174, -33.9175, -33.9176, -33.9177, -33.9171), 
  long = c(151.2263, 151.2264, 151.2265, 151.2266, -140.54),
  PM = c(8, 10, 9, 8, 55)
)

df2 <- data.frame(
  lat = c(-33.9174, -33.9175, -33.9176, -33.9177, -31), 
  long = c(151.2263, 151.2264, 151.2265, 151.2266, 134),
  PM = c(12, 15, 11, 3, 18)
)

inner_join(df1, df2, by = c("lat", "long"))

       lat     long PM.x PM.y
1 -33.9174 151.2263    8   12
2 -33.9175 151.2264   10   15
3 -33.9176 151.2265    9   11
4 -33.9177 151.2266    8    3
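With more than two data frames, the shared `PM` column is what gets in the way: joining a third frame leaves a mix of `PM.x`, `PM.y` and `PM`. A minimal sketch, assuming the daily data frames are collected in a named list (the list `days` and its element names are hypothetical), first gives each PM column a unique name and then folds `inner_join()` over the list with base R's `Reduce()`:

```r
library(dplyr)

# Hypothetical example data: one data frame per day, each with lat, long, PM
days <- list(
  feb13 = data.frame(lat = c(-33.9174, -33.9175), long = c(151.2263, 151.2264), PM = c(8, 10)),
  feb14 = data.frame(lat = c(-33.9174, -33.9175), long = c(151.2263, 151.2264), PM = c(9, 11)),
  feb15 = data.frame(lat = c(-33.9174, -33.9175), long = c(151.2263, 151.2264), PM = c(11, 12))
)

# Rename PM to PM_<day> in each data frame so no .x/.y suffixes are ever needed
renamed <- Map(function(df, day) {
  names(df)[names(df) == "PM"] <- paste0("PM_", day)
  df
}, days, names(days))

# Join pairwise from left to right: ((feb13 + feb14) + feb15), keeping matched locations only
merged <- Reduce(function(x, y) inner_join(x, y, by = c("lat", "long")), renamed)
merged
```

Because `Reduce()` works on a list of any length, the same two steps apply unchanged to four days or to several hundred rows per day.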

Solution 2

Here might be an answer, though it's a bit verbose and wouldn't scale well to a large number of data frames:

library(tidyverse)
# One tibble per day: location plus that day's PM reading
feb_13 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(8, 10, 9, 8))

feb_14 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(7, 3, 4, 5))

feb_15 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(1, 4, 10, 12))

This is the first technique. It is simple: chain the joins, then rename the suffixed columns. Arithmetic inside mutate() is vectorized, so the mean is computed for every row at once instead of indexing each row by hand:

df <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
        left_join(feb_15, by = c("lat", "long")) %>%
        # after two joins the PM columns are pm.x (feb_13), pm.y (feb_14) and pm (feb_15)
        rename(
         pm_feb13 = pm.x,
         pm_feb14 = pm.y,
         pm_feb15 = pm
        ) %>%
        mutate(mean = (pm_feb13 + pm_feb14 + pm_feb15) / 3)
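Writing out every column still gets tedious as days are added. A sketch using base R's `rowMeans()` averages whichever columns match a name pattern; it assumes every daily PM column is named with a `pm_` prefix, as after the `rename()` step, and uses a small stand-in table so it runs on its own:

```r
# Small stand-in for the joined and renamed table (columns pm_feb13 .. pm_feb15)
df <- data.frame(
  lat = c(-33.9174, -33.9175),
  long = c(151.2263, 151.2264),
  pm_feb13 = c(8, 10),
  pm_feb14 = c(7, 3),
  pm_feb15 = c(1, 4)
)

# Pick every column whose name starts with "pm_"; drop = FALSE keeps a data frame
# even if only one such column exists, which rowMeans() requires
pm_cols <- grep("^pm_", names(df), value = TRUE)

# Average across each row; na.rm = TRUE skips days with no reading at that location
df$mean <- rowMeans(df[, pm_cols, drop = FALSE], na.rm = TRUE)
df
```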

Here is the second option, which uses more pipes but computes the mean with summarise():

df_2 <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
          left_join(feb_15, by = c("lat", "long")) %>%
          group_by(lat, long) %>%
          summarise(
            mean = mean(c(pm.x, pm.y, pm), na.rm = TRUE),
            .groups = "drop"
          ) %>%
          # join the daily readings back on, then rename their suffixed columns
          full_join(feb_13, by = c("lat", "long")) %>%
          full_join(feb_14, by = c("lat", "long")) %>%
          full_join(feb_15, by = c("lat", "long")) %>%
          rename(
            pm_feb13 = pm.x,
            pm_feb14 = pm.y,
            pm_feb15 = pm
          ) %>%
          arrange(long)
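A different technique from the two above, sketched here as an alternative, is to stack the daily tables into one long table with `bind_rows()` and average per location with `group_by()`/`summarise()`. No renaming or repeated joins are needed, so it scales to any number of days. It assumes the same `feb_13`/`feb_14`/`feb_15` data, recreated as plain data frames so the block runs on its own:

```r
library(dplyr)

feb_13 <- data.frame(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                     long = c(151.2263, 151.2264, 151.2265, 151.2266),
                     pm = c(8, 10, 9, 8))
feb_14 <- data.frame(lat = feb_13$lat, long = feb_13$long, pm = c(7, 3, 4, 5))
feb_15 <- data.frame(lat = feb_13$lat, long = feb_13$long, pm = c(1, 4, 10, 12))

# Stack all days; .id adds a "day" column holding each list element's name
long_df <- bind_rows(list(feb13 = feb_13, feb14 = feb_14, feb15 = feb_15), .id = "day")

# One row per location with the mean PM over every day recorded there
pm_means <- long_df %>%
  group_by(lat, long) %>%
  summarise(mean = mean(pm, na.rm = TRUE), .groups = "drop")
pm_means
```

If the wide layout with one column per day is also wanted, `tidyr::pivot_wider(long_df, names_from = day, values_from = pm)` reshapes the stacked table back.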

Author: Imogen

Updated on June 04, 2022

Comments

  • Imogen
    Imogen almost 2 years

    I have multiple data frames for data collected over 4 days. Each of the data frames looks like this (put very simply):

    Lat           Long       PM
    -33.9174    151.2263     8
    -33.9175    151.2264     10 
    -33.9176    151.2265     9
    -33.9177    151.2266     8
    

    I want to merge multiple data frames based on their matching Long and Lat values, to average out all 'PM' values at a particular location. The end result will look something like this (for the 13th - 16th Feb):

    Lat        Long       PM.13Feb  PM.14Feb  PM.15Feb  Mean
    -33.9174   151.2263   8         9         11        9.33
    -33.9175   151.2264   10        11        12        11.00
    -33.9176   151.2265   9         14        13        12.00
    -33.9177   151.2266   8         10        11        9.67
    

    I understand that merging 2 data frames is easy enough:

    df = merge(data1, data2, by.x = c("Lat", "Long"), by.y = c("Lat", "Long"))
    

    But how do I merge multiple dataframes based on matching Longitude and Latitude values?

    Also, is there a way to match up rows whose Lat/Long values are within 0.001 of each other? (Currently I am rounding the Lat/Long data to 3 decimal places, but this is duplicating my data.)

  • Imogen
    Imogen over 6 years
    Hi, thanks for the reply! How would I join more than two data frames together? With inner_join, the PM columns get an .x or .y suffix, so it becomes awkward to join many data frames. Any thoughts? Thanks!
  • Imogen
    Imogen over 6 years
    Thanks, I'll give it a go, although I have about 300 rows in each of the data frames! I'll let you know how I get on :)
  • Nick
    Nick over 6 years
    Hey Imogen, how did it go?