Combining two dataframes keeping all columns
Solution 1
Following David Arenburg's comment above...
Example <- merge(df1, df2, by = "col1", all = TRUE)
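To see why all = TRUE is the relevant argument, here is a minimal base R sketch comparing it with the inner and left variants of merge. It assumes the example data is rebuilt with data.frame() rather than as.data.frame(cbind(...)), so that col2 and col3 stay numeric; the names inner, left and full are only illustrative.

```r
# Example data, built with data.frame() so the numeric columns stay numeric
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"), col2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"), col3 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# Default merge is an inner join: only keys present in both data frames
inner <- merge(df1, df2, by = "col1")

# all.x = TRUE keeps every row of df1 (a left join)
left <- merge(df1, df2, by = "col1", all.x = TRUE)

# all = TRUE keeps every row of both (a full outer join), filling gaps with NA
full <- merge(df1, df2, by = "col1", all = TRUE)

print(c(inner = nrow(inner), left = nrow(left), full = nrow(full)))
```

Only the full variant keeps all seven keys together with both col2 and col3, which is what the question asks for.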
Solution 2
The solutions
Example <- merge(df1, df2, by = "col1", all = TRUE)
and
Example <- join(df1,df2,by = "col1", type = "full")
give the same result, both with a number of NA's:
#> Example
# col1 col2 col3
#1 ab 1 5
#2 bc 2 <NA>
#3 cd 3 <NA>
#4 de 4 <NA>
#5 ef <NA> 6
#6 fg <NA> 7
#7 gh <NA> 8
One possibility to replace those entries with zeros is to convert the data frame into a matrix, change the entries, and convert back to a data frame:
Example <- as.matrix(Example)
Example[is.na(Example)] <- 0
Example <- as.data.frame(Example)
#> Example
# col1 col2 col3
#1 ab 1 5
#2 bc 2 0
#3 cd 3 0
#4 de 4 0
#5 ef 0 6
#6 fg 0 7
#7 gh 0 8
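One caveat with the matrix round trip: as.matrix() on a data frame that contains a character column produces a character matrix, so col2 and col3 come back as character (or factor) columns rather than numbers. A minimal sketch that avoids the conversion, assuming the inputs are built with data.frame() so the value columns are numeric to begin with:

```r
# Example data with numeric col2 / col3 (assumption: built via data.frame())
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"), col2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"), col3 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

Example <- merge(df1, df2, by = "col1", all = TRUE)

# A logical matrix can index a data frame directly, so no as.matrix() is needed
Example[is.na(Example)] <- 0

# col2 and col3 are still numeric after the replacement
str(Example)
```

This direct assignment would not work cleanly on factor columns, because 0 is not an existing factor level; that is why the sketch starts from numeric columns.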
PS: I'm almost certain that @akrun knows another way to achieve this in a single line ;)
James White
Updated on June 04, 2022
Comments
-
James White almost 2 years
What I would like to do is combine 2 dataframes, keeping all columns (which is not done in the example below) and input zeros where there are gaps in the dataframe from uncommon variables.
This seems like a plyr or dplyr theme. However, a full join in plyr does not keep all of the columns, whilst a left or a right join does not keep all the rows I desire. Looking at the dplyr cheatsheet (http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf), a full_join seems to be the function I need, but R does not recognise this function after successfully loading the package.
As an example:
col1 <- c("ab","bc","cd","de")
col2 <- c(1,2,3,4)
df1 <- as.data.frame(cbind(col1,col2))
col1 <- c("ab","ef","fg","gh")
col3 <- c(5,6,7,8)
df2 <- as.data.frame(cbind(col1,col3))
library(plyr)
Example <- join(df1,df2,by = "col1", type = "full") #Does not keep col3
library(dplyr)
Example <- full_join(df1,df2,by = "col1") #Function not recognised
I would like the output...
col1 col2 col3
ab   1    5
bc   2    0
cd   3    0
de   4    0
ef   0    6
fg   0    7
gh   0    8
-
David Arenburg almost 9 years
full_join works fine for me, as does merge(df1, df2, by = "col1", all = TRUE). Though your desired output is strange.
-
RHertel almost 9 years
I think that line 6 of your code should read df2 <- as.data.frame(cbind(col1,col3)). Then Example <- join(df1,df2,by = "col1", type = "full") works fine; you may just need to replace the NAs with 0s.
-
James White almost 9 years
akrun, I have now edited the code. This was a simplified version of my actual data, and after the edit my predicament was the same. David, perhaps I have an older version; in any case your merge solution worked perfectly, thank you!
-
-
akrun almost 9 years
As the OP created 'factor' columns by as.data.frame(cbind(...)), one possible option is library(car); Example[] <- lapply(Example, recode, 'NA=0')
-
David Arenburg almost 9 years
Not sure what you added to the already commented/posted merge solution and to the full_join mentioned by the OP, which also works. Using plyr instead of dplyr isn't an improvement.
-
RHertel almost 9 years
It was just a minor change, replacing the NAs with zeros, according to the OP's requested output.
-
James White almost 9 years
Thank you for this answer. Yes, the plyr option does work on this small example, but not on my actual dataset for some reason; I am not sure why as yet. The merge option worked perfectly though.
-
David Arenburg almost 9 years
@James The merge function is already in your answer though. Also, update your dplyr version and full_join should also work.
-
James White almost 9 years
Thank you David, I will look into doing this.