Concatenate row-wise across specific columns of dataframe
66,633
Solution 1
Try
data$id <- paste(data$F, data$E, data$D, data$C, sep="_")
instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.
Edit Even better is
data <- within(data, id <- paste(F, E, D, C, sep=""))
Solution 2
Use unite
of tidyr
package:
require(tidyr)
data <- data %>% unite(id, F, E, D, C, sep = '_')
First parameter is the desired name, all next up to sep
- columns to concatenate.
Solution 3
Either stringr::str_c()
or paste()
will work.
require(stringr)
data <- within(data, str_c(F,E,D,C, sep="_")
or else
data <- within(data, paste(F,E,D,C, sep="_")
(stringr
is better performance on large datasets)
Author by
Jubbles
Updated on June 24, 2020Comments
-
Jubbles almost 4 years
I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.
> str(data) 'data.frame': 680420 obs. of 10 variables: $ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ... $ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ... $ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ... $ D : chr "AAA" "AAA" "BCB" "CCC" ... $ E : chr "A00001" "A00002" "B00002" "B00001" ... $ F : int 9 9 37 37 37 37 191 191 191 191 ... $ G : int NA NA NA NA NA NA NA NA NA NA ... $ H : int 4 4 4 4 4 4 4 4 4 4 ...
For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:
data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")
And below is the undesired result:
> str(data) 'data.frame': 680420 obs. of 10 variables: $ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ... $ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ... $ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ... $ D : chr "AAA" "AAA" "BCB" "CCC" ... $ E : chr "A00001" "A00002" "B00002" "B00001" ... $ F : int 9 9 37 37 37 37 191 191 191 191 ... $ G : int NA NA NA NA NA NA NA NA NA NA ... $ H : int 4 4 4 4 4 4 4 4 4 4 ... $ id : chr [1:680420, 1:4] "9" "9" "37" "37" ... ..- attr(*, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr "V1" "V2" "V3" "V4"
Any help would be greatly appreciated.
-
Jubbles almost 13 yearsWow. That was quick and correct. It looks like I was over-complicating the process. I am really trying to wean myself off of complete 'loop dependency' and am still learning to appreciate vectorized code.
-
Richie Cotton almost 13 yearsAnd try
with(data, paste(...))
for even more succinctness. -
Dirk Eddelbuettel almost 13 yearsCorrect. Or even
within()
as the less-loved cousin ofwith()
. Amended my answer accordingly. -
smci about 10 yearsSo that's how
within
differs!?! (Why the hell wasn't it namedwith(..., modify=TRUE)
?) -
smci about 10 years@the serial downvoter:
stringr::str_c
can be used and noone said that. What's your problem? -
jessi over 7 yearsWhen I needed to collapse dynamic vectors by groups, this method was the only one that consistently worked as such
mycols <- c("F","E","D","C") data <- data %>% unite_("id", mycols, sep = "_")
-
Parsa almost 7 yearsWhat if you use
data$id <- paste(data[, c(1)], data[,c(2)])
oddly then the code doesn't do what expected. Is there something I'm missing here? -
Jinhua Wang over 4 yearsThis answer looks awesome
-
Josiah Yoder almost 4 yearsIf you want different separators between each column, you can use
paste0
:paste0(df$col1,'-',df$col2,'.txt')
. paste0 is a convenience function for specifying there is no separator.