Concatenate row-wise across specific columns of dataframe

66,633

Solution 1

Try

 data$id <- paste(data$F, data$E, data$D, data$C, sep="_")

instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.

Edit Even better is

 data <- within(data,  id <- paste(F, E, D, C, sep=""))

Solution 2

Use unite of tidyr package:

require(tidyr)
data <- data %>% unite(id, F, E, D, C, sep = '_')

First parameter is the desired name, all next up to sep - columns to concatenate.

Solution 3

Either stringr::str_c() or paste() will work.

require(stringr)
data <- within(data, str_c(F,E,D,C, sep="_")

or else

data <- within(data, paste(F,E,D,C, sep="_")

(stringr is better performance on large datasets)

Share:
66,633
Jubbles
Author by

Jubbles

Updated on June 24, 2020

Comments

  • Jubbles
    Jubbles almost 4 years

    I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.

    > str(data)
    'data.frame':   680420 obs. of  10 variables:
     $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
     $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
     $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
     $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
     $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
     $ H              : int  4 4 4 4 4 4 4 4 4 4 ...
    

    For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:

    data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")
    

    And below is the undesired result:

      > str(data)
        'data.frame':   680420 obs. of  10 variables:
         $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
         $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
         $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
         $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
         $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
         $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
         $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
         $ H              : int  4 4 4 4 4 4 4 4 4 4 ...
         $ id             : chr [1:680420, 1:4] "9" "9" "37" "37" ...
          ..- attr(*, "dimnames")=List of 2
          .. ..$ : NULL
          .. ..$ : chr  "V1" "V2" "V3" "V4"
    

    Any help would be greatly appreciated.

  • Jubbles
    Jubbles almost 13 years
    Wow. That was quick and correct. It looks like I was over-complicating the process. I am really trying to wean myself off of complete 'loop dependency' and am still learning to appreciate vectorized code.
  • Richie Cotton
    Richie Cotton almost 13 years
    And try with(data, paste(...)) for even more succinctness.
  • Dirk Eddelbuettel
    Dirk Eddelbuettel almost 13 years
    Correct. Or even within() as the less-loved cousin of with(). Amended my answer accordingly.
  • smci
    smci about 10 years
    So that's how within differs!?! (Why the hell wasn't it named with(..., modify=TRUE) ?)
  • smci
    smci about 10 years
    @the serial downvoter: stringr::str_c can be used and noone said that. What's your problem?
  • jessi
    jessi over 7 years
    When I needed to collapse dynamic vectors by groups, this method was the only one that consistently worked as such mycols <- c("F","E","D","C") data <- data %>% unite_("id", mycols, sep = "_")
  • Parsa
    Parsa almost 7 years
    What if you use data$id <- paste(data[, c(1)], data[,c(2)]) oddly then the code doesn't do what expected. Is there something I'm missing here?
  • Jinhua Wang
    Jinhua Wang over 4 years
    This answer looks awesome
  • Josiah Yoder
    Josiah Yoder almost 4 years
    If you want different separators between each column, you can use paste0: paste0(df$col1,'-',df$col2,'.txt'). paste0 is a convenience function for specifying there is no separator.