Split column into multiple columns R
Solution 1
Another solution uses str_match from the stringr package:
x <- c("I:500-600", "I:700-900", "II:200-250")
library(stringr)
as.data.frame(str_match(x, "^(.*):(.*)-(.*)$")[,-1])
## V1 V2 V3
## 1 I 500 600
## 2 I 700 900
## 3 II 200 250
In the regular expression above we capture three substrings: from the beginning to :, from : to -, and from - to the end. str_match returns the full match in the first column, which [,-1] drops; each captured substring then becomes a separate column in the resulting object.
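For readers without stringr, the same capture-group idea can be sketched in base R with regexec and regmatches. This is only an illustration, not part of the original answer; the integer conversion of the last two columns is an added assumption that those fields are always numeric.

```r
x <- c("I:500-600", "I:700-900", "II:200-250")

# regexec() locates the full match and each capture group;
# regmatches() extracts them as one character vector per input string.
m <- regmatches(x, regexec("^(.*):(.*)-(.*)$", x))

# Each element is c(full_match, group1, group2, group3); drop the full match
# and stack the groups row by row into a character matrix.
parts <- do.call(rbind, lapply(m, function(v) v[-1]))

# Assumption: the second and third fields are always whole numbers.
res <- data.frame(
  V1 = parts[, 1],
  V2 = as.integer(parts[, 2]),
  V3 = as.integer(parts[, 3]),
  stringsAsFactors = FALSE
)
res
```

Dropping the first element of each match plays the same role as the [,-1] in the str_match call.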
Solution 2
You can use strsplit with a regular expression that uses alternation (|) to split on either : or -. This gives you a list, which you can process further.
> test <- c('I:500-600', 'I:700-900', 'II:200-250')
> do.call(rbind.data.frame, strsplit(test, ":|-"))
c..I....I....II.. c..500....700....200.. c..600....900....250..
1 I 500 600
2 I 700 900
3 II 200 250
If the column names are important, bind the pieces into a matrix first:
> as.data.frame(do.call(rbind, strsplit(test, ":|-")))
V1 V2 V3
1 I 500 600
2 I 700 900
3 II 200 250
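Both strsplit variants return every column as text. A minimal sketch of one way to process the list further, assuming every string splits into exactly three pieces, is to check the piece count and then let type.convert pick the column types. The character class [:-] is used here as an equivalent to the ":|-" pattern.

```r
test <- c("I:500-600", "I:700-900", "II:200-250")

# Split on either delimiter; "[:-]" matches the same characters as ":|-".
parts <- strsplit(test, "[:-]")

# Guard against malformed rows: rbind would silently recycle shorter vectors.
stopifnot(all(lengths(parts) == 3))

res <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# Convert each column to the most specific type; as.is = TRUE keeps
# non-numeric columns as character instead of turning them into factors.
res[] <- lapply(res, type.convert, as.is = TRUE)
str(res)
```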
Solution 3
Other options include extract from tidyr:
library(tidyr)
extract(df1, V1, into=c('V1','V2', 'V3'),
'([^:]*):([0-9]*)-([0-9]*)', convert=TRUE)
# V1 V2 V3
#1 I 500 600
#2 I 700 900
#3 II 200 250
Or tstrsplit from data.table:
library(data.table)  # v1.9.5+
setDT(df1)[, tstrsplit(V1, '[:-]', type.convert=TRUE)]
# V1 V2 V3
#1: I 500 600
#2: I 700 900
#3: II 200 250
NOTE: Both options have arguments to convert the class of the output columns (convert=TRUE in extract, type.convert=TRUE in tstrsplit).
data
df1 <- structure(list(V1 = c("I:500-600", "I:700-900", "II:200-250")),
.Names = "V1", class = "data.frame", row.names = c(NA, -3L))
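To see what the conversion argument mentioned in the note does, the sketch below runs extract on the same data with and without convert=TRUE and compares the column classes. It assumes tidyr is installed; the names no_conv and with_conv are made up for this illustration.

```r
library(tidyr)

df1 <- data.frame(V1 = c("I:500-600", "I:700-900", "II:200-250"),
                  stringsAsFactors = FALSE)
pattern <- "([^:]*):([0-9]*)-([0-9]*)"

# Without conversion, every extracted column stays character.
no_conv <- tidyr::extract(df1, V1, into = c("V1", "V2", "V3"), regex = pattern)

# With convert = TRUE, the numeric-looking columns are converted.
with_conv <- tidyr::extract(df1, V1, into = c("V1", "V2", "V3"), regex = pattern,
                            convert = TRUE)

sapply(no_conv, class)
sapply(with_conv, class)
```

Calling tidyr::extract with the namespace prefix avoids picking up an extract function from another attached package.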
Solution 4
I would recommend cSplit from my "splitstackshape" package. The syntax is pretty straightforward: cSplit(yourInputDataFrame, yourSplittingColumn, theDelimiters).
Here's an example on a vector. You'd skip the data.table part if you already had a data.frame or a data.table.
library(splitstackshape)
cSplit(data.table(x), "x", ":|-", fixed = FALSE)
# x_1 x_2 x_3
# 1: I 500 600
# 2: I 700 900
# 3: II 200 250
By default, it also runs type.convert:
str(.Last.value)
# Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
# $ x_1: Factor w/ 2 levels "I","II": 1 1 2
# $ x_2: int 500 700 200
# $ x_3: int 600 900 250
# - attr(*, ".internal.selfref")=<externalptr>
user3272284
Updated on June 09, 2022
I have a data frame column that I need to split into 3 separate columns. It looks like this:
I:500-600
I:700-900
II:200-250
I'd like to split this into the following 3 columns:
V1 V2 V3
I 500 600
I 700 900
II 200 250
This has proved slightly trickier than I had hoped. Any help would be appreciated.