R split a character string on the second underscore
Solution 1
One way is to replace the second underscore with another delimiter (e.g. a space) using sub and then split on that delimiter.
Using sub, we match one or more characters that are not a _ from the beginning (^) of the string (^[^_]+), followed by the first underscore (_), followed by one or more characters that are not a _ ([^_]+). We capture all of that as a group by placing it inside parentheses ((....)). Then we match the second _, followed by whatever remains up to the end of the string in the second capture group ((.*)$). In the replacement, we separate the first (\\1) and second (\\2) groups with a space.
strsplit(sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', v1), ' ')
#[[1]]
#[1] "c54254_g4545" "i5454"
#[[2]]
#[1] "c434_g4" "i455"
#[[3]]
#[1] "c5454_g544" "i3"
data
v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')
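To see what each stage contributes, here is a small sketch that runs the substitution and the split separately on the same v1. The names marked and parts are only illustrative, and the approach assumes the strings themselves contain no spaces.

```r
v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')

# Stage 1: replace only the second underscore with a space
marked <- sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', v1)
marked
# [1] "c54254_g4545 i5454" "c434_g4 i455" "c5454_g544 i3"

# Stage 2: split on that space (fixed = TRUE treats ' ' literally)
parts <- strsplit(marked, ' ', fixed = TRUE)

# Optional: collect the pieces into a two-column character matrix
do.call(rbind, parts)
```

If a space could occur in the real data, another delimiter that never appears in the strings would have to be substituted instead.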
Solution 2
# x holds the input strings (the same values as v1 in Solution 1)
strsplit(sub("(_)(?=[^_]+$)", " ", x, perl=TRUE), " ")
#[[1]]
#[1] "c54254_g4545" "i5454"
#
#[[2]]
#[1] "c434_g4" "i455"
#
#[[3]]
#[1] "c5454_g544" "i3"
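The pattern "(_)(?=[^_]+$)" matches an underscore that is followed only by non-underscore characters up to the end of the string, i.e. the last underscore. Because these strings contain exactly two underscores, the last one is also the second. sub replaces it with a space, and we then split on that space. The lookahead (?=...) is not part of the match, so the replacement needs no backreferences and the parentheses around _ are optional.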
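One consequence worth checking: since the lookahead anchors on the end of the string, this pattern targets the last underscore, while the Solution 1 pattern anchors on the start and targets the second. The sketch below compares the two on a made-up string with three underscores (y is a hypothetical input, not taken from the question).

```r
# Hypothetical input with three underscores (not from the original data)
y <- "a_b_c_d"

# Solution 1 pattern: anchored at the start, replaces the second underscore
second <- strsplit(sub("(^[^_]+_[^_]+)_(.*)$", "\\1 \\2", y), " ")[[1]]
second
# [1] "a_b" "c_d"

# Solution 2 pattern: anchored at the end, replaces the last underscore
last <- strsplit(sub("_(?=[^_]+$)", " ", y, perl = TRUE), " ")[[1]]
last
# [1] "a_b_c" "d"
```

For the strings in the question, which always have exactly two underscores, both give the same result.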
Solution 3
I did this. It works, but there may be a 'better' way.
s <- 'c110478_g1_i1'
# split on every underscore
m <- strsplit(s, '_')
# rejoin the first two pieces; the part after the second underscore is m[[1]][3]
f <- paste(m[[1]][1], m[[1]][2], sep='_')
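The same split-and-rejoin idea can be wrapped so that it handles a whole vector and returns both halves. This is only a sketch: split_at_second is a hypothetical helper name, and it assumes every input contains at least two underscores.

```r
# Hypothetical helper: split each string at its second underscore by
# splitting on every underscore and re-joining the pieces.
# Assumes every input contains at least two underscores.
split_at_second <- function(x) {
  pieces <- strsplit(x, "_", fixed = TRUE)
  lapply(pieces, function(p) {
    c(paste(p[1:2], collapse = "_"),     # text before the second underscore
      paste(p[-(1:2)], collapse = "_"))  # everything after it
  })
}

split_at_second(c("c110478_g1_i1", "c434_g4_i455"))
# [[1]]
# [1] "c110478_g1" "i1"
#
# [[2]]
# [1] "c434_g4" "i455"
```

No regular expression is needed here, at the cost of a little more code than the sub-based solutions.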
SigneMaten
Updated on July 24, 2022
Comments
-
SigneMaten almost 2 years
I have character strings with two underscores, like these:
c54254_g4545_i5454
c434_g4_i455
c5454_g544_i3
etc.
I need to split these strings at the second underscore, and I am afraid I have no clue how to do that in R (or any other tool, for that matter). I'd be very happy if anyone could sort me out here. Thank you, SM
-
akrun over 8 years
If this is not general, you don't need sub: strsplit(x, '(_)(?=[^_]+$)', perl=TRUE) is enough, as the pattern can be used directly to split. (+1).
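A minimal sketch of that direct split, assuming x holds the same strings as in the question (the name res is illustrative):

```r
x <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')

# Split directly on the last underscore; the lookahead is not consumed,
# so only the underscore itself is removed
res <- strsplit(x, '_(?=[^_]+$)', perl = TRUE)
res[[1]]
# [1] "c54254_g4545" "i5454"

# One value per input: the part before the second underscore
sapply(res, `[`, 1)
# [1] "c54254_g4545" "c434_g4" "c5454_g544"
```

This skips the intermediate space delimiter, so it also works when the strings themselves contain spaces.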