R split a character string on the second underscore


Solution 1

One way would be to replace the second underscore with another delimiter (e.g. a space) using sub and then split on that delimiter.

Using sub, we match one or more characters that are not a _ from the beginning (^) of the string (^[^_]+), followed by the first underscore (_), followed by one or more characters that are not a _ ([^_]+). We capture that as a group by placing it inside parentheses ((....)). Then we match the second _ followed by zero or more characters up to the end of the string, which form the second capture group ((.*)$). In the replacement, we separate the first (\\1) and second (\\2) groups with a space.

strsplit(sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', v1), ' ')
#[[1]]
#[1] "c54254_g4545" "i5454"       

#[[2]]
#[1] "c434_g4" "i455"   

#[[3]]
#[1] "c5454_g544" "i3" 

data

v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')
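The two steps of Solution 1 can be looked at separately, which makes it easier to see what sub does before strsplit runs. The following is a minimal sketch, assuming the input strings contain no spaces of their own; the names step1, parts and res are illustrative and not part of the original answer.

```r
v1 <- c("c54254_g4545_i5454", "c434_g4_i455", "c5454_g544_i3")

# Step 1: replace only the second underscore with a space
# (assumes the strings do not already contain spaces)
step1 <- sub("(^[^_]+_[^_]+)_(.*)$", "\\1 \\2", v1)
print(step1)

# Step 2: split on that space; fixed = TRUE treats " " literally
parts <- strsplit(step1, " ", fixed = TRUE)

# Optional: bind the pieces into a two-column character matrix
res <- do.call(rbind, parts)
print(res)
```

If the strings may contain spaces, a delimiter that cannot occur in the data would have to be chosen instead.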

Solution 2

strsplit(sub("(_)(?=[^_]+$)", " ", v1, perl=TRUE), " ")
#[[1]]
#[1] "c54254_g4545" "i5454"       
#
#[[2]]
#[1] "c434_g4" "i455"   
#
#[[3]]
#[1] "c5454_g544" "i3"

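The pattern "(_)(?=[^_]+$)" matches an underscore that is followed only by non-underscore characters up to the end of the string, i.e. the last underscore. Because each string here contains exactly two underscores, the last underscore is the second one. The lookahead (?=...) is zero-width, so only the underscore itself is replaced, and no backreferences are needed in the replacement (the parentheses around _ are optional).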
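The difference between "last underscore" and "second underscore" only shows up when a string has more than two underscores. Here is a small sketch comparing the two patterns; the input "a_b_c_d" is a made-up example and the variable names are illustrative.

```r
x <- c("c434_g4_i455", "a_b_c_d")

# Lookahead pattern: targets the LAST underscore
last_us <- sub("_(?=[^_]+$)", " ", x, perl = TRUE)
print(last_us)

# Anchored pattern from Solution 1: targets the SECOND underscore
second_us <- sub("^([^_]+_[^_]+)_(.*)$", "\\1 \\2", x)
print(second_us)
```

For strings with exactly two underscores both patterns give the same result; for "a_b_c_d" they split at different positions.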

Solution 3

I did this. It works, but there may be a 'better' way.

s <- 'c110478_g1_i1'

# split on every underscore, then rejoin the first two pieces
m <- strsplit(s, '_')
f <- paste(m[[1]][1], m[[1]][2], sep = '_')
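The split-and-paste idea above could be wrapped in a small helper that works on a whole vector and also returns the part after the second underscore. This is only a sketch: split_at_second is a hypothetical name, and it assumes every string contains at least two underscores.

```r
# Hypothetical helper: split each string at its second underscore.
# Assumes every element of x has at least two underscores.
split_at_second <- function(x) {
  pieces <- strsplit(x, "_", fixed = TRUE)
  lapply(pieces, function(p) {
    c(paste(p[1:2], collapse = "_"),      # everything before the 2nd underscore
      paste(p[-(1:2)], collapse = "_"))   # everything after it
  })
}

v1 <- c("c54254_g4545_i5454", "c434_g4_i455", "c5454_g544_i3")
out <- split_at_second(v1)
print(out)
```

Because the remaining pieces are re-joined with "_", strings with more than two underscores keep their later underscores in the second element.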
Author by

SigneMaten

Updated on July 24, 2022

Comments

  • SigneMaten
    SigneMaten almost 2 years

    I have character strings with two underscores. Like these

    c54254_g4545_i5454
    c434_g4_i455
    c5454_g544_i3
    .
    .
    etc
    

    I need to split these strings at the second underscore, and I am afraid I have no clue how to do that in R (or any other tool, for that matter). I'd be very happy if anyone could help me out here. Thank you, SM

  • akrun
    akrun over 8 years
    If this does not need to be general, you don't need sub: the pattern can be used directly to split, as in strsplit(x, '(_)(?=[^_]+$)', perl=TRUE). (+1)
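The suggestion in the last comment can be tried on the sample data as follows. This is a minimal sketch assuming, as in the question, that each string has exactly two underscores, so the last underscore is the second one; the capture group is dropped here because it is not needed for splitting.

```r
v1 <- c("c54254_g4545_i5454", "c434_g4_i455", "c5454_g544_i3")

# Split directly on the underscore that is followed only by
# non-underscore characters up to the end of the string
direct <- strsplit(v1, "_(?=[^_]+$)", perl = TRUE)
print(direct)
```

This avoids the intermediate delimiter, so it also works when the strings themselves contain spaces.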