regex - return all before the second occurrence

Solution 1

I think this might do the task. The regex below matches the last `_` and everything after it, so removing the match leaves everything before the last occurrence of `_`:

_([^_]*)$

E.g.:

> sub('_([^_]*)$', '', "DNS000001320_309.0/121.0_t0")
[1] "DNS000001320_309.0/121.0"
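A small sketch of how this pattern behaves when a string has more than two underscores. The second input, "a_b_c_d", is a made-up example and not part of the question. Because the pattern is anchored at the end of the string, it strips only the final segment, so the result can extend past the second underscore:

```r
# Inputs: the string from the question plus a hypothetical one with three underscores
x <- c("DNS000001320_309.0/121.0_t0", "a_b_c_d")

# Remove the last underscore and everything after it
res <- sub("_[^_]*$", "", x)
res
# "DNS000001320_309.0/121.0" "a_b_c"
```

For the second input the result is "a_b_c" rather than "a_b", so this approach only fits strings with exactly two underscores.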

Solution 2

The following script:

s <- "DNS000001320_309.0/121.0_t0"
t <- gsub("^([^_]*_[^_]*)_.*$", "\\1", s)
t

will print:

DNS000001320_309.0/121.0

A quick explanation of the regex:

^         # the start of the input
(         # start group 1
  [^_]*   #   zero or more chars other than `_`
  _       #   a literal `_`
  [^_]*   #   zero or more chars other than `_`
)         # end group 1
_         # a literal `_`
.*        # consume the rest of the string
$         # the end of the input

which is replaced with:

\\1       # whatever is matched in group 1

If there are fewer than two underscores, the regex does not match and the string is left unchanged.
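As a quick check of the edge cases, here is a sketch that applies the same pattern to a vector. Only the first input comes from the question; the others are hypothetical. It uses `sub` instead of `gsub`, which gives the same result here because the pattern is anchored at both ends and can match at most once per string:

```r
# First element is from the question; the rest are made-up test cases
inputs <- c("DNS000001320_309.0/121.0_t0", "a_b_c_d", "a_b", "abc")

# Keep group 1: everything before the second underscore
out <- sub("^([^_]*_[^_]*)_.*$", "\\1", inputs)
out
# "DNS000001320_309.0/121.0" "a_b" "a_b" "abc"
```

Strings with three or more underscores are cut at the second one, and strings with fewer than two come back untouched.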

Solution 3

Personally, I hate regex, so luckily there's a way to do this without one, just by splitting the string:

> s <- "DNS000001320_309.0/121.0_t0"      
> paste(strsplit(s,"_")[[1]][1:2],collapse = "_")
[1] "DNS000001320_309.0/121.0"

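Of course, this assumes that the string contains at least one underscore. Without one, the split yields a single piece, indexing `[1:2]` pads it with `NA`, and `paste` turns that into the literal text "NA" (for example, "abc" becomes "abc_NA"). Be careful if you vectorize this and that isn't the case.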
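One possible way to guard against that case is to take at most two pieces with `head` rather than indexing `[1:2]`. This is only a sketch: the function name `first_two` and the short test inputs are invented for illustration.

```r
# Hypothetical helper: keep at most the first two "_"-separated pieces
first_two <- function(x) {
  # fixed = TRUE treats "_" as a literal string, not a regex
  parts <- strsplit(x, "_", fixed = TRUE)
  # head(p, 2) returns fewer than two pieces when fewer exist, so no NA appears
  vapply(parts, function(p) paste(head(p, 2), collapse = "_"), character(1))
}

res <- first_two(c("DNS000001320_309.0/121.0_t0", "a_b", "abc"))
res
# "DNS000001320_309.0/121.0" "a_b" "abc"
```

Using `vapply` over the list returned by `strsplit` also makes the approach work on a whole character vector at once.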

Solution 4

Not pretty, but this will do the trick:

mystr <- "DNS000001320_309.0/121.0_t0"

mytok <- paste(strsplit(mystr,"_")[[1]][1:2],collapse="_")

Author: James

Updated on June 04, 2022

Comments

  • James, almost 2 years ago

    Given this string:

    DNS000001320_309.0/121.0_t0

    How would I return everything before the second occurrence of "_"?

    DNS000001320_309.0/121.0

    I am using R.

    Thanks.

  • Bart Kiers, over 12 years ago
    Yeah, if the regex-path is walked, sub would be more appropriate than gsub.
  • joran, over 12 years ago
    If there are more than two underscores this will select beyond the second underscore, although apparently that doesn't matter to the OP, so I point it out only for posterity.
  • Bart Kiers, over 12 years ago
    Yeah, good point @joran. It may matter to the OP, but he might not be aware of it.
  • daroczig, over 12 years ago
    Wow, really nice, detailed answer (+1). I definitely like your solution better than mine :)
  • Connor Murray, over 4 years ago
    What if there are more than 2 underscores?? Is there a way to further generalize this function?
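
On the last question, one way the regex from Solution 2 might be generalized is to repeat the "non-underscores followed by an underscore" part n - 1 times. The sketch below is an assumption, not something from the answers: the function name `before_nth` is invented, and it uses `perl = TRUE` so the non-capturing group `(?:...)` is handled by the PCRE engine.

```r
# Hypothetical helper: return everything before the n-th underscore
before_nth <- function(x, n) {
  stopifnot(length(n) == 1, n >= 1)
  # (?:[^_]*_){n-1} consumes the first n-1 underscore-terminated pieces,
  # then [^_]* takes the n-th piece; group 1 is what we keep
  pattern <- sprintf("^((?:[^_]*_){%d}[^_]*)_.*$", as.integer(n) - 1L)
  sub(pattern, "\\1", x, perl = TRUE)
}

before_nth("DNS000001320_309.0/121.0_t0", 2)
# "DNS000001320_309.0/121.0"
before_nth("a_b_c_d", 3)
# "a_b_c"
```

As with Solution 2, a string with fewer than n underscores does not match the pattern and is returned unchanged.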