R - Extract info after nth occurrence of a character from the right of string


Solution 1

You could use

([^-]+)(?:-[^-]+){3}$

See a demo on regex101.com.


In R this could be:
library(dplyr)
library(stringr)
df <- data.frame(string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'), stringsAsFactors = FALSE)

df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[,2])
df

And yields

                      string outcome
1 here-are-some-words-to-try    some
2          a-b-c-d-e-f-g-h-i       f
3            no dash in here    <NA>
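
Reading the pattern from the right: `$` anchors at the end of the string, `(?:-[^-]+){3}` consumes the last three dash-delimited fields, and `([^-]+)` captures the field just before them. The same pattern should also work without dplyr or stringr. The following is a minimal base R sketch, assuming `perl = TRUE` is acceptable for the non-capturing group; the variable names are only illustrative.

```r
pattern <- "([^-]+)(?:-[^-]+){3}$"
x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash in here")

# regexec() reports the full match plus each capture group;
# regmatches() returns character(0) for strings that do not match.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Keep the first capture group, or NA when there was no match.
outcome <- vapply(
  m,
  function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
  character(1)
)
outcome
# expected: "some", "f", NA
```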

Solution 2

x = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
sapply(x, function(strings){
    ind = unlist(gregexpr(pattern = "-", text = strings))
    if (length(ind) < 4){NA}
    else{substr(strings, ind[length(ind) - 3] + 1, ind[length(ind) - 2] - 1)}
})
#here-are-some-words-to-try          a-b-c-d-e-f-g-h-i 
#                    "some"                        "f" 

Solution 3

How about splitting your string? Something like:

string <- "here-are-some-words-to-try"

# separate all words
val <- strsplit(string, "-")[[1]]

# reverse the order
val <- rev(val)

# take the 4th element
val[4]

# And using a dataframe
library(tidyverse)
tibble(string = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")) %>% 
mutate(outcome = map_chr(string, function(s) rev(strsplit(s, "-")[[1]])[4]))
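
The split-and-reverse idea can be wrapped in a small vectorised helper. This is only a sketch: `field_from_right` and its parameters `n` and `sep` are hypothetical names, not part of any package, and it indexes from the end instead of calling `rev()`.

```r
# Hypothetical helper: return the n-th separator-delimited field counted
# from the right, or NA when a string has fewer than n fields.
field_from_right <- function(x, n = 4, sep = "-") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) < n) NA_character_ else p[length(p) - n + 1]
  }, character(1))
}

res <- field_from_right(c("here-are-some-words-to-try",
                          "a-b-c-d-e-f-g-h-i",
                          "no dash in here"))
res
# expected: "some", "f", NA
```

Note that `strsplit()` drops a trailing empty field, so a string ending in the separator is counted as if the final separator were absent.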

Author by

alexb523

Updated on November 02, 2022

Comments

  • alexb523 over 1 year

    I've seen many variations on extracting with gsub, but they mostly deal with extracting from left to right or after a single occurrence. I want to match from right to left, counting four occurrences of -, and return everything between the 3rd and 4th occurrence.

    For example:

    string                       outcome
    here-are-some-words-to-try   some
    a-b-c-d-e-f-g-h-i            f
    

    Here are a few references I've tried using:

  • Frank over 6 years
    Errors on input with too few dashes. Probably should give NA instead, but that could be left up to the OP/user, I guess.
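
The behaviour on short inputs depends on the approach. As an illustration, the sketch below compares a split-based extraction with a position-based one on inputs with four, three and two dashes. The variable names are illustrative, and the position-based version is written with an explicit guard so that it returns NA rather than failing.

```r
x <- c("a-b-c-d-e", "a-b-c-d", "a-b-c")

# Split-based: needs only four fields, i.e. three dashes.
by_split <- vapply(
  strsplit(x, "-", fixed = TRUE),
  function(parts) rev(parts)[4],
  character(1)
)

# Position-based: needs four dashes, because it reads the text between
# the 4th and 3rd dash counted from the right.
by_position <- vapply(x, function(s) {
  ind <- gregexpr("-", s, fixed = TRUE)[[1]]
  # gregexpr() returns -1 when there is no dash at all.
  if (ind[1] == -1 || length(ind) < 4) return(NA_character_)
  n <- length(ind)
  substr(s, ind[n - 3] + 1, ind[n - 2] - 1)
}, character(1), USE.NAMES = FALSE)

by_split     # expected: "b", "a", NA
by_position  # expected: "b", NA,  NA
```

The two agree whenever at least four dashes are present. With exactly three dashes the split-based version returns the first field, while the position-based version returns NA, so which one is right depends on how strictly "between the 3rd and 4th occurrence" is read.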