Want Regex to stop at first occurrence of "." and ";"

20,941

Solution 1

There are a few things you should change in your regex. First, as pointed out by Amal Murali, you shouldn't be using a greedy quantifier but should use the lazy version:

/current\..*?scotland\./i
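To see the difference, here is a minimal sketch comparing the greedy and lazy patterns on a shortened version of the sample text from the question. The variable names are only illustrative.

```ruby
# Shortened sample text built from the question's sentences.
text = "Current. time is six thirty at Scotland. Past. time was five thirty at India; " \
       "Current. time is five thirty at Scotland."

# Greedy .* runs to the LAST "Scotland." in the string.
greedy = text[/current\..*scotland\./i]

# Lazy .*? stops at the FIRST "Scotland." it can reach.
lazy = text[/current\..*?scotland\./i]

puts greedy
puts lazy
```

Here `String#[]` with a regex returns only the first match, which is enough to show how far each quantifier reaches.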

I think it is a good general rule with regex to reach for the lazy option first, as it is more often what you want. Second, you don't really want to use . to match everything. Since you do not want this part of your regex to match either . or ; you can put those in a negated character class, which matches anything but them:

/current\.[^.]*?scotland\./i
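As a rough sketch of how this behaves, the pattern can be run with `String#scan` over the first sample paragraph from the question. Because `[^.]` cannot step over a full stop, each match stays inside a single sentence.

```ruby
text = "Current. time is six thirty at Scotland. Past. time was five thirty at India; " \
       "Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. " \
       "Current. time is five ten at Scotland."

# No groups in the pattern, so scan returns an array of whole matches.
sentences = text.scan(/current\.[^.]*?scotland\./i)

puts sentences
```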

and

/current\.[^;]*?india;/i

or cover both with:

/(current|past)\.[^.;]*?(india|scotland)[.;]/i

(obviously this might not be what you want to do, just including to demonstrate how to extend this)
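One thing to watch for if you use the combined pattern with `scan` in Ruby: when a regex contains capturing groups, `scan` returns the groups rather than the whole match. The sketch below, on a shortened sample, shows both forms. The non-capturing `(?:...)` variant is my own adjustment, not part of the pattern above.

```ruby
# Shortened sample built from the question's sentences.
text = "Current. time is six thirty at Scotland. Past. time was five thirty at India; " \
       "Current. time is five ten at Scotland."

# Capturing groups: scan yields one [tense, place] pair per match.
pairs = text.scan(/(current|past)\.[^.;]*?(india|scotland)[.;]/i)

# Non-capturing groups (?:...): scan yields the full matched sentences.
sentences = text.scan(/(?:current|past)\.[^.;]*?(?:india|scotland)[.;]/i)

p pairs
p sentences
```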

This is also a good rule of thumb: if you're having trouble with a regex, make any wildcards more specific (in this case, changing from . which matches everything, to [^.;] which matches everything but . and ;).

Solution 2

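s = "Current. time is six thirty at Scotland. Past. time..."
# (?:Current|Past) is a non-capturing alternation, so scan returns whole matches;
# [.;] is a character class matching either terminator
s.scan(/(?:Current|Past)\..*?[.;]/i)

#=> ["Current. time is six thirty at Scotland.", "Past. time was five thirty at India;",...]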
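Note that square brackets define a character class: `[Current|Past]` matches any single one of those letters (or a literal `|`), not the words themselves. The sketch below illustrates the difference. The trailing "Later." sentence is a made-up addition to the sample, included only to expose the stray match.

```ruby
# "Later. nothing to see here." is a hypothetical extra sentence, not from the question.
s = "Current. time is six thirty at Scotland. Past. time was five thirty at India; Later. nothing to see here."

# Character class: any run of the letters C,u,r,e,n,t,P,a,s (or |) before a dot qualifies.
loose = s.scan(/[Current|Past]*\..*?[.|;]/i)

# Alternation in a non-capturing group: only the words "Current" or "Past" qualify.
strict = s.scan(/(?:Current|Past)\..*?[.;]/i)

p loose
p strict
```

The character-class version also picks up "ater. nothing to see here." because "ater" consists only of letters in the class.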

Solution 3

As Amal said, your pattern is greedy and you should append a ? to the quantifier to make it lazy. I'd use the following to get ONLY the first occurrence of the string you ask for:

/^.*?current\..*?scotland\./i

And this to get every group following that pattern, taking into account ';' as well as '.':

/current\..*?scotland[.;]/i

This last one basically means: find any occurrence of 'current' and stop when you reach the first 'scotland' followed by either a '.' or a ';'.
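In Ruby, the choice between first occurrence and every occurrence can also be made by the method you call rather than by the pattern. A minimal sketch, assuming a sample that does not start with "Current.": `String#[]` returns the first match and `scan` returns all of them. It also shows that a leading `^.*?` becomes part of the matched text.

```ruby
text = "Past. time was five thirty at India; Current. time is six thirty at Scotland. " \
       "Past. time was five thirty at Scotland. Current. time is five ten at Scotland."

# The ^.*? prefix is included in the match, so leading text comes along.
anchored = text[/^.*?current\..*?scotland\./i]

# String#[] with the unanchored pattern returns just the first occurrence.
first = text[/current\..*?scotland[.;]/i]

# scan returns every occurrence.
all = text.scan(/current\..*?scotland[.;]/i)

puts anchored
puts first
p all
```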

Pramod Shinde
Author by

Pramod Shinde

Ruby on Rails developer

Updated on July 09, 2022

Comments

  • Pramod Shinde
    Pramod Shinde almost 2 years

    I am trying to extract sentences from a paragraph, with a pattern like

     Current. time is six thirty at Scotland. Past. time was five thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.
    

    When I use a regex such as

    /current\..*scotland\./i
    

    this matches the whole string

    Current. time is six thirty at Scotland. Past. time was six thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.
    

    Instead, I want each match to stop at the first occurrence of ".", like

     Current. time is six thirty at Scotland.
     Current. time is five ten at Scotland. 
    

    Similarly for text like

     Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India;    
    

    When I use a regex like

     /past\..*india\;/i
    

    this matches the whole string

     Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India; 
    

    Here I want to capture all matches, or just the first one, as shown below. How do I stop at the first occurrence of ";"?

    Past. time was five thirty at India; 
    Past. time was five ten at India; 
    

    How can I make the regular expression stop at "." or ";" in the examples above?

  • AdrianHHH
    AdrianHHH almost 10 years
    No need to escape the . and ; within a character class. So /current\..*?scotland[.;]/i would suffice.
  • sprk1
    sprk1 almost 10 years
    You're correct. I was just being extra verbose early in the morning :) Edited.
  • zx81
    zx81 almost 10 years
    Nicely written answer +1 :)
  • Mike H-R
    Mike H-R almost 10 years
    @zx81 Thanks. always nice to hear. :)