Match string in between two strings


Solution 1

You can use the regex

play\s*(.*?)\s*in

  1. Use / as the delimiters of the regex literal.
  2. Use a lazy quantifier (.*?) so that the group matches as little as possible.

Demo:

var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
var regex = /play\s*(.*?)\s*in/g;

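var matches = [];
var m; // declared explicitly so it does not leak as an implicit global
// With the g flag, each exec() call resumes after the previous match
while ((m = regex.exec(str)) !== null) {
  matches.push(m[1]);
}

document.body.innerHTML = '<pre>' + JSON.stringify(matches, null, 4) + '</pre>';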
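
As an alternative to the exec() loop, the same result can be sketched with String.prototype.matchAll(), assuming an ES2020-capable runtime. This is only an illustration of the idea above, not part of the original answer; it logs to the console instead of writing to the page.

```javascript
// Sketch: collect the first capture group of every match with matchAll (ES2020).
const demoStr = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
const demoRegex = /play\s*(.*?)\s*in/g; // matchAll requires the g flag

// Each item yielded by matchAll is a match array; index 1 is the capture group.
const captured = Array.from(demoStr.matchAll(demoRegex), (m) => m[1]);

console.log(captured); // [ 'the Ukulele', 'the Guitar' ]
```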

Solution 2

You are so close to the right answer. There are a few things you may be overlooking:

  1. Your match needs to be non-greedy, which you get by adding ? after the quantifier.
  2. Do not use String.match() with the g flag here: it returns only the full matched substrings and drops the capturing groups. An alternative is RegExp.exec() or String.replace(); replace would require a little more work, so stick to building your own array with exec.

var str     = "display the Ukulele in Lebanon. play the Guitar in Lebanon.";
var re      = /\bplay (.+?) in\b/g;
var matches = [];
var match;

while ( match = re.exec(str) ){
  matches[ matches.length ] = match[1];
}


document.getElementById('demo').innerHTML = JSON.stringify( matches );
<pre id="demo"></pre>
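
To illustrate the point about String.match(), here is a small sketch comparing its results with and without the g flag on the same input. The variable names are only for illustration.

```javascript
const sample = "display the Ukulele in Lebanon. play the Guitar in Lebanon.";

// With the g flag, match() returns only the full matched substrings;
// the capture group contents are not included.
const fullMatches = sample.match(/\bplay (.+?) in\b/g);
console.log(fullMatches); // [ 'play the Guitar in' ]

// Without the g flag, match() returns the first match only,
// and index 1 holds the capture group.
const firstMatch = sample.match(/\bplay (.+?) in\b/);
console.log(firstMatch[1]); // the Guitar
```

This is why the loop above reads match[1] from exec() rather than relying on match().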

Solution 3

/\bplay\s+(.+?)\s+in\b/ig might be more specific and might work better for you.

I believe there may be some issues with the regexes offered previously. For instance, /play\s*(.*?)\s*in/g will find a match within "displaying photographs in sequence". Of course this is not what you want. One of the problems is that there is nothing specifying that "play" should be a discrete word. It needs a word boundary before it and at least one instance of white space after it (it can't be optional). Similarly, the white space after the capture group should not be optional.

The other expression offered at the time I added this, /play (.+?) in/g, lacks the word boundary token before "play" and after "in", so it will contain a match in "display blue ink". This is not what you want.
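
The false positives described above can be checked with a short sketch. The three patterns are the ones quoted in this answer (without the g flag, so test() keeps no state between calls); the variable names are made up for the example.

```javascript
const loose = /play\s*(.*?)\s*in/;         // optional whitespace, no word boundaries
const noBoundary = /play (.+?) in/;        // literal spaces, no word boundaries
const strict = /\bplay\s+(.+?)\s+in\b/i;   // word boundaries, mandatory whitespace

// "play" inside "displaying" followed directly by "in" satisfies the loose pattern.
console.log(loose.test("displaying photographs in sequence"));  // true
// "play" inside "display" and "in" inside "ink" satisfy the second pattern.
console.log(noBoundary.test("display blue ink"));               // true

// The stricter pattern rejects both inputs but still matches the intended phrase.
console.log(strict.test("displaying photographs in sequence")); // false
console.log(strict.test("display blue ink"));                   // false
console.log(strict.exec("play the Guitar in Lebanon.")[1]);     // the Guitar
```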

As to your expression, it was missing the word boundary and white space tokens as well. But as another answer mentioned, it also needed the wildcard to be lazy. Otherwise, given your example string, your match would start with the first instance of "play" and end with the second instance of "in".
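
A minimal sketch of the greedy versus lazy difference, using the example string from the question. JSON.stringify is used only to make the leading and trailing spaces in the captures visible.

```javascript
const text = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";

// Greedy: .* runs to the end of the string, then backtracks to the last "in".
const greedy = text.match(/play(.*)in/)[1];
console.log(JSON.stringify(greedy)); // " the Ukulele in Lebanon. play the Guitar "

// Lazy: .*? stops at the first "in" that lets the pattern succeed.
const lazy = text.match(/play(.*?)in/)[1];
console.log(JSON.stringify(lazy)); // " the Ukulele "
```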

If you find issues with the expression I offered, I would appreciate feedback.

Author: MarksCode (Student)

Updated on June 26, 2022

Comments

  • MarksCode
    MarksCode almost 2 years

    If I have a string like this:

    var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
    

    I want to get the strings between each of the substrings "play" and "in", so basically an array with "the Ukulele" and "the Guitar".

    Right now I'm doing:

    var test = str.match("play(.*)in");
    

    But that's returning the string between the first "play" and last "in", so I get "the Ukulele in Lebanon. play the Guitar" instead of 2 separate strings. Does anyone know how to globally search a string for all occurrences of a substring between a starting and ending string?

  • MarksCode
    MarksCode about 8 years
    Thank you, sir, this is an excellent answer. Another user gave me the regex /play\s*(.*?)\s*in/g, but yours looks much simpler. The syntax looks pretty messy, so I'm still trying to understand it.
  • vol7ron
    vol7ron about 8 years
    I was busy typing and didn't notice that @Tushar came to almost the same answer, except for the value assignment to the array. In JavaScript you can use a literal space, \s, or an escaped space (\ ) to match a space. Just be careful elsewhere, like Perl, where a literal space could be ignored. Also, \s matches more than just the space character: it also matches a tab or a newline character.
  • Jon
    Jon about 8 years
    @vol7ron: I found some possible issues in your expression. I referenced it in my answer.
  • vol7ron
    vol7ron about 8 years
    @Jon thanks, you're correct, this could use word boundaries. Keep in mind that even word boundaries could have issues with hyphenations. The most robust solution would require many more lines of logic - or a negative lookbehind (which I don't think ECMAScript RegEx permits). So this also requires the OP to be more specific about the string(s) being evaluated. That said, the \b would be a good thing to include.
  • Jon
    Jon about 8 years
    @vol7ron: Yes, \b can have issues with many special characters. It's possible that I'm making more of this than needs to be as the string the OP is dealing with may vary little from what he provided above, in which case \b would be unnecessary. Also, his question may have been really just about greedy vs lazy. But I suppose that while the issue of potential problems with \b has been raised (and as you alluded, without knowing more about possible variation in his input string), maybe the following would be safer: /(?:\s|^)play\s+(.+?)\s+in\s/ig.
  • STEEL
    STEEL almost 6 years
    thanks this is perfect ;)
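
The hyphenation concern raised in the comments can be sketched as follows. The two patterns are the ones quoted above (without the g flag); the input strings are made-up examples, not taken from the original question.

```javascript
const wordBoundary = /\bplay\s+(.+?)\s+in\b/i;
const whitespaceAnchored = /(?:\s|^)play\s+(.+?)\s+in\s/i;

// A hyphen is a non-word character, so \b also matches between "-" and "play".
const hyphenated = "role-play the part in full";
console.log(wordBoundary.test(hyphenated));       // true
console.log(whitespaceAnchored.test(hyphenated)); // false

// The whitespace-anchored pattern still matches the intended phrase.
console.log(whitespaceAnchored.exec("play the Guitar in Lebanon.")[1]); // the Guitar

// Trade-off: the trailing \s means "in" at the very end of the input is not matched.
console.log(whitespaceAnchored.test("play it in")); // false
```

Which variant is preferable depends on how much the real input varies, as the comments note.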