Regular expression to get a string between two strings in Javascript

549,046

Solution 1

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).

You want a regular match here, one that consumes the cow portion. To capture the portion in between, use a capturing group (put the part of the pattern you want to capture inside parentheses):

cow(.*)milk

No lookaheads are needed at all.
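
As a minimal sketch of how the capturing group is read back (the variable names are illustrative and not from the original answer): without the g flag, String#match returns an array whose index 0 holds the whole match and index 1 holds the first group, or null when nothing matches.

```javascript
// Illustrative sketch; `input`, `found` and `between` are hypothetical names.
const input = "My cow always gives milk";
const found = input.match(/cow(.*)milk/);

// match() returns null when the pattern does not match, so guard before indexing.
const between = found === null ? null : found[1].trim();

console.log(found && found[0]); // whole match: "cow always gives milk"
console.log(between);           // Group 1, trimmed: "always gives"
```

The trim() call is only there because the pattern itself does not consume the spaces around the captured text.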

Solution 2

The solution that works in the vast majority of cases is a capturing group with a lazy dot matching pattern. However, a dot . in a JavaScript regex does not match line break characters (unless the s flag is set), so what works for any input is a construct that matches any character: [^] or [\s\S] / [\d\D] / [\w\W].

ECMAScript 2018 and newer compatible solution

In JavaScript environments supporting ECMAScript 2018, the s modifier allows . to match any character, including line break characters, and the regex engine supports lookbehinds of variable length. So you may use a regex like:

var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional

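In both cases, the lookbehind checks that the current position is preceded by cow followed by whitespace (one or more characters in the first pattern, zero or more in the second). Then any zero or more characters, as few as possible, are matched and consumed (that is, added to the match value). Finally, the lookahead checks that milk follows, preceded by one or more (first pattern) or zero or more (second pattern) whitespace characters.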
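
A small sketch of the first pattern on input that contains line breaks. It assumes an engine with ES2018 lookbehind and s flag support, and the sample string is made up for illustration:

```javascript
// Assumes ES2018 support (lookbehind and the s / dotAll flag).
// The sample text is hypothetical; note the line breaks around "always gives".
const text = "My cow\nalways gives\nmilk, their cow also gives milk";

// Lookarounds keep the delimiters and surrounding whitespace out of the match.
const parts = text.match(/(?<=cow\s+).*?(?=\s+milk)/gs);

console.log(parts); // ["always gives", "also gives"]
```

With the g flag, match() returns only the whole matches, which is sufficient here because the lookarounds consume nothing.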

Scenario 1: Single-line input

This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.

cow (.*?) milk

cow is matched first, then a space. Next, any zero or more characters other than line break characters, as few as possible (*? is a lazy quantifier), are captured into Group 1. Finally, a space and milk must follow (these are matched and consumed, too).

Scenario 2: Multiline input

cow ([\s\S]*?) milk

Here, cow and a space are matched first, then any zero or more characters (line breaks included), as few as possible, are matched and captured into Group 1, and then a space and milk are matched.
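
To illustrate the difference between Scenario 1 and Scenario 2, here is a short sketch on a made-up string with a line break between the delimiters:

```javascript
// Hypothetical sample: the text between "cow" and "milk" spans two lines.
const multiline = "My cow always\ngives milk";

// `.` cannot cross the line break, so the Scenario 1 pattern finds nothing.
const withDot = multiline.match(/cow (.*?) milk/);

// [\s\S] matches any character, so the Scenario 2 pattern succeeds.
const withClass = multiline.match(/cow ([\s\S]*?) milk/);

console.log(withDot);                      // null
console.log(JSON.stringify(withClass[1])); // "always\ngives"
```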

Scenario 3: Overlapping matches

If you have a string like >>>15 text>>>67 text2>>> and need the 2 matches between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g: it finds only 1 match because the >>> before 67 is already consumed by the first match. Instead, use a positive lookahead to check for the text's presence without actually "gobbling" it (i.e. without appending it to the match):

/>>>\d+\s(.*?)(?=>>>)/g

For the string above, this yields text and text2 as the Group 1 contents.
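
The two patterns can be compared side by side. This sketch assumes String#matchAll is available; the variable names are illustrative:

```javascript
const data = ">>>15 text>>>67 text2>>>";

// The trailing >>> is consumed, so the second item loses its opening delimiter.
const consuming = Array.from(data.matchAll(/>>>\d+\s(.*?)>>>/g), m => m[1]);

// The lookahead only checks for >>>, leaving it available for the next match.
const withLookahead = Array.from(data.matchAll(/>>>\d+\s(.*?)(?=>>>)/g), m => m[1]);

console.log(consuming);     // ["text"]
console.log(withLookahead); // ["text", "text2"]
```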

Also see How to get all possible overlapping matches for a string.

Performance considerations

A lazy dot matching pattern (.*?) inside a regex may slow down script execution on very long input. In many cases, the unroll-the-loop technique performs better. To grab everything between cow and milk in "Their\ncow\ngives\nmore\nmilk", we only need to match the lines that are not equal to milk, so instead of cow\n([\s\S]*?)\nmilk we can use:

/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm

If the input can contain \r\n line endings, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm. With this small test string, the performance gain is negligible, but with very large text you will feel the difference (especially if the lines are long and line breaks are not very numerous).
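
As a quick check that the unrolled pattern captures the same text as the lazy one, here is a sketch using the test string from above (it shows equivalence only and does not measure speed):

```javascript
const lines = "Their\ncow\ngives\nmore\nmilk";

// Lazy version: expands one character at a time until \nmilk follows.
const lazy = /cow\n([\s\S]*?)\nmilk/.exec(lines);

// Unrolled version: consumes whole lines, stopping before a line equal to "milk".
const unrolled = /cow\n(.*(?:\n(?!milk$).*)*)\nmilk/m.exec(lines);

console.log(JSON.stringify(lazy[1]));     // "gives\nmore"
console.log(JSON.stringify(unrolled[1])); // "gives\nmore"
```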

Sample regex usage in JavaScript:

// Single/first match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if the length of the leading/trailing delimiters is known
var s = "My cow always gives milk, their cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.slice(4, -5);}));
// Or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
  result.push(m[1]);
}
console.log(result);

Using the modern String#matchAll method

const s = "My cow always gives milk, their cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));
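
The samples above hard-code the delimiters and assume a match exists. Below is a minimal sketch of a reusable helper, assuming String#matchAll is available. The function names escapeRegExp and allBetween are hypothetical and not part of the original answers:

```javascript
// Hypothetical helper: escapes regex metacharacters so delimiters are literal.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns every trimmed substring found between
// `start` and `end`, or an empty array when either delimiter is missing.
function allBetween(input, start, end) {
  const rx = new RegExp(escapeRegExp(start) + "([\\s\\S]*?)" + escapeRegExp(end), "g");
  return Array.from(input.matchAll(rx), m => m[1].trim());
}

const sample = "My cow always gives milk, their cow also gives milk";
console.log(allBetween(sample, "cow", "milk"));               // ["always gives", "also gives"]
console.log(allBetween("No delimiters here", "cow", "milk")); // []
console.log(allBetween("a (b) c", "(", ")"));                 // ["b"]
```

Returning an empty array avoids the exception that indexing a null match result would throw.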

Solution 3

Here's a regex which will grab what's between cow and milk (without leading/trailing space):

var srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");

An example: http://jsfiddle.net/entropo/tkP74/
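
One caveat with the replace approach: when the pattern does not match, String#replace returns the input unchanged, which can be mistaken for a result. A hedged sketch of a guard (the wrapper name extractBetween is hypothetical):

```javascript
const re = /(.*cow\s+)(.*)(\s+milk.*)/;

// Hypothetical wrapper: returns null instead of the unchanged input on no match.
function extractBetween(srctext) {
  return re.test(srctext) ? srctext.replace(re, "$2") : null;
}

// Without the guard, a non-matching input comes back as-is.
console.log("My goat always gives milk.".replace(re, "$2")); // "My goat always gives milk."

console.log(extractBetween("My cow always gives milk."));  // "always gives"
console.log(extractBetween("My goat always gives milk.")); // null
```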

Solution 4

  • You need to capture the .*
  • You can (but don't have to) make the .* nongreedy
  • There's really no need for the lookahead.

    > /cow(.*?)milk/i.exec('My cow always gives milk');
    ["cow always gives milk", " always gives "]
    

Solution 5

The chosen answer didn't work for me...hmm...

Just add space after cow and/or before milk to trim spaces from " always gives "

/(?<=cow ).*(?= milk)/

Author by

phil

Updated on January 13, 2021

Comments

  • phil
    phil over 3 years

    I have found very similar posts, but I can't quite get my regular expression right here.

    I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".

    My cow always gives milk

    would return

    "always gives"

    Here is the expression I have pieced together so far:

    (?=cow).*(?=milk)
    

    However, this returns the string "cow always gives".

    • Salketer
      Salketer about 11 years
      I stumbled on this old question and wanted to clarify why testRE is an array. test.match returns an array with the total match at the first index (therefore, the string that matches cow(.*)milk), followed by all the captured strings, like the (.*). If there were a second set of parentheses, its contents would be in testRE[2].
    • Michael.Lumley
      Michael.Lumley over 9 years
      This solution will not work if you are searching over a string containing newlines. In such a case, you should use "STRING_ONE([\\s\\S]*?)STRING_TWO". stackoverflow.com/questions/22531252/…
    • vzR
      vzR about 7 years
      just for reference the match method on MDN developer.mozilla.org/en/docs/Web/JavaScript/Reference/…
  • Ben
    Ben about 13 years
    In this particular instance, if it were greedy it would reach the end and backtrack (presumably).
  • Mosca Pt
    Mosca Pt over 6 years
    Thanks, I added a fiddle (jsfiddle.net/MoscaPt/g5Lngjx8/2) for it. /Johan
  • TheCascadian
    TheCascadian almost 6 years
    When I test this, the provided Regex expression includes both "cow" and "milk"...
  • Rory O'Kane
    Rory O'Kane almost 6 years
    This is missing a step. When you get the result of the match, you need to extract the matched text of the first capturing group with matched[1], not the whole matched text with matched[0].
  • Mark Carpenter Jr
    Mark Carpenter Jr over 5 years
    Look Behind ?<= is not supported in Javascript. Would be the way to do it though.
  • Mark Carpenter Jr
    Mark Carpenter Jr over 5 years
    Look Behind ?<= is not supported in Javascript.
  • duduwe
    duduwe over 5 years
    @MarkCarpenterJr if you tested it via regextester.com, you will get that hint. It seems that the site has based its rules from the older specification. Lookbehind is now supported. See stackoverflow.com/questions/30118815/… And the pattern works well with modern browsers without error. Try this checker instead regex101.com
  • Qian Chen
    Qian Chen over 5 years
    In Javascript, you actually need to use ([\s\S]*?) rather than (.*?).
  • Almir Campos
    Almir Campos about 5 years
    Although this is a useful technique, it was downvoted because IMHO this is NOT the right answer for the question, since it includes "cow" and "milk", as stated by @TheCascadian
  • Andrew Irwin
    Andrew Irwin about 4 years
    Works for me! fantastic answer because it's just really simple! :)
  • Paul Strupeikis
    Paul Strupeikis about 4 years
    It is supported in JavaScript. It's not supported in Safari and Mozilla (yet), only in Chrome and Opera.
  • sborn
    sborn almost 4 years
    @AlmirCampos - if I am not mistaken there is no way to do this match without matching "cow" and "milk" (since you want to match what's in between those two). The problem is not in the RegEx itself but how you handle it afterwards (as mentioned by Rory O'Kane). Otherwise you could only match for surrounding spaces - and that would give you a VERY wrong return, wouldn't it?
  • Almir Campos
    Almir Campos almost 4 years
    @sborn - Thanks for pointing this out. I think the question gives room for interpretations. What I have in mind is a (vanilla - as much as possible) regex that filters the original message and provides the result asked. It would be the case of this regex: /([^(my cow)])(.*)[^(milk)]/g Please, check the fiddle at jsfiddle.net/almircampos/4L2wam0u/5 and let us know your thoughts.
  • Shailesh
    Shailesh over 3 years
    It misses two edge cases. 1. If start is missing from the main string, it will throw an exception. 2. If end is missing from the main string, it will still give a result back, which would be a wrong match.
  • Tigerrrrr
    Tigerrrrr over 3 years
    Just in case others are reading this, THIS RETURNS "cow always gives milk" NOT "always gives". Now read, sborn's comment.
  • Wiktor Stribiżew
    Wiktor Stribiżew about 3 years
    I have written a general article about extracting strings between two strings with regex, too, feel free to read if you have a problem approaching your current similar problem.
  • NetOperator Wibby
    NetOperator Wibby almost 2 years
    This is now supported in Firefox.