Javascript Regex - Find all possible matches, even in already captured matches


Solution 1

Without modifying your regex, you can use .exec and, after each match, move the regex object's lastIndex property back to the start of the second half of that match, so that the next search begins there.

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
    matches.push(found[0]);
    reg.lastIndex -= found[0].split(':')[1].length;
}

console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]



As per Bergi's comment, you can instead take the index of the last match and add 1, so that rather than resuming from the second half of the match, the next attempt starts from the second character of each match:

reg.lastIndex = found.index+1;


The final outcome is the same, though Bergi's version needs a little less code, performs slightly faster, and does not depend on the ':' separator in the pattern. =]
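As a minimal sketch, the `found.index + 1` variant can be wrapped in a reusable helper. The name `findOverlapping` is hypothetical and not part of the original answer; the helper copies the regex and adds the g flag if it is missing, which is an extra assumption on my part about how one might want to call it.

```javascript
// Hypothetical helper: collects overlapping matches by restarting the search
// one character after the start of each match.
function findOverlapping(str, pattern) {
    // Work on a copy so the caller's lastIndex is left untouched, and make
    // sure the g flag is set, because exec ignores lastIndex without it.
    var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
    var re = new RegExp(pattern.source, flags);
    var matches = [], found;
    while ((found = re.exec(str)) !== null) {
        matches.push(found[0]);
        // Resume from the second character of this match.
        re.lastIndex = found.index + 1;
    }
    return matches;
}

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var overlapping = findOverlapping(string, /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g);
console.log(overlapping);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```

Because the index always advances by at least one character, the loop also terminates for patterns that can match an empty string.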

Solution 2

You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:

var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var arr;
var results = [];

while ((arr = regex.exec(input)) !== null) {
    results.push(arr[0] + arr[1]);
}

I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
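In environments that support `String.prototype.matchAll` (ES2020), the same lookahead-plus-capture idea can be written without the explicit loop. This is a sketch of an alternative and not part of the original answer; it assumes `matchAll` is available in your runtime.

```javascript
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
// The lookahead captures the following ":A..B..Y" segment without consuming it.
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;

// matchAll requires the g flag and yields one match array per match;
// m[0] is the consumed text and m[1] is the text captured inside the lookahead.
var results = Array.from(input.matchAll(regex), function (m) {
    return m[0] + m[1];
});

console.log(results);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```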

Actually, it is possible to abuse the replace method to achieve the same result:

var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var results = [];

input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
    results.push($0 + $1);
    return '';
});

However, since this is replace, it also builds a replacement string that is immediately thrown away, which is wasted work.

Solution 3

Unfortunately, it's not quite as simple as a single string.match.

The reason is that you want overlapping matches, which the /g flag doesn't give you.

You could use lookahead:

var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;

But now you get:

string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]

The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
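A small check makes the zero-width behavior concrete. The shortened `sample` string is my own, chosen only to keep the trace readable; the pattern is the one from this answer.

```javascript
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
var sample = 'A1B1Y:A1B2Y:A1B3Y';

var first = re.exec(sample);
console.log(first[0]);      // "A1B1Y" (the lookahead text is not part of the match)
console.log(re.lastIndex);  // 5 (the search resumes at the ":", so "A1B2Y" is still available)

var second = re.exec(sample);
console.log(second[0]);     // "A1B2Y"
console.log(second.index);  // 6
```

Since the lookahead consumes nothing, `lastIndex` stops right after the first segment, and the second segment can serve both as the lookahead of one match and as the body of the next.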

You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:

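// using re from above to find where each overlapping match starts;
// string is the input from the question

var m;
var matches = [];
// a second regex to extract the full pair; it is anchored with ^ and has no /g flag
// because only the pair at the very start of the substring is wanted
var re2 = /^A\d+B\d+Y:A\d+B\d+Y/;

while ((m = re.exec(string)) !== null) {
  // m is a match array whose index property is the start of the current match
  matches.push(string.substring(m.index).match(re2)[0]);
}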

console.log(matches);
// [
//   "A1B1Y:A1B2Y",
//   "A1B2Y:A1B3Y",
//   "A1B5Y:A1B6Y",
//   "A1B6Y:A1B7Y",
//   "A1B9Y:A1B10Y",
//   "A1B10Y:A1B11Y"
// ]


Alternatively, you could split the original string on ":", then loop through the resulting array and keep each pair where array[i] and array[i+1] both match the single-segment pattern.
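The split approach could look roughly like the sketch below. The variable names are my own, and it assumes every ":"-separated segment is a complete token, which is why the single-segment pattern is anchored at both ends; the original unanchored regex is slightly more permissive.

```javascript
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
// Matches one whole segment; no /g flag, so test() carries no lastIndex state.
var segment = /^A\d+B\d+Y$/;

var parts = string.split(':');
var pairs = [];
for (var i = 0; i < parts.length - 1; i++) {
  // Keep the pair only when both neighbours are valid segments.
  if (segment.test(parts[i]) && segment.test(parts[i + 1])) {
    pairs.push(parts[i] + ':' + parts[i + 1]);
  }
}

console.log(pairs);
// ["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```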

Author by

Vinnie Cent

Updated on February 15, 2020

Comments

  • Vinnie Cent
    Vinnie Cent about 4 years

    I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.

    Variables:

    var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
    
    var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
    

    Code:

    var match = string.match(reg);
    

    All matched results I get:

    A1B1Y:A1B2Y
    A1B5Y:A1B6Y
    A1B9Y:A1B10Y
    

    Matched results I want:

    A1B1Y:A1B2Y
    A1B2Y:A1B3Y
    A1B5Y:A1B6Y
    A1B6Y:A1B7Y
    A1B9Y:A1B10Y
    A1B10Y:A1B11Y
    

    In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.

  • Bergi
    Bergi about 11 years
    Nice, that's much better than lookahead, capturing groups etc. Btw, reg.lastIndex = found.index+1; should be enough and makes it expression-agnostic
  • Fabrício Matté
    Fabrício Matté about 11 years
    @VinnieCent No problem. =] Tick the V below the up/down arrows to mark it as accepted if it worked for you. Oh thanks Bergi, wasn't aware of that property. x]
  • Jan
    Jan over 8 years
    I had to do reg.lastIndex = found.index+found[0].length; so it continues from the position right after the last match.
  • RobertG
    RobertG about 8 years
    Note to self: this will not work if the global ("g") flag is not set for the RegExp. (new RegExp("foo", "g") or /foo/g)
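To illustrate the last comment about the g flag: without it, exec always searches from the start of the string and ignores lastIndex, so the while loops in the answers above would keep returning the same first match and never terminate. A short check, using a shortened sample string of my own:

```javascript
var text = 'A1B1Y:A1B2Y:A1B3Y';

// Without the g flag, setting lastIndex has no effect on where exec starts.
var noFlag = /A\d+B\d+Y:A\d+B\d+Y/;
noFlag.lastIndex = 1;
var withoutG = noFlag.exec(text);
console.log(withoutG.index); // 0

// With the g flag, exec starts searching at lastIndex.
var withFlag = /A\d+B\d+Y:A\d+B\d+Y/g;
withFlag.lastIndex = 1;
var withG = withFlag.exec(text);
console.log(withG.index);    // 6
```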