Regexmatch to find all string cells that match multiple words

15,851

Find 2 strings

Try:

=FILTER(A:A,REGEXMATCH(A:A,"(?i)bob.*cat|cat.*bob"))

You don't need to use ArrayFormula because filter is array formula itself.

  • (?i) - to make search case insensitive
  • bob.*cat|cat.*bob - match "bob→cat" or "cat→bob"

Find multiple strings

There's more complex formula for more words to match then 2.

Suppose we have a list in column A:

Bob ate the dead cat
The cat ate live bob
No cat ate live dog
Bob is dead
Bob and the cat are alive
Cat is Bob
ate Cat bob

And need to find all matches of 3 words, put them in column C:

cat
ate
bob

The formula is:

=FILTER(A:A,MMULT(--REGEXMATCH(A:A, "(?i)"&TRANSPOSE(C1:C3)),ROW(INDIRECT("a1:a"&COUNTA(C1:C3)))^0)=COUNTA(C1:C3))

It uses RegexMatch of transposed list of words C1:C3, and then mmult function sums matches and =COUNTA(C1:C3) compares the number of matches with the number of words in list.

The result is:

Bob ate the dead cat
The cat ate live bob
ate Cat bob
Share:
15,851
DeeKay789
Author by

DeeKay789

Updated on June 28, 2022

Comments

  • DeeKay789
    DeeKay789 almost 2 years

    I'm using ArrayFormula and FILTER combination to list all cells in a column that contain all of the search term words. I'm using REGEXMATCH rather than QUERY/CONTAINS/LIKE because my FILTER has other criteria that return TRUE/FALSE.

    My problem seems to be precedence. So the following regex works in a limited way.

    =ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(.*?\bbob\b)(.*?\bcat\b)"&".*$")))
    

    It will find Bob and cat but only if Bob precedes cat.

    Google sheets fails if I try to use lookahead ?= ie

    =ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(?=.*?\bbob\b)(?=.*?\bcat\b)"&".*$")))
    

    I don't want to use the '|' alternation in the string (repeat and reverse) as the input words may be many more than two so alternation becomes exponentially more complex.

    Here's the test search array (each row is a single cell containing a string)...

    Bob ate the dead cat
    The cat ate live bob
    No cat ate live dog
    Bob is dead
    Bob and the cat are alive
    

    ... and the desired results I'm after.

    Bob ate the dead cat
    The cat ate live bob
    Bob and the cat are alive
    

    Once I have the regex sorted out, the final solution will be a user input text box where they simply enter the words that must be found in a string ie 'Bob cat'. This input string I think I can unpack into its separate words and concatenate to the above expression, however, if there's a 'best practice' way of doing this I'd like to hear.