Regexmatch to find all string cells that match multiple words
Find 2 strings
Try:
=FILTER(A:A,REGEXMATCH(A:A,"(?i)bob.*cat|cat.*bob"))
You don't need to use ArrayFormula
because filter is array formula itself.
(?i)
- to make search case insensitivebob.*cat|cat.*bob
- match "bob→cat" or "cat→bob"
Find multiple strings
There's more complex formula for more words to match then 2.
Suppose we have a list in column A:
Bob ate the dead cat
The cat ate live bob
No cat ate live dog
Bob is dead
Bob and the cat are alive
Cat is Bob
ate Cat bob
And need to find all matches of 3 words, put them in column C:
cat
ate
bob
The formula is:
=FILTER(A:A,MMULT(--REGEXMATCH(A:A,
"(?i)"&TRANSPOSE(C1:C3)),ROW(INDIRECT("a1:a"&COUNTA(C1:C3)))^0)=COUNTA(C1:C3))
It uses RegexMatch
of transposed list of words C1:C3
, and then mmult
function sums matches and =COUNTA(C1:C3)
compares the number of matches with the number of words in list.
The result is:
Bob ate the dead cat
The cat ate live bob
ate Cat bob
DeeKay789
Updated on June 28, 2022Comments
-
DeeKay789 almost 2 years
I'm using ArrayFormula and FILTER combination to list all cells in a column that contain all of the search term words. I'm using REGEXMATCH rather than QUERY/CONTAINS/LIKE because my FILTER has other criteria that return TRUE/FALSE.
My problem seems to be precedence. So the following regex works in a limited way.
=ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(.*?\bbob\b)(.*?\bcat\b)"&".*$")))
It will find Bob and cat but only if Bob precedes cat.
Google sheets fails if I try to use lookahead ?= ie
=ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(?=.*?\bbob\b)(?=.*?\bcat\b)"&".*$")))
I don't want to use the '|' alternation in the string (repeat and reverse) as the input words may be many more than two so alternation becomes exponentially more complex.
Here's the test search array (each row is a single cell containing a string)...
Bob ate the dead cat The cat ate live bob No cat ate live dog Bob is dead Bob and the cat are alive
... and the desired results I'm after.
Bob ate the dead cat The cat ate live bob Bob and the cat are alive
Once I have the regex sorted out, the final solution will be a user input text box where they simply enter the words that must be found in a string ie 'Bob cat'. This input string I think I can unpack into its separate words and concatenate to the above expression, however, if there's a 'best practice' way of doing this I'd like to hear.