How to exclude specific string using regex in Python?

Solution 1

You could try this regex to match all the lines that do not contain the string you and that end with ?:

^(?!.*you).*\?$

Explanation:

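The regex uses a negative lookahead. At the start of the line, (?!.*you) asserts that the string you does not appear anywhere in the line; if it does, the match fails at once. Otherwise, .*\?$ matches the whole line, which must end with ?. The result is that every line ending in ? is matched except those containing the string you.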
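As a minimal sketch of how this pattern could be applied in Python, the snippet below tests it against the sample strings from the question. The names `lines` and `matches` are illustrative assumptions, not part of the original answer.

```python
import re

# Sample strings taken from the question; variable names are illustrative.
lines = [
    "45 meters?",
    "45, meters?",
    "45?",
    "45 ?",
    "45 meters you?",
    "45 you  ?",
    "45, and you?",
]

# (?!.*you) rejects the line if "you" appears anywhere in it;
# .*\?$ then requires the line to end with a question mark.
pattern = re.compile(r"^(?!.*you).*\?$")

matches = [line for line in lines if pattern.match(line)]
print(matches)  # ['45 meters?', '45, meters?', '45?', '45 ?']
```

Note that the lookahead rejects any line containing the substring you, so a line with "your" would be excluded too. If only the whole word should be excluded, (?!.*\byou\b) is one possible variation.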

Solution 2

There's a neat trick to exclude some matches from a regex, which you can use here:

>>> import re
>>> corpus = """
... 45 meters?
... 45?
... 45 ?
... 45 meters you?
... 45 you  ?
... 45, and you?
... """
>>> pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")
>>> re.findall(pattern, corpus)
['45 meters?', '45?', '45 ?', '', '', '']

The downside is that you get empty matches when the exclusion kicks in, but those are easily filtered out:

>>> list(filter(None, re.findall(pattern, corpus)))
['45 meters?', '45?', '45 ?']

How it works:

The trick is that we only pay attention to captured groups. The left-hand side of the alternation, \d+[^?]*you (or "digits followed by non-? characters followed by 'you'"), matches what you don't want; since it has no capture group, nothing is captured and we forget about it. Only if the left-hand side doesn't match at a given position is the right-hand side, (\d+[^?]*\?) (or "digits followed by non-? characters followed by '?'"), tried, and that one is captured.
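The same idea can be sketched with re.finditer, which makes the "only captured groups count" logic explicit: when the left-hand alternative matches, group 1 did not participate and is None, so those matches can be skipped directly. The name `kept` is an illustrative assumption.

```python
import re

corpus = """
45 meters?
45?
45 ?
45 meters you?
45 you  ?
45, and you?
"""

# Left alternative: the unwanted case, deliberately not captured.
# Right alternative: the wanted case, captured as group 1.
pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")

# group(1) is None whenever the left (excluded) alternative matched.
kept = [m.group(1) for m in pattern.finditer(corpus) if m.group(1) is not None]
print(kept)  # ['45 meters?', '45?', '45 ?']
```

This avoids the separate filtering step, at the cost of a slightly longer expression than re.findall.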

f_ficarola
Author by

f_ficarola

Computer Engineer

Updated on July 05, 2022

Comments

  • f_ficarola
    f_ficarola almost 2 years

    I'd like to match strings like:

    45 meters?
    45, meters?
    45?
    45 ?
    

    but not strings like:

    45 meters you?
    45 you  ?
    45, and you?
    

    In both cases the question mark must be at the end. So, essentially I want to exclude all those strings containing the word "you".

    I've tried the following regex:

    '\d+.*(?!you)\?$'
    

    but it matches the second case (probably because of .*)
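A short sketch of why the attempted regex fails: .* greedily consumes "you", and (?!you) is only checked at the single position right before the final ?, where the next character is ? and never "you". Moving the lookahead in front of .* is one possible fix; the names `attempted` and `fixed` are illustrative assumptions.

```python
import re

# The pattern from the question: the lookahead is evaluated only at the
# position just before the final "?", after .* has already consumed "you".
attempted = re.compile(r"\d+.*(?!you)\?$")

# One possible fix (an assumption, not the asker's code): place the
# lookahead before .* so it scans the rest of the string for "you".
fixed = re.compile(r"\d+(?!.*you).*\?$")

for text in ["45 meters?", "45 meters you?", "45 you  ?"]:
    print(text, bool(attempted.match(text)), bool(fixed.match(text)))
```

With these inputs, the attempted pattern matches all three strings, while the fixed one matches only the first.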