How do I write a regular expression that excludes rather than matches, e.g., not (this|string)?


Solution 1

First of all: [^n][^o][^t] is not a solution. This would also exclude words like nil ([^n] does not match), bob ([^o] does not match) or cat ([^t] does not match).

But it is possible to build a regular expression with basic syntax that matches strings containing neither not nor this:

^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$

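The idea is to allow, at each step, either a character that does not start one of the words, or a proper prefix of a word that is not continued into the whole word. One caveat: an alternative such as n[^o] also consumes the character after the n, so an input like nnot, where a forbidden word begins right after a partial prefix, is still accepted.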
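A quick way to see what this pattern accepts is to run it against a handful of inputs. The sketch below uses Python's re module rather than Emacs (an assumption made only so the check is easy to run; the pattern uses nothing engine-specific), and the accepted helper and sample words are made up for illustration.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r"^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$"
)


def accepted(text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return PATTERN.match(text) is not None


samples = ["nil", "bob", "cat", "no", "thin",   # harmless words
           "not", "this", "do not",             # contain a forbidden word
           "nnot", "tthis"]                     # prefix directly followed by a word
results = {word: accepted(word) for word in samples}

for word, ok in results.items():
    print(f"{word!r:10} accepted={ok}")
```

The first two groups behave as intended, but nnot and tthis are accepted even though they contain not and this: n[^o] and t[^h] swallow the character that starts the forbidden word. Hand-built exclusion patterns need this kind of testing.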

Solution 2

This is not easily possible. Regular expressions are designed to match things, and this is all they can do.

First off: [^] does not designate an "excludes group"; it designates a negated character class. Character classes do not support grouping in any shape or form. They support single characters (and, for convenience, character ranges). Your attempt [^(not|this)] is 100% equivalent to [^)(|hinots] as far as the regex engine is concerned: it matches exactly one character that is none of ( ) | h i n o t s.
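The equivalence can be checked character by character. This is a small sketch in Python, assuming its re module treats ( | ) inside brackets as literal characters (it does, as Emacs does); the probe string is arbitrary.

```python
import re

attempt = re.compile(r"[^(not|this)]")     # the attempted "excluded group"
equivalent = re.compile(r"[^)(|hinots]")   # the same set of characters, reordered

probe = "()|hinotsabcxyz019 _"

# Both classes agree on every probed character.
same = all(
    bool(attempt.fullmatch(ch)) == bool(equivalent.fullmatch(ch))
    for ch in probe
)

# The characters the class refuses to match.
rejected = "".join(sorted(ch for ch in probe if not attempt.fullmatch(ch)))

# The class looks at one character only, so it says nothing about sequences:
# "xnot" still produces a match (on the "x").
first_hit = attempt.search("xnot").group(0)

print(same, rejected, first_hit)
```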

There are three ways out of this situation:

  1. match (not|this) and exclude any matches with the help of the environment you are in (negate match results)
  2. use negative look-ahead, if supported by your regex engine and feasible in the situation
  3. rewrite the expression so it can match: see a similar question I asked earlier
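As an illustration of the second option, here is a minimal sketch of a negative look-ahead in Python, whose re module supports it (Emacs regexps do not). The is_clean helper is a hypothetical name, and the pattern assumes single-line input because "." does not cross newlines by default.

```python
import re

# (?!...) succeeds only if "not" or "this" cannot be found anywhere ahead.
NEITHER = re.compile(r"^(?!.*(?:not|this)).*$")


def is_clean(line: str) -> bool:
    """Return True when the line contains neither 'not' nor 'this'."""
    return NEITHER.match(line) is not None


for line in ["thin ice", "nothing here", "like this", "nnot"]:
    print(f"{line!r:16} clean={is_clean(line)}")
```

Because the look-ahead scans the whole line before anything is consumed, it does not depend on how earlier characters were matched.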

Solution 3

Hard to believe that the accepted answer (from Gumbo) was actually accepted! Unless it was accepted because it indicated that you cannot do what you want. Unless you have a function that generates such regexps (as Gumbo shows), composing them would be a real pain.

What is the real use case -- what are you really trying to do?

As Tomalak indicated, (a) this is not what regexps do; (b) see the other post he linked to, for a good explanation, including what to do about your problem.

The answer is to use a regexp to match what you do not want, and then subtract that from the initial domain. IOW, do not try to make the regexp do the excluding (it cannot); do the excluding after using a regexp to match what you want to exclude.

This is how tools that use regexps typically handle it (e.g., grep with its -v option): they offer a separate option that carries out the subtraction, after matching what needs to be subtracted.
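The match-then-subtract approach is only a few lines in any language. A minimal sketch in Python, roughly what grep -v does for lines; the subtract function and the sample lines are illustrative.

```python
import re

# Match what you do NOT want...
UNWANTED = re.compile(r"not|this")


def subtract(lines):
    """...then keep only the lines where it was not found."""
    return [line for line in lines if not UNWANTED.search(line)]


lines = ["keep me", "do not keep", "drop this one", "thin ice"]
kept = subtract(lines)
print(kept)
```

The regexp stays trivial, and the exclusion logic lives in ordinary code, where it is easy to read and change.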

Solution 4

It sounds like you are trying to do negative lookahead, i.e., you are trying to stop matching once you reach some delimiter.

Emacs doesn't support lookahead directly, but it does support the non-greedy version of the *, +, and ? operators (*?, +?, ??), which can be used for the same purpose in most cases.

So for instance, to match the body of this javascript function:

bar = function (args) {
    if (blah) {
        foo();
    }
};

You can use this emacs regex:

function ([^)]+) {[[:ascii:]]+?};

Here we're stopping once we find the two-character sequence "};". [[:ascii:]] is used instead of the "." operator because "." does not match a newline, whereas [[:ascii:]] does, so the match can span multiple lines.

This is a little different from negative lookahead because the }; sequence itself is matched. However, if your goal is to extract everything up to that point, you can just wrap that part in a capturing group, \( and \).
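The same non-greedy idea can be tried outside Emacs. The sketch below is a Python translation, not the Emacs syntax: parentheses and braces are escaped to make them literal, the capturing group is written ( ), and [\s\S] stands in for [[:ascii:]]. The extra baz line is added to the sample so that a second "};" exists.

```python
import re

source = """bar = function (args) {
    if (blah) {
        foo();
    }
};
baz = 1; };
"""

# +? is non-greedy: the group stops at the FIRST "};" that follows.
lazy = re.search(r"function \([^)]+\) \{([\s\S]+?)\};", source)
# + is greedy: the group runs on to the LAST "};" in the text.
greedy = re.search(r"function \([^)]+\) \{([\s\S]+)\};", source)

body = lazy.group(1)
greedy_body = greedy.group(1)

print(repr(body))
print(repr(greedy_body))
```

The captured group excludes the closing "};", which is what the capturing-group suggestion above is for.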

See the emacs regex manual: http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html

As a side note, if you are writing any kind of emacs regex, be sure to invoke M-x re-builder, which will bring up a little IDE for writing your regex against the current buffer.

Solution 5

Try M-x flush-lines, which deletes every line that contains a match for the regexp you give it.

Author by

Anycorn


Updated on April 07, 2021

Comments

  • Anycorn (about 3 years)

    I am stumped trying to create an Emacs regular-expression that excludes groups. [^] excludes individual characters in a set, but I want to exclude specific sequences of characters: something like [^(not|this)], so that strings containing "not" or "this" are not matched.

    In principle, I could write ([^n][^o][^t]|[^...]), but is there another way that's cleaner?