Is there a way to match everything except a constant string using Go.Regexp?

Solution 1

Update

Go's regexp package does not support lookarounds (lookaheads or lookbehinds) because it guarantees to run in O(n) time, and its authors have not found a way to add lookarounds without breaking that guarantee.
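A quick way to confirm this is to compile a pattern that contains a negative lookahead and check whether an error comes back. The sketch below uses the lookahead pattern from the question; the helper name compiles is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper name, used only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Negative lookahead, as commonly suggested for PCRE-style engines.
	fmt.Println(compiles(`/.*/.*/((?!somestring).*)`)) // false
	// The same pattern without the lookahead is accepted.
	fmt.Println(compiles(`/.*/.*/(.*)`)) // true
}
```

Using regexp.Compile rather than regexp.MustCompile matters here: MustCompile would panic on the unsupported syntax instead of returning an error.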

However, there are workarounds. In this case, you can use the http://www.formauri.es/personal/pgimeno/misc/non-match-regex web service, which generates POSIX-compatible negated patterns. E.g. for somestring, it generates the regex ^([^s]|s(s|o(s|m(s|es(omes)*(s|t(s|r(s|i(s|ns)))|o(s|ms)))))*([^os]|o([^ms]|m([^es]|e([^s]|s(omes)*([^ost]|t([^rs]|r([^is]|i([^ns]|n[^gs])))|o([^ms]|m([^es]|e[^s]))))))))*(s(s|o(s|m(s|es(omes)*(s|t(s|r(s|i(s|ns)))|o(s|ms)))))*(o((me?)?|mes(omes)*(t(r?|rin?)|o(me?)?)?))?)?$. To use it in your original regex, replace the last (.*) with (<part between ^ and $>) and append $, i.e. the regex will look like

/[^/]*/[^/]*/(([^s]|s(s|o(s|m(s|es(omes)*(s|t(s|r(s|i(s|ns)))|o(s|ms)))))*([^os]|o([^ms]|m([^es]|e([^s]|s(omes)*([^ost]|t([^rs]|r([^is]|i([^ns]|n[^gs])))|o([^ms]|m([^es]|e[^s]))))))))*(s(s|o(s|m(s|es(omes)*(s|t(s|r(s|i(s|ns)))|o(s|ms)))))*(o((me?)?|mes(omes)*(t(r?|rin?)|o(me?)?)?))?)?)$

See the regex demo.

To make sure the regex only captures the part after the third slash, the first two .* patterns are replaced with [^/]*, which matches zero or more chars other than /. (In the demo, I also added \n to the negated classes to avoid matching across lines in the single multiline demo string.)
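To see how a lookahead-free negated pattern works on a small scale, here is a hand-written sketch for a much shorter, hypothetical constant, no. It was written by hand, not produced by the web service, and it only excludes a third segment that is exactly no. The alternatives enumerate every way a string can differ from no.

```go
package main

import (
	"fmt"
	"regexp"
)

// notNo matches /x/y/z paths whose third segment is anything except
// the exact string "no" (a hypothetical constant chosen for brevity).
// The alternatives cover every way a string can differ from "no":
//
//	n?        the empty string, or just "n"
//	[^n].*    the first character is not 'n'
//	n[^o].*   starts with 'n', but the second character is not 'o'
//	no.+      starts with "no" but is longer
var notNo = regexp.MustCompile(`^/[^/]*/[^/]*/(n?|[^n].*|n[^o].*|no.+)$`)

func main() {
	for _, s := range []string{"/test/test/MATCH", "/test/test/no", "/test/test/not"} {
		fmt.Printf("%s -> %v\n", s, notNo.MatchString(s))
	}
}
```

With these inputs, only /test/test/no is rejected. The number of alternatives grows with the length of the constant, which is why the generated pattern for somestring is so long.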

Originally accepted answer

A string such as anything/anything/somestring should not be matched with \/.*\/.*\/(.*). Because .* is greedy and also matches /, the first .* matches up to the last but one / in the string. You need to use a negated character class, [^/], instead (note that / does not need to be escaped in a Go regex).
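The difference shows up as soon as the input has more than three slashes. This is a minimal sketch with a made-up input, /a/b/c/d; lastGroup is an illustrative helper, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastGroup returns capture group 1 of pattern applied to s,
// or "" when there is no match. (Illustrative helper.)
func lastGroup(pattern, s string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	s := "/a/b/c/d"
	// Greedy .* swallows "a/b", so only the last segment is captured.
	fmt.Println(lastGroup(`/.*/.*/(.*)`, s)) // d
	// [^/]* stops at each slash, so everything after the third slash is captured.
	fmt.Println(lastGroup(`^/[^/]*/[^/]*/(.*)`, s)) // c/d
}
```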

Since Go's regexp package, which follows the RE2 syntax, does not support lookaheads, you need to capture the part you are interested in (as JimB mentions in the comments) and, after checking the value of capture group 1, decide what to return:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    s := "anything/anything/somestring"
    r := regexp.MustCompile(`^[^/]+/[^/]+/(.*)`)
    val := r.FindStringSubmatch(s) // nil when the pattern does not match
    // val[1] holds the text after the second slash, here "somestring"
    if len(val) > 1 && val[1] != "somestring" { // matched, and the captured part is not the excluded string
        fmt.Println(val[1]) // use the captured part
    } else {
        fmt.Println("No match") // no match, or the captured part is the excluded string
    }
}

See the Go demo

Solution 2

Go intentionally leaves this feature out because there is no known way to implement it while keeping the regexp package's O(n) running time guarantee, according to Russ Cox:

The lack of generalized assertions, like the lack of backreferences, is not a statement on our part about regular expression style. It is a consequence of not knowing how to implement them efficiently. If you can implement them while preserving the guarantees made by the current package regexp, namely that it makes a single scan over the input and runs in O(n) time, then I would be happy to review and approve that CL. However, I have pondered how to do this for five years, off and on, and gotten nowhere.

It looks like the best way to do this is to check the captured match manually afterwards, as JimB mentions.
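The manual check can be wrapped in a small helper. This is a sketch under assumptions: the name matchExcept and its signature are hypothetical, the optional leading slash is added to cover the /test/test/MATCH form from the question, and strings.Contains is used because the question asks to exclude strings that contain somestring.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thirdSegment captures everything after the second path segment.
// The optional leading slash covers both forms seen in the question.
var thirdSegment = regexp.MustCompile(`^/?[^/]+/[^/]+/(.*)$`)

// matchExcept returns the captured part and true, unless the input does
// not have the expected shape or the captured part contains excluded.
// (Hypothetical helper; name and signature are assumptions.)
func matchExcept(s, excluded string) (string, bool) {
	m := thirdSegment.FindStringSubmatch(s)
	if m == nil || strings.Contains(m[1], excluded) {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{"/test/test/MATCH", "/test/test/somestring"} {
		got, ok := matchExcept(s, "somestring")
		fmt.Printf("%s -> %q %v\n", s, got, ok)
	}
}
```

Keeping the exclusion in ordinary Go code leaves the pattern short and makes the excluded string a parameter rather than something baked into the regex.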

Author by

Ryan

Updated on July 01, 2022

Comments

  • Ryan
    Ryan almost 2 years

    I have found many similar questions that do not work with the Go regex syntax.

    The string that I am attempting to match against is in the form of anything/anything/somestring. With the pattern \/.*\/.*\/(.*), I will match somestring, but I am trying to match anything except strings that contain somestring.

    Most answers propose using something like \/.*\/.*\/((?!somestring).*), however in golang regexp I get: ? The preceding token is not quantifiable.

    For clarification: /test/test/MATCH would produce a match while /test/test/somestring would not. Is this possible with the (limited) Go regex syntax?