Golang multiline regex not working

Solution 1

By default, "." doesn't match a newline. With the "s" flag, it does. You don't need "m" here: it only changes how "^" and "$" behave, and the pattern uses neither.

Note that if there are multiple <think>...</think> pairs in your string, the greedy .* will match everything between the first <think> and the last </think>. Using the non-greedy .*? makes it match only the contents of the first pair.
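To make the difference between the flags and between greedy and non-greedy matching concrete, here is a minimal sketch. The helper firstGroup and the two-block input are assumptions made up for illustration; they are not part of the original question.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstGroup returns capture group 1 of the first match.
// ok is false when the pattern does not match at all.
// (Hypothetical helper, introduced only for this illustration.)
func firstGroup(pattern, s string) (group string, ok bool) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	const s = "<think>\nA\n</think>\n<think>\nB\n</think>"

	// (?m) only affects ^ and $; "." still stops at "\n", so there is no match.
	_, ok := firstGroup(`(?m)<think>(.*)</think>`, s)
	fmt.Println(ok)

	// (?s) lets "." match "\n"; the greedy .* runs to the last </think>.
	greedy, _ := firstGroup(`(?s)<think>(.*)</think>`, s)
	fmt.Printf("%q\n", greedy)

	// The non-greedy .*? stops at the first </think>.
	lazy, _ := firstGroup(`(?s)<think>(.*?)</think>`, s)
	fmt.Printf("%q\n", lazy)
}
```

The first call reports no match, the second captures both blocks plus the markup between them, and the third captures only the first block.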

Solution 2

Do not use regexp to parse XML; use encoding/xml instead. Example of a corner case that a regexp cannot handle: <think><elem attrib="I'm pondering about </think> tag now"></elem></think>
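A rough sketch of what the encoding/xml route could look like, assuming the input is a well-formed document whose root is <think>. The struct think, the helper extractThink, and the input (which hides a </think> inside an XML comment instead of an attribute) are all assumptions chosen for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
	"strings"
)

// think receives the character data of the root element.
// (Hypothetical type, introduced only for this illustration.)
type think struct {
	Body string `xml:",chardata"`
}

// extractThink parses doc as XML and returns the trimmed text of the root.
func extractThink(doc string) (string, error) {
	var t think
	if err := xml.Unmarshal([]byte(doc), &t); err != nil {
		return "", err
	}
	return strings.TrimSpace(t.Body), nil
}

func main() {
	// The comment contains a literal </think> that is not a closing tag.
	const doc = "<think>\nFOO<!-- </think> -->BAR\n</think>"

	// The regexp stops at the </think> inside the comment.
	re := regexp.MustCompile(`(?s)<think>(.*?)</think>`)
	fmt.Printf("%q\n", re.FindStringSubmatch(doc)[1])

	// The XML decoder skips the comment and keeps only the text.
	body, err := extractThink(doc)
	fmt.Printf("%q %v\n", body, err)
}
```

The parser knows that the text inside the comment is not markup, which is exactly the kind of context a regexp does not have.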

I'll use START and STOP as markers, just to disassociate from any XML stuff. Complete example (includes both LF and CRLF line endings, just in case):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    r := regexp.MustCompile(`(?s)START(.*?)STOP`)
    const s = "That is \nSTART\nFOOBAR\r\n\r\nSTOP\n"
    fmt.Printf("%#v\n", r.FindStringSubmatch(s))
}

returns:

[]string{"START\nFOOBAR\r\n\r\nSTOP", "\nFOOBAR\r\n\r\n"}

Author: Eduardo Pereira

Updated on July 29, 2022

Comments

  • Eduardo Pereira
    Eduardo Pereira almost 2 years

    Why does the following multiline regex not work? I expect it to match the substring inside the tags. Other simple multiline matches worked correctly.

    func main() {
        r := regexp.MustCompile(`(?m)<think>(.*)</think>`)
        const s = `That is 
        <think>
        FOOBAR
        </think>`
        fmt.Printf("%#v\n", r.FindStringSubmatch(s))
    }
    

    https://play.golang.org/p/8C6u_0ca8w

    • heemayl
      heemayl about 8 years
      Try (?m)<think>([^<]+)</think> or if non-greediness is supported (?m)<think>(.*?)</think>
  • Endophage
    Endophage about 8 years
    I thought the same thing about . but in golang s is set by default: github.com/google/re2/wiki/Syntax Although this does seem to fix it so I guess the docs are wrong...
  • Eduardo Pereira
    Eduardo Pereira about 8 years
    If s=true, then the new line will match, the default is false. Thanks for the clarification.
  • shaktisinghr
    shaktisinghr almost 3 years
    As previous answers have mentioned and assuming OP wants to match matching XML tags - if there's another </think> tag then a greedy match will include everything in between. Use a non-greedy pattern (.*?) with FindAllStringSubmatch to get all matches.
  • kubanczyk
    kubanczyk almost 3 years
    @shaktisinghr Yeah, no. I've updated my answer to disassociate it completely from the idea of parsing XML using regexp. Now it's just about a generic non-greedy multiline regexp. Thanks for your comment.
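
The FindAllStringSubmatch suggestion from the comments can be sketched as follows, using the START/STOP markers rather than XML tags. The names markerRe and markerBodies and the sample input are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// markerRe captures, non-greedily, everything between START and STOP,
// with (?s) so that "." also matches newlines.
var markerRe = regexp.MustCompile(`(?s)START(.*?)STOP`)

// markerBodies returns the text of every START...STOP pair in s, in order.
// (Hypothetical helper, introduced only for this illustration.)
func markerBodies(s string) []string {
	var out []string
	for _, m := range markerRe.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1]) // m[0] is the whole match, m[1] the group
	}
	return out
}

func main() {
	const s = "START\nA\r\nSTOP\nnoise\nSTART\nB\nSTOP\n"
	fmt.Printf("%q\n", markerBodies(s))
}
```

Passing -1 as the second argument asks for all matches; a positive n would cap the result at n matches.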