Golang multiline regex not working
Solution 1
By default, "." doesn't match a newline. If you give the "s" flag, it does. I don't think you need "m": it only changes how ^ and $ behave, not what "." matches.
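A minimal sketch of the difference, assuming an input shaped like the one in the question (the helper name capture is made up for this example). It runs the same pattern with no flag, with "m" only, and with "s":

```go
package main

import (
	"fmt"
	"regexp"
)

// capture is a hypothetical helper: it returns the first capture group of
// pattern applied to s, and whether the pattern matched at all.
func capture(pattern, s string) (string, bool) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	const s = "That is\n<think>\nFOOBAR\n</think>"

	patterns := []string{
		`<think>(.*)</think>`,     // no flags: "." stops at "\n", no match
		`(?m)<think>(.*)</think>`, // "m" only affects ^ and $, still no match
		`(?s)<think>(.*)</think>`, // "s" lets "." match "\n", so this matches
	}
	for _, p := range patterns {
		got, ok := capture(p, s)
		fmt.Printf("%-26s matched=%-5v group=%q\n", p, ok, got)
	}
}
```

Only the last pattern should report a match, with the newlines around FOOBAR included in the group.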
Note that if there are multiple <think>...</think> in your string, the regexp will match everything between the first <think> and the last </think>. Using .*? will cause it to only match the contents of the first one.
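To illustrate, here is a small sketch with two tagged blocks in one string. The input and the helper name firstGroup are invented for the example; both patterns use the "s" flag so that only greediness differs:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedy = regexp.MustCompile(`(?s)<think>(.*)</think>`)
	lazy   = regexp.MustCompile(`(?s)<think>(.*?)</think>`)
)

// firstGroup is a hypothetical helper: it returns the first capture group
// of re in s, or "" when there is no match.
func firstGroup(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	const s = "<think>\none\n</think>\nmiddle\n<think>\ntwo\n</think>"

	// Greedy: runs from the first <think> to the last </think>.
	fmt.Printf("greedy: %q\n", firstGroup(greedy, s))
	// Non-greedy: stops at the first </think>.
	fmt.Printf("lazy:   %q\n", firstGroup(lazy, s))
}
```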
Solution 2
Do not use regexp to parse XML; use encoding/xml instead. Here is an example of a corner case that is impossible to handle with a regexp: <think><elem attrib="I'm pondering about </think> tag now"></elem></think>
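A rough sketch of the encoding/xml route follows. It uses a CDATA variant of the corner case rather than the attribute example, because a raw < inside an attribute value is not well-formed XML and would have to be written as &lt; there. The struct thinkElem and the helper parseThink are names made up for this sketch:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
)

// thinkElem is a hypothetical type for this sketch. It accepts a root
// element named <think> and collects the character data directly inside it.
type thinkElem struct {
	XMLName xml.Name `xml:"think"`
	Text    string   `xml:",chardata"`
}

// parseThink decodes doc with encoding/xml and returns the element's text.
func parseThink(doc string) (string, error) {
	var t thinkElem
	if err := xml.Unmarshal([]byte(doc), &t); err != nil {
		return "", err
	}
	return t.Text, nil
}

func main() {
	// The CDATA section contains a literal "</think>" that is not a closing tag.
	const doc = `<think>pondering <![CDATA[about the </think> tag]]> now</think>`

	// The regexp cannot tell the literal from the real closing tag.
	re := regexp.MustCompile(`(?s)<think>(.*?)</think>`)
	fmt.Printf("regexp: %q\n", re.FindStringSubmatch(doc)[1])

	// The XML decoder treats the CDATA content as plain text.
	text, err := parseThink(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("xml:    %q\n", text)
}
```

The regexp result is cut off at the literal </think> inside the CDATA section, while the decoder returns the full text of the element.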
I'll use START and STOP as markers, just to disassociate from any XML stuff. Complete example (includes both LF and CRLF line endings, just in case):
package main

import (
	"fmt"
	"regexp"
)

func main() {
	r := regexp.MustCompile(`(?s)START(.*?)STOP`)
	const s = "That is \nSTART\nFOOBAR\r\n\r\nSTOP\n"
	fmt.Printf("%#v\n", r.FindStringSubmatch(s))
}
returns:
[]string{"START\nFOOBAR\r\n\r\nSTOP", "\nFOOBAR\r\n\r\n"}
Eduardo Pereira
Updated on July 29, 2022
Comments
-
Eduardo Pereira almost 2 years
Why does the following multiline regex not work? I expect it to match the substring inside the tags. Other simple multiline matches worked correctly.
func main() {
	r := regexp.MustCompile(`(?m)<think>(.*)</think>`)
	const s = `That is
<think>
FOOBAR
</think>`
	fmt.Printf("%#v\n", r.FindStringSubmatch(s))
}
-
heemayl about 8 years
Try (?m)<think>([^<]+)</think> or, if non-greediness is supported, (?m)<think>(.*?)</think>
-
Endophage about 8 years
I thought the same thing about "." but in golang "s" is set by default: github.com/google/re2/wiki/Syntax Although this does seem to fix it, so I guess the docs are wrong...
-
Eduardo Pereira about 8 years
If s=true, then the newline will match; the default is false. Thanks for the clarification.
-
shaktisinghr almost 3 years
As previous answers have mentioned, and assuming the OP wants to match paired XML tags: if there's another </think> tag, then a greedy match will include everything in between. Use a non-greedy pattern (.*?) with FindAllStringSubmatch to get all matches.
-
kubanczyk almost 3 years
@shaktisinghr Yeah, no. I've updated my answer to disassociate it completely from the idea of parsing XML using regexp. Now it's just about a generic non-greedy multiline regexp. Thanks for your comment.
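The FindAllStringSubmatch suggestion from the comments, applied to the generic START/STOP markers of Solution 2, might look roughly like this. The input string and the helper name allBlocks are invented for the sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

// Non-greedy, with the "s" flag so "." also matches LF and CRLF line endings.
var block = regexp.MustCompile(`(?s)START(.*?)STOP`)

// allBlocks is a hypothetical helper: it returns the captured text of every
// START...STOP block found in s, in order.
func allBlocks(s string) []string {
	var out []string
	// n = -1 asks for all non-overlapping matches.
	for _, m := range block.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	const s = "START\none\nSTOP\nnoise\nSTART\r\ntwo\r\nSTOP\n"
	fmt.Printf("%q\n", allBlocks(s))
}
```

Each block is captured separately, and the text between blocks ("noise" here) is skipped.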