How to get capturing group functionality in Go regular expressions
Solution 1
how should I re-write these expressions?
Add some Ps, as defined here:
(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})
Cross-reference capture group names with re.SubexpNames().
And use as follows:
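package main

import (
	"fmt"
	"regexp"
)

func main() {
	r := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
	// Index 0 is the whole match; indexes 1..3 are the Year, Month and Day groups.
	fmt.Printf("%#v\n", r.FindStringSubmatch(`2015-05-27`))
	// SubexpNames uses the same indexes, so names[i] labels match[i].
	fmt.Printf("%#v\n", r.SubexpNames())
}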
Solution 2
I wrote a function for parsing URL expressions, but it suits your needs too. It works like this:
// getParams parses url with the given regular expression and returns
// the group values defined in the expression.
func getParams(regEx, url string) (paramsMap map[string]string) {
	compRegEx := regexp.MustCompile(regEx)
	match := compRegEx.FindStringSubmatch(url)
	paramsMap = make(map[string]string)
	for i, name := range compRegEx.SubexpNames() {
		// Skip index 0 (the whole match); i < len(match) also guards against a nil match.
		if i > 0 && i < len(match) {
			paramsMap[name] = match[i]
		}
	}
	return paramsMap
}
You can use this function like:
params := getParams(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`, `2015-05-27`)
fmt.Println(params)
and the output will be (since Go 1.12, fmt prints map keys in sorted order):
map[Day:27 Month:05 Year:2015]
Solution 3
The following example keeps RAM and CPU usage low: it calls no anonymous functions inside the loop and does not grow slices with "append" inside the loop.
It extracts several subgroups from multiline text without concatenating strings with '+' and without nesting one for loop inside another (unlike other examples posted here).
txt := `2001-01-20
2009-03-22
2018-02-25
2018-06-07`
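// Keep the *regexp.Regexp pointer rather than copying the struct; the (?s) flag
// is not needed because the pattern contains no '.'.
regex := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
res := regex.FindAllStringSubmatch(txt, -1)
for i := range res {
	// like Java: match.group(1), match.group(2), etc.
	fmt.Printf("year: %s, month: %s, day: %s\n", res[i][1], res[i][2], res[i][3])
}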
Output:
year: 2001, month: 01, day: 20
year: 2009, month: 03, day: 22
year: 2018, month: 02, day: 25
year: 2018, month: 06, day: 07
Note: res[i][0] holds the whole match, like match.group(0) in Java.
If you want to store this information use a struct type:
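type date struct {
	// FindAllStringSubmatch returns strings; convert with strconv.Atoi if ints are needed.
	y, m, d string
}
...
func main() {
	...
	// The slice needs length len(res), not just capacity, to be assigned by index.
	dates := make([]date, len(res))
	for ... {
		dates[index] = date{y: res[index][1], m: res[index][2], d: res[index][3]}
	}
}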
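A complete, runnable sketch of the struct approach with integer fields, assuming the submatches are converted with strconv.Atoi. The helper name parseDates is hypothetical and only serves the illustration:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

type date struct {
	y, m, d int
}

// parseDates is a hypothetical helper: it finds every YYYY-MM-DD
// occurrence in txt and converts the three groups to integers.
func parseDates(txt string) ([]date, error) {
	re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
	res := re.FindAllStringSubmatch(txt, -1)
	// Sized up front and filled by index, so no append is needed in the loop.
	dates := make([]date, len(res))
	for i, sub := range res {
		// sub[0] is the whole match; sub[1..3] are year, month and day.
		y, err := strconv.Atoi(sub[1])
		if err != nil {
			return nil, err
		}
		m, err := strconv.Atoi(sub[2])
		if err != nil {
			return nil, err
		}
		d, err := strconv.Atoi(sub[3])
		if err != nil {
			return nil, err
		}
		dates[i] = date{y: y, m: m, d: d}
	}
	return dates, nil
}

func main() {
	dates, err := parseDates("2001-01-20\n2009-03-22")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, dt := range dates {
		fmt.Println(dt.y, dt.m, dt.d)
	}
}
```

Note that the integer conversion drops leading zeros, so month "01" becomes 1.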
Prefer unnamed (numbered) groups when you do not need the names: the result is indexed directly, with no name lookup.
Using the "ReplaceAllGroupFunc" posted on GitHub is a bad idea because it:
- uses a loop inside a loop
- calls an anonymous function inside the loop
- needs a lot of code
- calls "append" inside the loop, which copies the array to a new memory location whenever the slice's capacity is exceeded
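For context on the "append" concern: append copies the backing array only when the slice has run out of capacity, so a slice created with enough capacity up front is not reallocated inside the loop. A small sketch to check this, using a hypothetical helper named countReallocs:

```go
package main

import "fmt"

// countReallocs is a hypothetical helper: it appends n ints to a slice that
// starts with the given capacity and counts how often the capacity changes,
// which is when append had to move to a new backing array.
func countReallocs(n, initialCap int) int {
	s := make([]int, 0, initialCap)
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	fmt.Println("preallocated:", countReallocs(1000, 1000))
	fmt.Println("not preallocated:", countReallocs(1000, 0))
}
```

The exact count for the non-preallocated case depends on the runtime's growth strategy, so the sketch prints it rather than assuming a value.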
Solution 4
As of Go 1.15, you can simplify the process by using Regexp.SubexpIndex. You can check the release notes at https://golang.org/doc/go1.15#regexp.
Based on your example, you'd have something like the following:
re := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
matches := re.FindStringSubmatch("Some random date: 2001-01-20")
yearIndex := re.SubexpIndex("Year")
fmt.Println(matches[yearIndex])
You can check and execute this example at https://play.golang.org/p/ImJ7i_ZQ3Hu.
Solution 5
A simple way to determine group names, based on @VasileM's answer.
Disclaimer: it's not about memory/cpu/time optimization
package main

import (
	"fmt"
	"regexp"
)

func main() {
	r := regexp.MustCompile(`^(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})$`)
	res := r.FindStringSubmatch(`2015-05-27`)
	names := r.SubexpNames()
	for i := range res {
		// Index 0 is the whole match and has no name.
		if i != 0 {
			fmt.Println(names[i], res[i])
		}
	}
}
Plastikfan
Updated on July 08, 2022
Comments
-
Plastikfan almost 2 years
I'm porting a library from Ruby to Go, and have just discovered that regular expressions in Ruby are not compatible with Go (Google RE2). It has come to my attention that Ruby and Java (plus other languages) use PCRE regular expressions (Perl-compatible, which support capturing groups), so I need to rewrite my expressions so that they compile in Go.
For example, I have the following regex:
`(?<Year>\d{4})-(?<Month>\d{2})-(?<Day>\d{2})`
This should accept input such as:
2001-01-20
The capturing groups allow the year, month and day to be captured into variables. To get the value of each group, it's very easy; you just index into the returned matched data with the group name and you get the value back. So, for example to get the year, something like this pseudo code:
m = expression.Match("2001-01-20")
year = m["Year"]
This is a pattern I use a lot in my expressions, so I have a lot of re-writing to do.
So, is there a way to get this kind of functionality in Go regexp; how should I re-write these expressions?
-
Plastikfan almost 9 years
Ok great, that looks encouraging, but how would I get access to the individual values: year, month and day?
-
Plastikfan almost 9 years
Forget that last comment, I just found the answer. It's all in the ?P, as you say :)
-
Kevin Burke about 8 years
I'm still confused by this; I'm not sure they are addressable by Year, Month, etc. I get back an array with four values and can index into it, but that's it.
-
thwd about 8 years
@KevinBurke see the example in Regexp.SubexpNames
-
wvxvw almost 8 years
@thwd Now, this raises a question: what should happen if you give two groups the same name? This is not well-defined behavior, but the regex compiler doesn't complain about it. Your code throws away the first match, for example, but I can imagine situations where it would make sense to throw away all but the first, or maybe collect all of them... Designing a language has lots of subtleties...
-
Vladimir Bauer over 7 years
@wvxvw The (?P<name>group) syntax was first introduced by Python's re module. It is not Go-specific syntax.
-
wvxvw over 7 years
@VladimirBauer I'm not sure what you are getting at. I know it's not specific to Go; I'm arguing that, specifically in Go, the built-in library implementation of this feature is bad because it duplicates another, simpler feature of this library, but with an additional meaningless syntactical element.
-
VasileM over 4 years
Yes, there is a better and worse solution if you consider wasted clock cycles, wasted RAM, etc. With modesty you would let a farmer publish code in production.
-
Eric Lindsey over 3 years
@wvxvw this is well-defined behavior now: golang.org/pkg/regexp/#Regexp.SubexpIndex but still without the additional possibilities you mentioned.