Counting characters in a Go string
Solution 1
I wrote a package that allows you to do this: https://github.com/rivo/uniseg. It breaks strings according to the rules specified in Unicode Standard Annex #29, which is what you are looking for. Here is how you would use it in your case:
package main
import (
	"fmt"
	"github.com/rivo/uniseg"
)
func main() {
	fmt.Println(uniseg.GraphemeClusterCount("Hello, 世🖖🏿🖖界"))
}
This prints 11, as you expect.
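To give a feel for what grapheme clustering involves, here is a rough standard-library-only sketch. It is not how uniseg works and does not implement UAX #29: it assumes that only combining marks (category M) and the five skin-tone modifiers U+1F3FB to U+1F3FF attach to the preceding character. The names isExtender and approxGraphemeCount are hypothetical, chosen for this illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// isExtender reports whether r attaches to the preceding character in this
// simplified model: combining marks (Unicode category M) and the emoji
// skin-tone modifiers U+1F3FB..U+1F3FF. This is an assumption made for the
// sketch, not the full UAX #29 rule set.
func isExtender(r rune) bool {
	return unicode.Is(unicode.M, r) || (r >= 0x1F3FB && r <= 0x1F3FF)
}

// approxGraphemeCount (hypothetical helper) counts every rune that is not an
// extender. It ignores ZWJ sequences, regional-indicator flags, Hangul jamo
// and CRLF, and it undercounts a string that starts with an extender.
func approxGraphemeCount(s string) int {
	n := 0
	for _, r := range s {
		if !isExtender(r) {
			n++
		}
	}
	return n
}

func main() {
	// U+1F596 is the hand emoji; U+1F3FF is the dark skin-tone modifier.
	fmt.Println(approxGraphemeCount("Hello, 世\U0001F596\U0001F3FF\U0001F596界")) // 11
	fmt.Println(approxGraphemeCount("e\u0301"))                                  // 1
}
```

For anything beyond simple input (family emoji joined with ZWJ, flags, and so on), a full implementation such as the package above is the safer choice.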
Solution 2
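Have you tried strings.Count? It counts non-overlapping occurrences of a substring, so it tells you how many times one particular character appears rather than how many characters the string contains: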
package main
import (
	"fmt"
	"strings"
)
func main() {
	fmt.Println(strings.Count("Hello, 世🖖🖖界", "🖖")) // prints 2
}
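As a side note, strings.Count has a documented special case: with an empty substring it returns 1 plus the number of code points in the string. The sketch below uses that to get a rune count; runeCountViaCount is a hypothetical helper name, and the result is the same as utf8.RuneCountInString, so it still counts code points rather than visible characters.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeCountViaCount (hypothetical helper) relies on the documented behavior
// that strings.Count(s, "") returns 1 + the number of code points in s.
func runeCountViaCount(s string) int {
	return strings.Count(s, "") - 1
}

func main() {
	s := "Hello, 世\U0001F596\U0001F596界" // U+1F596 is the hand emoji

	fmt.Println(strings.Count(s, "\U0001F596"))                    // 2 occurrences of the emoji
	fmt.Println(runeCountViaCount(s))                              // 11 code points
	fmt.Println(runeCountViaCount(s) == utf8.RuneCountInString(s)) // true
}
```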
Solution 3
The straightforward, native option is utf8.RuneCountInString(), which counts runes (code points):
package main
import (
	"fmt"
	"unicode/utf8"
)
func main() {
	str := "Hello, 世🖖🖖界"
	fmt.Println("counts =", utf8.RuneCountInString(str))
}
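To make the difference between bytes, code points and visible characters concrete, this small sketch compares len with utf8.RuneCountInString on the string from the question, which includes a skin-tone modifier. The helper name counts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts (hypothetical helper) returns the length of s in bytes and in runes.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// U+1F596 is the hand emoji; U+1F3FF is the dark skin-tone modifier that
	// combines with the preceding emoji into a single visible character.
	s := "Hello, 世\U0001F596\U0001F3FF\U0001F596界"

	b, r := counts(s)
	fmt.Println("bytes:", b) // 25
	fmt.Println("runes:", r) // 12, one more than the 11 characters a reader sees
}
```

The modifier is its own code point, so the rune count is off by one for every modified emoji.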
Solution 4
See the example in the API documentation: https://golang.org/pkg/unicode/utf8/#example_DecodeLastRuneInString
package main
import (
	"fmt"
	"unicode/utf8"
)
func main() {
	str := "Hello, 世🖖界"
	count := 0
	for len(str) > 0 {
		r, size := utf8.DecodeLastRuneInString(str)
		count++
		fmt.Printf("%c %v\n", r, size)
		str = str[:len(str)-size]
	}
	fmt.Println("count:", count)
}
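The loop above walks the string backwards. A forward pass with a plain range loop decodes the same runes and is arguably more idiomatic; here is a minimal sketch, with countRunes as a hypothetical helper name. Like the version above, it counts code points, so a modified emoji still counts as two.

```go
package main

import "fmt"

// countRunes (hypothetical helper) counts code points by ranging over the
// string; each iteration decodes one UTF-8 sequence.
func countRunes(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

func main() {
	str := "Hello, 世\U0001F596界" // U+1F596 is the hand emoji

	// i is the byte offset at which each rune starts.
	for i, r := range str {
		fmt.Printf("%d %c\n", i, r)
	}
	fmt.Println("count:", countRunes(str)) // count: 10
}
```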
Bjorn Roche
Updated on June 15, 2022
I am trying to count "characters" in Go. That is, if a string contains one printable "glyph", or "composed character" (or what someone would ordinarily think of as a character), I want it to count as 1. For example, the string "Hello, 世🖖🏿🖖界" should count as 11, since a human would look at it and say there are 11 glyphs.
utf8.RuneCountInString() works well in most cases, including ASCII, accents, Asian characters and even emojis. However, as I understand it, runes correspond to code points, not characters. Basic emojis work, but emojis with skin-tone modifiers give the wrong count: https://play.golang.org/p/aFIGsB6MsO
From what I read here and here the following should work, but I still don't seem to be getting the right results (it over-counts):
func CountCharactersInString(str string) int {
	var ia norm.Iter
	ia.InitString(norm.NFC, str)
	nc := 0
	for !ia.Done() {
		nc = nc + 1
		ia.Next()
	}
	return nc
}
This doesn't work either:
func GraphemeCountInString(str string) int {
	re := regexp.MustCompile("\\PM\\pM*|.")
	return len(re.FindAllString(str, -1))
}
I am looking for something similar to this in Objective-C:
+ (NSInteger)countCharactersInString:(NSString *)string {
    // Calculate the number of characters entered by the user and update the character count label
    NSInteger count = 0;
    NSUInteger index = 0;
    while (index < string.length) {
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
        count++;
        index += range.length;
    }
    return count;
}