Counting characters in a Go string


Solution 1

I wrote a package that does this: https://github.com/rivo/uniseg. It breaks strings into grapheme clusters (user-perceived characters) according to the rules in Unicode Standard Annex #29, which is what you are looking for. Here is how you would use it in your case:

package main

import (
    "fmt"

    "github.com/rivo/uniseg"
)

func main() {
    fmt.Println(uniseg.GraphemeClusterCount("Hello, δΈ–πŸ––πŸΏπŸ––η•Œ"))
}

This prints 11, as you expect.

Solution 2

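Have you tried strings.Count? Note that it counts non-overlapping occurrences of a substring, so it answers "how many times does πŸ–– appear?" rather than "how many characters are there?":

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Count non-overlapping occurrences of "πŸ––" in the string.
    fmt.Println(strings.Count("Hello, δΈ–πŸ––πŸ––η•Œ", "πŸ––")) // prints 2
}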

Solution 3

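The straightforward, native approach is utf8.RuneCountInString(). Note that it counts runes (Unicode code points), so a character built from several code points, such as a skin-toned emoji, is counted more than once: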

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    str := "Hello, δΈ–πŸ––πŸ––η•Œ"
    fmt.Println("counts =", utf8.RuneCountInString(str))
}

Solution 4

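Adapted from the DecodeLastRuneInString example in the unicode/utf8 package documentation: https://golang.org/pkg/unicode/utf8/#example_DecodeLastRuneInString. Like RuneCountInString, this counts runes, not grapheme clusters.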

package main

import (
    "fmt"
    "unicode/utf8"
)

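func main() {
    str := "Hello, δΈ–πŸ––η•Œ"
    count := 0
    // Decode runes from the end of the string until it is empty.
    for len(str) > 0 {
        r, size := utf8.DecodeLastRuneInString(str)
        count++
        fmt.Printf("%c %v\n", r, size) // the rune and its size in bytes
        // Drop the decoded rune from the end of the string.
        str = str[:len(str)-size]
    }
    fmt.Println("count:", count)
}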
Author: Bjorn Roche

I write audio and video software. I like to work with C and related languages (C++, Go, Java, Objective-C, Swift, etc.), but whatever works. Currently I'm working on Shimmeo, a music video app: www.shimmeo.com. I can be reached at bjornroche.com.

Updated on June 15, 2022

Comments

  • Bjorn Roche

    I am trying to count "characters" in Go. That is, if a string contains one printable "glyph", or "composed character" (what someone would ordinarily think of as a character), I want it to count as 1. For example, the string "Hello, δΈ–πŸ––πŸΏπŸ––η•Œ" should count 11, since there are 11 characters, and a human would look at it and say there are 11 glyphs.

    utf8.RuneCountInString() works well in most cases, including ASCII, accents, Asian characters and even emojis. However, as I understand it, runes correspond to code points, not characters. Basic emojis work, but when I use emojis with different skin tones, I get the wrong count: https://play.golang.org/p/aFIGsB6MsO

    From what I read here and here, the following should work, but I still don't get the right results (it over-counts):

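    // Requires: import "golang.org/x/text/unicode/norm"
    func CountCharactersInString(str string) int {
        var ia norm.Iter
        ia.InitString(norm.NFC, str)
        nc := 0
        // Count the normalization segments returned by Next.
        for !ia.Done() {
            nc = nc + 1
            ia.Next()
        }
        return nc
    }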
    

    This doesn't work either:

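    // Requires: import "regexp"
    func GraphemeCountInString(str string) int {
        // Each match is one non-mark rune plus any following marks, or a single leftover rune.
        re := regexp.MustCompile("\\PM\\pM*|.")
        return len(re.FindAllString(str, -1))
    }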
    

    I am looking for something similar to this in Objective-C:

    + (NSInteger)countCharactersInString:(NSString *) string {
        // Count composed character sequences (user-perceived characters) in the string
        NSInteger count = 0;
        NSUInteger index = 0;
        while (index < string.length) {
            NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
            count++;
            index += range.length;
        }
        return count;
    }