Java8: Create HashMap with character count of a String

Solution 1

The simplest way to count the occurrences of each character in a string, with full Unicode support (Java 11+; see note 1):

String word = "AAABBB";
Map<String, Long> charCount = word.codePoints().mapToObj(Character::toString)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
System.out.println(charCount);

1) A Java 8 version with full Unicode support is given at the end of this answer.

Output

{A=3, B=3}

UPDATE: For Java 8+ (doesn't support characters from the supplementary planes, e.g. emoji):

Map<String, Long> charCount = IntStream.range(0, word.length())
        .mapToObj(i -> word.substring(i, i + 1))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

UPDATE 2: Also for Java 8+.

I was mistaken, thinking that codePoints() wasn't added until Java 9. It was added in Java 8 to the CharSequence interface, so it doesn't show in javadoc for String in Java 8, and shows as added in Java 9 for later versions of the javadoc.

However, the Character.toString(int codePoint) method wasn't added until Java 11. In Java 8, we can instead use chars() together with the Character.toString(char c) method:

Map<String, Long> charCount = word.chars().mapToObj(c -> Character.toString((char) c))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

Or, for full Unicode support including the supplementary planes, we can use codePoints() and the String(int[] codePoints, int offset, int count) constructor in Java 8:

Map<String, Long> charCount = word.codePoints()
        .mapToObj(cp -> new String(new int[] { cp }, 0, 1))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
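
To make the difference between chars() and codePoints() concrete, here is a minimal sketch that runs on Java 8. The class name CharCountDemo and the two helper methods are made up for illustration; the stream pipelines are the ones shown above. The input contains one supplementary character (U+1F600), written as its surrogate pair.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class CharCountDemo {

    // Counts UTF-16 code units: a supplementary character is split
    // into two separate surrogate keys.
    static Map<String, Long> countChars(String s) {
        return s.chars()
                .mapToObj(c -> Character.toString((char) c))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Counts code points: a supplementary character stays one key.
    static Map<String, Long> countCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(new int[] { cp }, 0, 1))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String word = "a\uD83D\uDE00a"; // 'a', grinning-face emoji, 'a'
        System.out.println(countChars(word).size());      // 3 keys: a plus two lone surrogates
        System.out.println(countCodePoints(word).size()); // 2 keys: a plus the emoji
    }
}
```

For strings that only contain characters from the Basic Multilingual Plane, both methods should return the same map.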

Solution 2

    String str = "Hello Manash";
    Map<Character, Long> hm = str.chars()
            .mapToObj(c -> (char) c)
            .collect(Collectors.groupingBy(c -> c, Collectors.counting()));
    System.out.println(hm);

Solution 3

Try the below approaches:

Approach 1:

    String str = "abcaadcbcb";
    
    Map<Character, Integer> charCount = str.chars()
            .boxed()
            .collect(Collectors.toMap(
                    k -> (char) k.intValue(),
                    v -> 1,         // 1 occurrence
                    Integer::sum));
    System.out.println("Char Counts:\n" + charCount);

Approach 2:

    String str = "abcaadcbcb";
    Map<Character, Integer> charCount = new HashMap<>();
    for (char c : str.toCharArray()) {
        charCount.merge(c,          // key = char
                1,                  // value to merge
                Integer::sum);      // counting
    }
    System.out.println("Char Counts:\n" + charCount);

Output:

    Char Counts:
    {a=3, b=3, c=3, d=1}
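
Note that a HashMap makes no promise about iteration order, so the alphabetical-looking output above should not be relied on. If the order of first occurrence matters, one option is to collect into a LinkedHashMap instead. This is a sketch, not part of the original approaches; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
class OrderedCharCount {

    // Groups chars into a LinkedHashMap so that keys keep the order
    // in which each character first appears in the string.
    static Map<Character, Long> countInEncounterOrder(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(c -> c, LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countInEncounterOrder("dcbaabcd")); // {d=2, c=2, b=2, a=2}
    }
}
```

Passing TreeMap::new as the map factory instead would give keys in sorted order.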

Solution 4

String str = "abcaadcbcb";

Map<String, Long> charCount = Arrays.stream(str.split(""))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    
Author by

OTUser

Updated on July 19, 2022

Comments

  • OTUser
    OTUser almost 2 years

    Is there a simpler way to compute the character count of a given string than the approach below?

    String word = "AAABBB";
        Map<String, Integer> charCount = new HashMap();
        for(String charr: word.split("")){
            Integer added = charCount.putIfAbsent(charr, 1);
            if(added != null)
                charCount.computeIfPresent(charr,(k,v) -> v+1);
        }
    
        System.out.println(charCount);
    
    • nice_dev
      nice_dev about 5 years
      For ANSI characters, you can just have an array of size 256 and compute it.
    • Andreas
      Andreas about 5 years
      @vivek_23 Which ANSI character set would that be? Or did you mean ASCII and 128?
    • Holger
      Holger almost 4 years
      @vivek_23 that is the windows code page 1252, not ANSI. The Unicode standard matches the iso-latin-1 character set for the first 256 codepoints. Referring to the windows code page 1252 is an unnecessary complication, as that code page does not match in the 128-159 range.
    • nice_dev
      nice_dev almost 4 years
      @Holger Ahh! Thanks for the correction. Deleted my previous comment to avoid confusion.
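
The array idea from this thread could look roughly like the sketch below, assuming the input is limited to the first 256 code points (ISO-8859-1), as Holger describes. The class and method names are hypothetical, and characters outside that range are simply skipped here, which is a choice made for this example.

```java
// Hypothetical helper class for illustration only.
class Latin1Counter {

    // Returns an array where index i holds the number of occurrences
    // of the char with value i. Chars above 255 are ignored.
    static int[] countLatin1(String s) {
        int[] counts = new int[256];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 256) {
                counts[c]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countLatin1("AAABBB");
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > 0) {
                System.out.println((char) c + "=" + counts[c]);
            }
        }
    }
}
```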
  • OTUser
    OTUser about 5 years
    Sorry, is there a simple way for Java 8?
  • Andreas
    Andreas about 5 years
    chars() requires Java 9, and better solution using codePoints() instead of chars() already posted 13 minutes earlier.
  • mm6
    mm6 about 5 years
    @Andreas I agree with the codePoints() solution, but chars() was introduced in Java 8: String.chars()
  • Andreas
    Andreas about 5 years
    That would be CharSequence.chars(), not String.chars(), but I accept your correction. The Javadoc for Java 11 shows the method as added to String in Java 9, which is what led me astray.
  • Holger
    Holger almost 4 years
    charCount.put(charr,charCount.getOrDefault(charr,0)+1); can be simplified to charCount.merge(charr, 1, Integer::sum); By the way, you should use new HashMap<>()
  • Holger
    Holger almost 4 years
    Speaking of “full Unicode support” and Emojis, it’s worth pointing out that even using codepoints is not necessarily providing the intended semantics. E.g. "ā̧👩‍🇮🇩" has 10 chars, 7 codepoints, but only three characters; the first one demonstrates that this is not only an Emoji issue. The only solution, I currently know of, is to process grapheme clusters, e.g. with Java 9+: Pattern.compile("\\X").matcher(example).results() .collect(Collectors.groupingBy(MatchResult::group, Collectors.counting())).
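
The grapheme-cluster approach from the comment above can be wrapped into a small runnable sketch (Java 9+, since it relies on Matcher.results()). The class and method names are hypothetical. The example input uses a plain combining accent; how emoji sequences are segmented depends on the Unicode data shipped with the JDK in use, so results for those may differ between versions.

```java
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
class GraphemeCount {

    // The regex \X matches one extended grapheme cluster.
    private static final Pattern GRAPHEME = Pattern.compile("\\X");

    // Counts user-perceived characters rather than chars or code points.
    static Map<String, Long> countGraphemes(String s) {
        return GRAPHEME.matcher(s).results()
                .collect(Collectors.groupingBy(MatchResult::group, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Two copies of "e + combining acute accent", followed by "x".
        String example = "e\u0301e\u0301x";
        System.out.println(example.length());               // 5 chars
        System.out.println(countGraphemes(example).size()); // 2 distinct graphemes
    }
}
```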
  • user2901351
    user2901351 over 2 years
    Might you format your code snippet as code, to allow for greater readability? Thanks.