How can I find the number of unique characters in a string?

Solution 1

This method has O(n^2) complexity, but it's very possible (though a bit more complex) to do this in O(n).

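#include <stdbool.h>
#include <string.h>

int CountUniqueCharacters(const char* str){
    int count = 0;
    size_t len = strlen(str);  /* compute the length once, not on every iteration */

    for (size_t i = 0; i < len; i++){
         bool appears = false;
         /* look for str[i] among the characters that precede it */
         for (size_t j = 0; j < i; j++){
              if (str[j] == str[i]){
                  appears = true;
                  break;
              }
         }

         if (!appears){
             count++;
         }
    }

    return count;
}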

The method iterates over all the characters in the string. For each character, it checks whether the same character already appeared earlier in the string. If it didn't, this is the first occurrence of that character, and the count is incremented.

Solution 2

Here's a simple C++ solution. This method has O(n) complexity on average, because each unordered_map insertion or lookup takes O(1) time on average:

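#include <string>
#include <unordered_map>

int countDistinct(const std::string& s)
{
    // Maps each character to the number of times it occurs.
    std::unordered_map<char, int> m;

    for (char c : s) {
        m[c]++;
    }

    // The map holds one entry per distinct character.
    return static_cast<int>(m.size());
}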

Solution 3

I find the following way of counting distinct characters very simple, and it runs in O(n). The logic is: traverse the character array and set each character's entry to 1. If a character repeats, its entry is simply overwritten with 1 again. After the traversal, sum all the entries.

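#include <limits.h>

/* MAX_CHAR was not defined in the original; assumed here to be one slot
   per possible unsigned char value. */
#define MAX_CHAR (UCHAR_MAX + 1)

int count_distinc_char(const char *a){
     int c_arr[MAX_CHAR] = {0};
     int i, count = 0;
     for( i = 0; a[i] != '\0'; i++){
         /* index by the unsigned char value so that any character,
            not only 'a'..'z', stays inside the array */
         c_arr[(unsigned char)a[i]] = 1;
     }
     for( i = 0; i < MAX_CHAR; i++){
         count += c_arr[i];
     }
     return count;
}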

Solution 4

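Well, you can use a hash set (a HashSet, or unordered_set in C++) for this purpose, but a single insertion or lookup has a worst-case time complexity of O(N) when keys collide. Hence, it's best to use an array of 256 entries, arr[256], indexed by character value. Marking a character in the array is O(1) in every case, so scanning the string takes O(N), and counting the marked entries afterwards takes O(256) ~ O(1) time.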
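The array idea above could be sketched in C roughly as follows. This is a minimal sketch, not the answer author's code: the function name count_unique_table is hypothetical, and it assumes "character" means a single byte of a NUL-terminated string, so multi-byte encodings such as UTF-8 would be counted per byte.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts the distinct byte values in a
   NUL-terminated string using a fixed-size lookup table. */
int count_unique_table(const char *str)
{
    assert(str != NULL);

    /* One flag per possible unsigned char value (256 on common platforms). */
    bool seen[UCHAR_MAX + 1] = { false };
    int count = 0;

    /* Walk the string as unsigned bytes so every index is non-negative. */
    for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++) {
        if (!seen[*p]) {
            seen[*p] = true;  /* first time this byte value is seen */
            count++;
        }
    }

    return count;
}
```

Counting at the moment a byte is first seen avoids the separate pass over the table, at the cost of one branch per character. The table stays on the stack and has a fixed size regardless of the string length.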

Solution 5

Create a linked list to store the characters found in the string and their occurrence counts, with the following node structure:

struct tagCharOccurence 
{
    char ch;
    unsigned int iCount;
    struct tagCharOccurence *next;  /* link to the next node in the list */
};

Now read the characters of the string one by one. For each character, check whether it is already present in your linked list. If it is, increase its count. If it is not, insert a new node with 'ch' set to the character just read and the count initialized to one.

In this way you'll get the occurrence count of each character in a single pass over the string. The number of nodes in the list is the number of unique characters, and you can also use the list to print each character as many times as it was encountered.
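The steps above might look like this in C. It is only a sketch under stated assumptions: the names char_node, count_occurrences and free_occurrences are hypothetical, new nodes are pushed at the head of the list (so node order is the reverse of first appearance), and an allocation failure is reported by returning NULL with the distinct count reset to 0. Note that although the string is read once, each character triggers a walk over the list, so the cost grows with the number of distinct characters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: one node per distinct character. */
struct char_node {
    char ch;
    unsigned int count;
    struct char_node *next;
};

/* Releases every node of the list. */
void free_occurrences(struct char_node *head)
{
    while (head != NULL) {
        struct char_node *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds the occurrence list for str and stores the number of
   distinct characters in *distinct. Returns the list head, or NULL
   for an empty string or on allocation failure. */
struct char_node *count_occurrences(const char *str, size_t *distinct)
{
    assert(str != NULL && distinct != NULL);

    struct char_node *head = NULL;
    *distinct = 0;

    for (; *str != '\0'; str++) {
        /* Search the list for the current character. */
        struct char_node *node = head;
        while (node != NULL && node->ch != *str)
            node = node->next;

        if (node != NULL) {
            node->count++;          /* seen before: bump its count */
        } else {
            node = malloc(sizeof *node);
            if (node == NULL) {     /* out of memory: clean up and bail out */
                free_occurrences(head);
                *distinct = 0;
                return NULL;
            }
            node->ch = *str;        /* first sighting: new node, count 1 */
            node->count = 1;
            node->next = head;
            head = node;
            (*distinct)++;
        }
    }

    return head;
}
```

A caller would pass the string and the address of a size_t, read the distinct count from it, optionally walk the list to inspect per-character counts, and finally call free_occurrences on the returned head.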

Author by

Edenia


Updated on September 29, 2021

Comments

  • Edenia
    Edenia over 2 years

    I have found nothing particular for this purpose.

    I am trying to figure out a function that counts the occurrences of each character in a string, so that I can subtract the repeats from the length at the end to find how many distinct characters are used in that string.

    I've tried a nested loop: the outer loop picks a character and the inner loop scans the string, storing the character only if it does not appear elsewhere in the string:

    size_t CountUniqueCharacters(char *str)
    {
        int i,j;
        char unique[CHAR_MAX];
        for(i=strlen(str); i>=0; i--)
        {
            for(j=strlen(str); j>=0; j--)
            {
                if(str[i] != unique[j])
                    unique[j] = str[i];
            }
        }
        return strlen(unique);
    }
    

    This didn't work well.

    This is useful if you want to stop someone from typing lazy names such as "aaaaaaaaaaaaa".

  • Edenia
    Edenia almost 10 years
    I did exactly the same thing, but with one logical mistake.. didn't put the condition inside the first loop after the second one.
  • Edenia
    Edenia over 5 years
    Thanks for your follow-up. This seems like a variant of a lookup table. Their downsides are usually the huge chunk of memory being used, but are quite fast and easy to implement.
  • Amir Fo
    Amir Fo almost 3 years
    You should analyze the complexity of unordered_map mechanism, so that the overall complexity does not lead to O(n), I think.
  • Nishil Shah
    Nishil Shah almost 3 years
    unordered_map has a complexity of O(n), to the best of my knowledge
  • Hisham Hijjawi
    Hisham Hijjawi over 2 years
    This answer is unnecessarily long, you don't need to keep track of frequencies and therefore can use a unordered_set. See my answer below that is shorter with the same time complexity.
  • gov
    gov over 2 years
    Did you mean int numUniqeChars = std::size(std::unordered_set<char>(std::begin(str), std::end(str)));
  • Hisham Hijjawi
    Hisham Hijjawi over 2 years
    @gov yeah, thanks for catching that, fixed now.
  • gov
    gov over 2 years
    as std::size is not supported in c++11, you can use std::unordered_set<char> uset(std::begin(str), std::end(str)); int numUniqeChars = uset.size();
  • Hisham Hijjawi
    Hisham Hijjawi over 2 years
    @gov good callout, I changed the answer to make it work on older versions of C++.