How to count the occurrences of a specific string in a text file using C

Solution 1

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

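int wc(const char* file_path, const char* word){
    FILE *fp;
    int count = 0;
    int ch, len;

    if(NULL==(fp=fopen(file_path, "r")))
        return -1; // could not open the file
    len = strlen(word);
    for(;;){
        int i;
        if(EOF==(ch=fgetc(fp))) break;
        if((char)ch != *word) continue; // not the start of a match
        // the first character matched; compare the rest of the word
        for(i=1;i<len;++i){
            if(EOF==(ch = fgetc(fp))) goto end; // too few characters left for a match
            if((char)ch != word[i]){
                // i characters were read after the first one; step back over all
                // of them so scanning resumes right after the first character
                // (relative fseek on a text-mode stream is not guaranteed by the
                // C standard; it works where text and binary streams are identical)
                fseek(fp, -i, SEEK_CUR);
                goto next;
            }
        }
        ++count; // full match; scanning resumes after it, so matches do not overlap
        next: ;
    }
end:
    fclose(fp);
    return count;
}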

int main(){ // e.g. a file containing "testestest" gives 2, since matches do not overlap
    char key[] = "test"; // the string I am searching for
    int wordcount = 0;

    wordcount = wc("input.txt", key);
    printf("%d",wordcount);
    return 0;
}

Solution 2

strstr is declared in the string.h header. If you don't include string.h, strstr is undeclared in your source file, and compilers that still accept implicit declarations treat it as returning an int and taking unspecified arguments (that is, as if it were declared int strstr()). The real function returns a char *, so the declaration no longer matches the function in the standard C library. This can cause real problems, for example a truncated pointer on platforms where int is narrower than a pointer, hence the warning.

The solution is simple: make sure you include string.h.

As for the problem of multiple occurrences of a search string in a line, note the first paragraph in the description section of the strstr man page:

The strstr() function finds the first occurrence of the substring needle in the string haystack. The terminating null bytes ("\0") are not compared.

While you can use strstr to find multiple substrings, you'd need to loop over the string, using a different starting location each time. Depending on where you start, it could match previously matched portions of the string (e.g. "testest" would count as 2 matches) or only against unmatched portions (e.g. "testest" would count as 1).
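The loop described above might look like the following minimal sketch. The function name count_matches and the allow_overlap flag are illustrative choices, not part of the original answer or of any library.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of needle in haystack with repeated strstr calls.
 * count_matches and allow_overlap are hypothetical names for illustration.
 * If allow_overlap is nonzero, the search resumes one character after the
 * start of each match; otherwise it resumes just past the end of the match. */
size_t count_matches(const char *haystack, const char *needle, int allow_overlap)
{
    size_t len = strlen(needle);
    size_t count = 0;
    const char *p = haystack;

    if (len == 0)
        return 0; /* an empty needle would match everywhere; treat it as no match */

    while ((p = strstr(p, needle)) != NULL) {
        ++count;
        p += allow_overlap ? 1 : len; /* choose where the next search starts */
    }
    return count;
}
```

In the question's loop, this would replace the single strstr test with something like wordcount += count_matches(buf, key, 0);. A match that straddles two fgets buffers (a line longer than the buffer) would still be missed.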

If you wish to count the occurrences of a complete word and not just a substring, strstr isn't very useful. One option is to use strpbrk or strcspn to find word (i.e. alphabetic) characters and strspn to find non-word characters. With these, you can find the first character of a word, compare to the search string and, if it matches, test that the next character isn't alphabetic. If it isn't, increment the count; if it is, go to the next word. Alternatively, you can loop over each character and use isalpha to distinguish letters from non-letters (hence, beginnings and endings of words).
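As a rough sketch of the isalpha approach, the function below walks a string, isolates each run of letters, and compares it to the search word. The name count_whole_words is made up for this example, and it assumes that a "word" means a maximal run of alphabetic characters and that the search word itself contains only letters. The comparison is case-sensitive.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count runs of letters in text that are exactly equal to word.
 * count_whole_words is a hypothetical helper, not a library function. */
size_t count_whole_words(const char *text, const char *word)
{
    size_t len = strlen(word);
    size_t count = 0;
    const char *p = text;

    if (len == 0)
        return 0;

    while (*p != '\0') {
        /* Skip non-letters to reach the start of the next word. */
        while (*p != '\0' && !isalpha((unsigned char)*p))
            ++p;

        /* Advance to the end of this run of letters. */
        const char *start = p;
        while (isalpha((unsigned char)*p))
            ++p;

        /* Count it only if the whole run equals the search word. */
        if ((size_t)(p - start) == len && strncmp(start, word, len) == 0)
            ++count;
    }
    return count;
}
```

The casts to unsigned char are there because passing a negative char value to isalpha is undefined behavior.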

Another option is to split the input into a list of words, then scan the word list for your search word. String tokenizing functions will do this, though they alter the buffer you pass in. You can also use fscanf to read a word at a time from the file. This has the added advantage of correctly handling long lines.
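A minimal sketch of the fscanf variant follows. The name count_tokens and the 256-byte buffer are assumptions made for the example. Note that %s splits on whitespace only, so a token such as "test," does not compare equal to "test", and a token longer than 255 characters is read in pieces.

```c
#include <stdio.h>
#include <string.h>

/* Count whitespace-delimited tokens in the stream that equal word exactly.
 * count_tokens is a hypothetical helper name used for illustration. */
long count_tokens(FILE *fp, const char *word)
{
    char token[256];
    long count = 0;

    /* %255s reads at most 255 non-whitespace characters, leaving room for '\0'. */
    while (fscanf(fp, "%255s", token) == 1) {
        if (strcmp(token, word) == 0)
            ++count;
    }
    return count;
}
```

A caller would open the file with fopen, check the result for NULL, pass the stream to count_tokens, and then fclose it.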

Author by

sheebs

Updated on June 13, 2022

Comments

  • sheebs, almost 2 years ago

    I am trying to figure out how to count the number of times a specific string, "test", occurs in a text file using C. I want the program to display the final count upon completion.

    This is the code I have come up with, but it doesn't seem to do the trick. The count I generate is slightly lower than what is actually present in the text file.

    Does anyone see what I'm doing wrong? I'm fairly new to C programming, so any insight would be greatly appreciated!

    #include<stdio.h>
    #include<string.h>
    
    int main()
    {
        FILE *ptr_file;
        char buf[200];
        char key[] = "test"; // the string I am searching for
        int wordcount = 0;
    
        ptr_file = fopen("input.txt","r"); // my input text file
    
        while (fgets(buf,200, ptr_file)!=NULL)
        {
            if((strstr(buf,key)) !=NULL){
                wordcount++;
            }
        }
        fclose(ptr_file);
        printf("%d",wordcount);
    }