To count the frequency of each word
Solution 1
Here is a method that counts the frequency of every word in a file:
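// Requires: using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions;
private void countWordsInFile(string file, Dictionary<string, int> words)
{
    var content = File.ReadAllText(file);
    // \w+ matches runs of letters, digits and underscores.
    var wordPattern = new Regex(@"\w+");
    foreach (Match match in wordPattern.Matches(content))
    {
        // TryGetValue leaves currentCount at 0 when the word has not been seen yet.
        int currentCount = 0;
        words.TryGetValue(match.Value, out currentCount);
        currentCount++;
        words[match.Value] = currentCount;
    }
}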
You can call this code like this:
var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
countWordsInFile("file1.txt", words);
After this call, words contains every word in the file together with its frequency (e.g. words["test"] returns the number of times "test" occurs in the file content). Note that the indexer throws a KeyNotFoundException for a word that never occurs, so use TryGetValue when the word may be absent. If you need to accumulate the results from more than one file, call the method for each file with the same dictionary. If you need separate results for each file, create a new dictionary each time and use a structure like the one @DarkGray suggested.
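The "separate results for each file" case could be sketched as follows. This is only an illustration, not the structure @DarkGray proposed: the class name PerFileWordCounter, the method name CountWordsPerFile and the "*.txt" filter are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class PerFileWordCounter
{
    // Hypothetical helper: maps each file path to its own word -> count dictionary.
    public static Dictionary<string, Dictionary<string, int>> CountWordsPerFile(string directory)
    {
        var wordPattern = new Regex(@"\w+");
        var result = new Dictionary<string, Dictionary<string, int>>();

        // Assumption: the files of interest are the *.txt files directly inside the directory.
        foreach (var file in Directory.GetFiles(directory, "*.txt"))
        {
            // A fresh, case-insensitive dictionary per file keeps the results separate.
            var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
            foreach (Match match in wordPattern.Matches(File.ReadAllText(file)))
            {
                int currentCount;
                words.TryGetValue(match.Value, out currentCount);
                words[match.Value] = currentCount + 1;
            }
            result[file] = words;
        }
        return result;
    }
}
```

With this shape, the count of "test" in one file would be read as result[path]["test"], where path is one of the paths returned by Directory.GetFiles.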
Solution 2
There is a LINQ-based alternative which, in my opinion, is simpler. The key here is to use the framework's built-in File.ReadLines (which reads the file lazily, line by line, which is nice for large files) and string.Split.
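// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
private Dictionary<string, int> GetWordFrequency(string file)
{
    return File.ReadLines(file)
               .SelectMany(x => x.Split())       // split each line on whitespace
               .Where(x => x != string.Empty)    // drop empty entries from repeated whitespace
               .GroupBy(x => x)
               .ToDictionary(x => x.Key, x => x.Count());
}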
To get frequencies from many files, you can add an overload that takes a params array.
private Dictionary<string, int> GetWordFrequency(params string[] files)
{
    return files.SelectMany(x => File.ReadLines(x))
                .SelectMany(x => x.Split())
                .Where(x => x != string.Empty)
                .GroupBy(x => x)
                .ToDictionary(x => x.Key, x => x.Count());
}
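One caveat with Solution 2: x.Split() with no arguments splits on whitespace only, so punctuation stays attached ("test," and "test" are counted separately), and GroupBy(x => x) is case-sensitive. If words should be runs of letters, digits and underscores, as in Solution 1, a variant along these lines might work. It is a sketch, not part of the original answer: the class name RegexWordFrequency and the choice of StringComparer.OrdinalIgnoreCase are assumptions. Also note that in .NET \w matches Unicode letters and digits by default; RegexOptions.ECMAScript restricts it to [a-zA-Z_0-9] if that is what you need.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexWordFrequency
{
    // Hypothetical variant of GetWordFrequency that uses \w+ instead of string.Split.
    public static Dictionary<string, int> GetWordFrequency(params string[] files)
    {
        var wordPattern = new Regex(@"\w+");
        return files.SelectMany(file => File.ReadLines(file))
                    // Cast<Match>() turns the MatchCollection into an IEnumerable<Match>.
                    .SelectMany(line => wordPattern.Matches(line).Cast<Match>())
                    // Assumption: "Test" and "test" should count as the same word.
                    .GroupBy(match => match.Value, StringComparer.OrdinalIgnoreCase)
                    .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

The dictionary key for each word is the spelling of its first occurrence; because the dictionary uses the same case-insensitive comparer, lookups such as freq["test"] and freq["TEST"] return the same count.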
Admin
Updated on July 02, 2022
Comments
-
Admin almost 2 years
There's a directory with a few text files. How do I count the frequency of each word in each file? A word is a sequence of characters that may contain letters, digits and underscores.
-
Admin about 12 years
Does this regex match only sequences of letters, digits and underscores? And which generic container should I use to store the words, their frequencies and the files?
-
Serj-Tm about 12 years
@Grienders Check the current variant.
-
Admin about 12 years
What does your code do? It does not do what I need! Does it count the frequency of each word, or does it count the total number of words?
-
Mayank Singh almost 5 years
Keep passing each file name to this piece of code to find the frequencies for each file.