How to make a custom dictionary for Hunspell

Solution 1

Create your own word list and affix file for your language if they don't exist yet. For Papiamentu, Curaçao's native language, no such dictionary existed. I had a hard time finding out how to create these files, so I documented it here: http://www.suares.com/index.php?page_id=25&news_id=233
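
For a rough idea of what the two files look like, here is a minimal sketch that writes a toy dictionary into a temporary directory. The dictionary name `toy`, the sample words and the single suffix rule are placeholders chosen for illustration, not taken from the linked write-up; a real affix file would need far more rules.

```shell
# Work in a throwaway directory so nothing on the system is touched.
workdir=$(mktemp -d)

# Affix file: declares the encoding and one suffix class with flag P.
# "SFX P Y 1"     = suffix class P, may combine with prefixes, 1 rule follows.
# "SFX P 0 nan ." = strip nothing, append "nan", applies to any word.
cat > "$workdir/toy.aff" <<'EOF'
SET UTF-8
SFX P Y 1
SFX P 0 nan .
EOF

# Word list: the first line is the number of entries, then one word
# per line; "/P" attaches the suffix class defined above.
cat > "$workdir/toy.dic" <<'EOF'
3
kas/P
buki/P
dushi
EOF

# Only try the dictionary if hunspell is actually installed.
# -l lists the words hunspell does not accept; -d takes the path
# to the dictionary without the .aff/.dic extension.
if command -v hunspell >/dev/null 2>&1; then
    printf 'kas\nkasnan\nkasx\n' | hunspell -l -d "$workdir/toy" || true
fi
```

With this layout, a word carrying the `/P` flag should be accepted both bare and with the suffix appended, while words without the flag are accepted only as written.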

Solution 2

I'm trying to do the same but haven't found enough information to begin yet.

However, you may want to look at the hunspell(5) man page, "hunspell - format of Hunspell dictionaries and affix files".

UPDATE

If you are working with .NET, you can download the Hunspell .NET port. Using it is fairly easy:

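var bee = new Hunspell();
// Load takes the affix file and the dictionary file together.
bee.Load("path_to_en_US.aff", "path_to_en_US.dic");
// Words added here live only in this Hunspell instance.
bee.Add("my_custom_word1");
bee.Add("my_custom_word2");
// Returns a list of suggested corrections.
var suggestions = bee.Suggest("misspel_word");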

Solution 3

The secret to getting hunspell to work (at least for me) was to figure out the locations it would search that were owned by me, and put the custom dictionaries there. Also bear in mind that the dictionaries are in a specific format, so you need to obey those rules.

Running hunspell -D will show you the search path. On macOS, mine includes /Users/scott/Library/Spelling, so I created that directory and put my dictionary there. Let's say you want to call your dictionary mydict and your input file of words is called dict.txt. We'll use the path I just showed.

First, copy the default .aff file. You will see where it lives when you run hunspell -D as described above, which lists each dictionary without its file extension. For me, it is listed as /Library/Spelling/en_US, so:

cp /Library/Spelling/en_US.aff /Users/scott/Library/Spelling/mydict.aff

Then, every time you update your input list (dict.txt), do this:

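DICT=/Users/scott/Library/Spelling/mydict.dic
cd ~/doc/dict
# Sort the word list and drop duplicates.
sort -u dict.txt > dict.in
# The first line of a .dic file is the number of entries.
# Reading from stdin keeps wc from printing the file name after the count.
wc -l < dict.in > "$DICT"
cat dict.in >> "$DICT"
rm dict.in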
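
Since the first line of a .dic file is expected to hold the entry count, a quick sanity check after each rebuild can catch a botched header. This is a minimal sketch: the function name `check_dic_header` is my own and not part of Hunspell, and as far as I know Hunspell treats the count as a sizing hint, so a mismatch is a warning sign rather than a hard error.

```shell
# Compare the count on line 1 of a .dic file with the number of
# entries that actually follow it. Returns 0 if they agree.
check_dic_header() {
    dic_file=$1
    # Header line with all whitespace stripped (wc may pad its output).
    declared=$(head -n 1 "$dic_file" | tr -d '[:space:]')
    # Count the non-empty lines after the header.
    actual=$(tail -n +2 "$dic_file" | grep -c . || true)
    if [ "$declared" = "$actual" ]; then
        echo "ok: $actual entries"
    else
        echo "mismatch: header says $declared, file has $actual" >&2
        return 1
    fi
}
```

It could be run as `check_dic_header /Users/scott/Library/Spelling/mydict.dic` right after the rebuild steps.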

To run hunspell, just specify both dictionaries. So for me, because I want a list of misspellings, I use

hunspell -l -d mydict,en_US <filename>

Solution 4

I am implementing this type of feature as well. Once you've created the Hunspell object with an associated dictionary you can add individual words to it.

Keep in mind, though, that these words are only available for as long as the Hunspell object is alive. Every time you create a new object you will have to add all the user-defined words again.
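
One way around re-adding words by hand is to persist each user's words in a plain file and feed that file back in on every run. The following is a sketch under stated assumptions: the directory layout and the helpers `add_user_word` and `check_for_user` are hypothetical, and the spell-check step relies on the hunspell command's `-p` option for a personal dictionary (one word per line).

```shell
# Directory holding one personal word list per user (hypothetical layout).
USER_DICT_DIR=${USER_DICT_DIR:-"$HOME/.user_dicts"}

# Append a word to a user's list, keeping the list sorted and
# free of duplicates.
add_user_word() {
    user=$1
    word=$2
    list="$USER_DICT_DIR/$user.txt"
    mkdir -p "$USER_DICT_DIR"
    touch "$list"
    # Merge the new word (stdin) with the existing list. Write to a
    # temp file first: redirecting straight onto "$list" would
    # truncate it before sort gets to read it.
    printf '%s\n' "$word" | sort -u - "$list" > "$list.tmp"
    mv "$list.tmp" "$list"
}

# List the misspellings in a file for a given user, using the shared
# en_US dictionary plus that user's personal word list.
check_for_user() {
    user=$1
    file=$2
    hunspell -l -d en_US -p "$USER_DICT_DIR/$user.txt" "$file"
}
```

For example, `add_user_word alice Papiamentu` followed by `check_for_user alice notes.txt` should no longer flag that word for alice, while other users are unaffected. The shared dictionary and affix file stay untouched, which keeps per-user data small.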

Author: Amin Y

Amin Yazdani is a software architect and the director of technology of A.Y. Technologies Inc. Amin has more than 15 years of experience with software development and 5 years of experience with software architecture and design. He has helped many startups with the scalability of their software systems and has implemented agile processes to improve software development efficiency. He is an advocate of the new wave of software development and operations management (DevOps) and has been a volunteer and organizer of DevOpsDays Vancouver for the past 3 years. He has a B.Sc. degree in Software Engineering from Sharif University of Technology and an M.Sc. degree in Computer Science from Simon Fraser University.

Updated on June 07, 2022

Comments

  • Amin Y almost 2 years

    I have a question about building a custom dictionary for hunspell. I'm using a general English dictionary and affix file right now. How can I add user-specified words to that dictionary for each of my users?

    • Karan Desai almost 7 years
      Just for reference for those who are looking for a start: github.com/karandesai28/…
    • jww about 6 years
      Switch to Aspell. It looks a lot better documented. After the poor selection of answers to your question and almost nothing on the web I am switching...
  • Andrés Chandía about 10 years
    Hey cara @waldir, great job you're doing. Can you please explain in more detail the "frequency list of characters"? What is the input file and what is the output one? I mean, does "words" correspond to the word list file, and where should I put the results, under what name? This part is not clear. Which is better, the first method or the second?
  • waldyrious about 10 years
    @AndrésChandía I didn't write this answer, I just edited it to fix the markdown. You should contact the original writer of this answer instead (user1250098). Try here: suares.com/index.php?topic=contact
  • MonsterMMORPG about 7 years
    Can we process dictionary files somehow? I mean, Arabic is too complex for me to solve, but I need to get all the words and related words of the dic.
  • burns about 5 years
    You can use the -p option, and then you only need the sorted list of words: cat dict.txt | sort -u > custom_words. Then run hunspell -l -p custom_words and it will use the default dictionary but also include the words from your file. No need to copy the .aff file.
  • Pryftan almost 5 years
    I didn't downvote you, but I do want to point out that when a user asks for help, sending them to the documentation is not what they're after. I say this as a programmer (programmers are users too, though many ignore that in their insolence, but never mind). I assure you of that. Users don't care how something works as long as it works. Rather than pointing them to what they have probably already seen, give them an example, i.e. something to work from. That's what they're after. Yes, documentation is often ignored, but that's not the point here. Not at all.