How to get an English language word database?

178,727

Solution 1

The WordNet database might be helpful. I once worked on a Firefox add-on that dealt with words and all kinds of associations between them, from simple to complicated. WordNet looks like it would be very useful to you.

Here it is in MySQL format. And this one (web-archived link) uses WordNet v3.0 data, rather than the older WordNet 2.0 data.
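If you only need the words themselves rather than the relationships between them, the lemmas can be pulled out of WordNet's plain-text index files instead of a MySQL dump. The following is a minimal sketch, assuming the `index.noun`-style layout described in WordNet's `wndb` documentation: header lines begin with spaces, each data line starts with the lemma, and multi-word collocations are joined with underscores. The function name `wordnet_lemmas` and the sample lines are made up for illustration and are not real WordNet data.

```python
def wordnet_lemmas(lines, include_collocations=False):
    """Collect lemmas from the lines of a WordNet index file (e.g. index.noun)."""
    lemmas = set()
    for line in lines:
        # Header (license) lines start with spaces; skip them and blank lines.
        if not line.strip() or line.startswith(" "):
            continue
        # The lemma is the first space-separated field on a data line.
        lemma = line.split(" ", 1)[0]
        # Collocations such as "hot_dog" use underscores between words.
        if "_" in lemma and not include_collocations:
            continue
        lemmas.add(lemma)
    return lemmas


# Hypothetical sample in the index-file layout, not real WordNet data.
sample = [
    "  1 (license header text)",
    "dog n 1 0 1 0 00000001",
    "hot_dog n 1 0 1 0 00000002",
    "cat n 1 0 1 0 00000003",
]
print(sorted(wordnet_lemmas(sample)))
```

With a real download you would pass an open file object for each index file (noun, verb, adj, adv) and take the union of the results.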

Solution 2

You can find what you need on infochimps.org.

They have a list of 350,000 simple (i.e. non-compound) words available for free download.

Word List - 350,000+ Simple English Words
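Once you have a list like this as a plain text file, loading it and checking how many distinct words it really contains takes only a few lines. This is a rough sketch that assumes one word per line; the file name `words.txt` and the helper names are placeholders, not something the download provides.

```python
def load_words(lines):
    """Return the set of distinct, non-empty entries from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def load_word_file(path, encoding="utf-8"):
    # Assumes one word per line; adjust the encoding if the file needs it.
    with open(path, encoding=encoding) as handle:
        return load_words(handle)


# Small in-memory demonstration; with a real file you would call
# load_word_file("words.txt") instead (hypothetical file name).
demo = load_words(["apple\n", "banana\n", "\n", "apple\n"])
print(len(demo))
```

Using a set removes duplicate entries, so the count can be lower than the number of lines in the file.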

Regarding other languages, you might want to poke around on Wiktionary. Here is a link to all the database backups. The information isn't organized so nicely, but if they have a language, you can download the data in SQL format.

Solution 3

I do not see http://wordlist.sourceforge.net/ mentioned here, but that is where I would start if I were looking for something like this (and I was, when I stumbled over this question).

If you cannot find what you want there, and what you want is a list of English words, then you should probably spend some extra time describing how to recognize what it is that you want.

Solution 4

There's no such thing as a "complete" list. Different people have different ways of measuring -- for example, they might include slang, neologisms, multi-word phrases, offensive terms, foreign words, verb conjugations, and so on. Some people have even counted a million words! So you'll have to decide what you want in a word list.
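One way to make that decision concrete is to write your criteria down as a filter and run it over whatever raw list you start from. The sketch below shows just one possible definition of a "plain" word (lowercase ASCII letters only, a minimum length, and an optional blocklist); the names `PLAIN_WORD` and `filter_words` and the sample entries are illustrative assumptions.

```python
import re

# Only lowercase ASCII letters: this excludes capitalized names, hyphens,
# apostrophes, digits and multi-word phrases.
PLAIN_WORD = re.compile(r"[a-z]+")


def filter_words(words, min_length=1, blocklist=frozenset()):
    """Keep entries that match one possible definition of a 'plain' word."""
    kept = []
    for word in words:
        if len(word) < min_length:
            continue
        if not PLAIN_WORD.fullmatch(word):
            continue
        if word in blocklist:
            continue
        kept.append(word)
    return kept


candidates = ["apple", "London", "ice cream", "don't", "x-ray", "selfie", "a"]
print(filter_words(candidates, min_length=2))
```

Loosening or tightening the pattern (for example, allowing apostrophes or hyphens) changes the count considerably, which is part of why published totals differ so much.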

Solution 5

You may check the *spell en-GB dictionary used by Mozilla, OpenOffice, and plenty of other software.
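Note that Hunspell-style dictionaries store base words in a `.dic` file and generate inflected forms from rules in a separate `.aff` file, so the `.dic` file on its own is not the full word list. As a rough sketch, assuming the common layout (first line is an approximate entry count, each entry is `word/FLAGS` with optional tab-separated morphological fields), the base words could be extracted like this. The sample entries and flags are invented, and escaped slashes or space-separated fields are not handled.

```python
def hunspell_stems(lines):
    """Extract base words from the lines of a Hunspell-style .dic file."""
    stems = []
    for index, raw in enumerate(lines):
        entry = raw.strip()
        if not entry:
            continue
        # The first line is conventionally an approximate entry count.
        if index == 0 and entry.isdigit():
            continue
        # Drop optional fields after a tab, then the affix flags after "/".
        word = entry.split("\t", 1)[0].split("/", 1)[0]
        if word:
            stems.append(word)
    return stems


# Hypothetical entries; the flags after "/" are made up for illustration.
sample = ["3\n", "walk/DGS\n", "cat/S\n", "hello\n"]
print(hunspell_stems(sample))
```

To get every inflected form you would also have to apply the affix rules, which is what the spell-checking library itself normally does.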

Author by

Costique

Experienced Objective C developer

Updated on April 22, 2020

Comments

  • Costique
    Costique about 4 years

    I need a database of every single valid word in English. I checked the /usr/share/dict/words file; it contains fewer than 100k words. Wikipedia says English has 475k words. Where do I get the complete list (American spelling)?

    Also, is there a single website that gives out words for other languages too, including Asian and European ones?

    Edit: Forgot to add, I do not need names etc., just valid English words.

    • marshall.ward
      marshall.ward over 10 years
      My /usr/share/dict/words has 479829 words, so maybe there is some variation here (and might be suitable for others).
    • nelsonic
      nelsonic almost 10 years
      wc -l /usr/share/dict/words on Mac is 235,886 words (July 2014 - OSX Mavericks 10.9.4)
    • kofifus
      kofifus almost 8 years
      you can get a wordlist here marcoagpinto.cidadevirtual.pt/proofingtoolgui.html .. look for the WORDLIST link on the right
    • Chris Rae
      Chris Rae over 5 years
      Just in case anyone is still looking for this, I just got a good free Scrabble dictionary from wordgamedictionary.com.
    • user2901351
      user2901351 about 3 years
      the resource @james.garriss posted (thx!) is no longer there. Looks like the repo lives tho: github.com/dwyl/english-words
  • Admin
    Admin about 14 years
    no, not for blacklist. I am doing some sort of word game/graph.
  • Admin
    Admin about 14 years
    do they have a downloadable list too?
  • user266803
    user266803 about 14 years
    Yes, they let you download their database in a lot of formats (CSV, MySQL database, etc.) and even have APIs you can use from .NET, Java, etc. This is the download page: wordnet.princeton.edu/wordnet/download
  • user266803
    user266803 about 14 years
    I have not personally downloaded it, but it was there ready when I started coding. So I don't know what files will be there in which download. I just know that you can download in different formats. If you can tell me in which format you want, I may be able to help.
  • jokoon
    jokoon over 12 years
    I installed WordNet, but can't find any command-line tool. Is it just a library?
  • Chris Rae
    Chris Rae over 12 years
    The download link has changed - infochimps.com/datasets/…
  • user115422
    user115422 about 11 years
    i need a MySQL database that contains all the verbs in the english language...
  • Barış Akkurt
    Barış Akkurt almost 10 years
    Is there an sqlite version?
  • nelsonic
    nelsonic almost 10 years
    Annoyingly the infochimps file is .xls (an excel file with the words split across 6 worksheets!) ... I've extracted all 354986 words into a txt file: github.com/nelsonic/english-words
  • Admin
    Admin over 9 years
    @nelsonic thanks a lot, the infochimps link is 404
  • Admin
    Admin over 9 years
    link on mozilla en-gb.pyxidium.co.uk/dictionary/en_GB.zip says Server not found, any update ? thanks
  • mloskot
    mloskot over 9 years
    @AMB Thx, I updated the link to point to alternative source of the dictionary at extensions.openoffice.org/en/project/…
  • james.garriss
    james.garriss almost 9 years
    And now the new link is 404, @mloskot.
  • mloskot
    mloskot almost 9 years
    @james.garriss I'm afraid, the whole extensions.openoffice.org site seems to be down.
  • Christopher Bonitz
    Christopher Bonitz about 8 years
    sematilog (second link) does postgresql and db2 as well.
  • hobs
    hobs about 8 years
    I was hopeful that these broader lists would contain words with punctuation, like "C++" or "C#", but couldn't find any. So if that's what you're after, you can short-circuit and skip this one (and the narrower lists in other answers).
  • garg10may
    garg10may almost 8 years
    @ChrisRae both links not working
  • max
    max almost 8 years
    seems like they include words with misspellings, like tecnology - presumably because they collect everything that shows up on the web. so it's good for password cracking / validation, but not good for applications that require real words (like spell checkers, etc.).
  • Hashim Aziz
    Hashim Aziz about 7 years
    Thanks for that link. A very enlightening read on just how many words there are in the English language, and the futility of trying to arrive at a definitive count of them. For a more concise and up-to-date read, there's also this: en.oxforddictionaries.com/explore/language-questions/….
  • James
    James about 6 years
    Link to Wordnet v3.0 data is broken :-(.
  • kangalioo
    kangalioo over 4 years
    This has a lot of "junk words", however I'm still very grateful that you put this here - it's perfect when searching for specific words that the other dictionaries don't have (e.g. firetruck)
  • zfj3ub94rf576hc4eegm
    zfj3ub94rf576hc4eegm almost 3 years
    Looks like WordNet is broken on the Princeton website too. Downvoting this answer since it is no longer reliable.
  • nikssa23
    nikssa23 almost 3 years
    en-gb.pyxidium.co.uk/dictionary/en_GB.zip can be found here: web.archive.org/web/20120210204607/http://en-gb.pyxidium.co.‌​uk/… (web archive)
  • SO_fix_the_vote_sorting_bug
    SO_fix_the_vote_sorting_bug about 2 years
    @HashimAziz The issue is probably that there isn't an objective definition of "English," as it's just a consensus type of thing. One could make a list of "every utterance ever uttered by an English speaker while speaking English." But then you'd have to define "speaking English" and "English speaker."
  • SO_fix_the_vote_sorting_bug
    SO_fix_the_vote_sorting_bug about 2 years
    @hobs Technically, "C++" is a C word (more likely from the B language), and not necessarily an English language word. It is actually defined as legal C grammar. True, English has borrowed it, but it isn't from a natural language.
  • hobs
    hobs about 2 years
    @SO_fix_the_vote_sorting_bug I don't think that's true. English is a dynamic, informal language. There is no rigid, logical definition or category theory math expression or software program you can write to identify what is and is not an English word. You must create a statistical model for what you want in your list of words for your application. I think NL is a superset of all languages (formal and informal) because humans use them all to communicate with each other.