Load PreComputed Vectors Gensim


Solution 1

You can download pre-trained word vectors from the word2vec project page (get the file 'GoogleNews-vectors-negative300.bin').

Extract the file, and then you can load it in Python like this:

model = gensim.models.word2vec.Word2Vec.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)

model.most_similar('dog')

EDIT (May 2017): As the above code is now deprecated, this is how you'd load the vectors now:

model = gensim.models.KeyedVectors.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)

Solution 2

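The GloVe dump from the Stanford site is in a format that differs slightly from the word2vec format: the word2vec text format starts with a header line giving the number of vectors and their dimensionality, which the GloVe file lacks. You can convert the GloVe file into word2vec format using: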

python -m gensim.scripts.glove2word2vec --input glove.840B.300d.txt --output glove.840B.300d.w2vformat.txt
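To make the format difference concrete, here is a rough standard-library sketch of what such a conversion amounts to: count the vectors, read the dimensionality from the first line, and prepend a "count dimensions" header. The function name glove_to_word2vec_text and the toy file contents are hypothetical; this is not gensim's own implementation.

```python
import os
import tempfile


def glove_to_word2vec_text(glove_path, output_path):
    """Copy a GloVe text file, prepending a '<count> <dim>' header line."""
    # First pass: count the vectors and read the dimensionality
    # from the first line (token followed by dim numbers).
    num_vectors = 0
    dim = 0
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if num_vectors == 0:
                dim = len(line.rstrip().split(" ")) - 1
            num_vectors += 1

    # Second pass: write the header, then copy every vector line unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_vectors} {dim}\n")
        for line in src:
            dst.write(line)
    return num_vectors, dim


# Demo on a made-up two-word, three-dimensional GloVe-style file.
workdir = tempfile.mkdtemp()
glove_file = os.path.join(workdir, "toy-glove.txt")
w2v_file = os.path.join(workdir, "toy-glove.w2vformat.txt")
with open(glove_file, "w", encoding="utf-8") as f:
    f.write("dog 0.1 0.2 0.3\n")
    f.write("cat 0.2 0.1 0.4\n")

count, dim = glove_to_word2vec_text(glove_file, w2v_file)
with open(w2v_file, encoding="utf-8") as f:
    header = f.readline().strip()
print(header)
```

The converted file can then be loaded with KeyedVectors.load_word2vec_format(..., binary=False), since it is a text file rather than a binary one.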

Author: MEric

Updated on June 04, 2022

Comments

  • MEric
    MEric almost 2 years

    I am using the Gensim Python package to learn a neural language model, and I know that you can provide a training corpus to learn the model. However, there already exist many precomputed word vectors available in text format (e.g. http://www-nlp.stanford.edu/projects/glove/). Is there some way to initialize a Gensim Word2Vec model that just makes use of some precomputed vectors, rather than having to learn the vectors from scratch?

    Thanks!

  • Michael Douma
    Michael Douma about 6 years
    Worked on MacOS 10.13
  • Piyush Rumao
    Piyush Rumao over 5 years
    As stated by @Chaitanya Shivade above, loading the .txt format from the GloVe dataset gives an error in Gensim, so we can convert it into a format that Gensim requires, such as w2vformat.txt or even glove.840B.300d.txt.word2vec
  • Amir Imani
    Amir Imani almost 5 years
    This is the right answer for loading GloVe pre-trained models