Load PreComputed Vectors Gensim
Solution 1
You can download pre-trained word vectors from the word2vec project page (get the file 'GoogleNews-vectors-negative300.bin').
Extract the file; you can then load it in Python like this:
import os
import gensim
model = gensim.models.word2vec.Word2Vec.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)
model.most_similar('dog')
EDIT (May 2017): As the above code is now deprecated, this is how you'd load the vectors now:
model = gensim.models.KeyedVectors.load_word2vec_format(os.path.join(os.path.dirname(__file__), 'GoogleNews-vectors-negative300.bin'), binary=True)
Solution 2
The GloVe dump from the Stanford site is in a format that is slightly different from the word2vec format. You can convert the GloVe file into word2vec format using:
python -m gensim.scripts.glove2word2vec --input glove.840B.300d.txt --output glove.840B.300d.w2vformat.txt
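To make the format difference concrete: as far as I know, the word2vec text format starts with a header line giving the number of vectors and their dimensionality, and GloVe text files omit that line. The sketch below adds such a header using only the standard library. The function name glove_to_word2vec and the toy file names are made up for illustration, and this is not the gensim script itself, just an approximation of the idea.

```python
import os
import tempfile


def glove_to_word2vec(glove_path, output_path):
    """Copy a GloVe text file to output_path, prepending a word2vec-style header.

    Hypothetical helper for illustration; not part of gensim.
    """
    num_vectors = 0
    dimensions = 0
    # First pass: count the vectors and read the dimensionality from the first line.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if num_vectors == 0:
                # Each line is "<token> <v1> <v2> ... <vN>".
                dimensions = len(line.rstrip().split(" ")) - 1
            num_vectors += 1
    # Second pass: write the header, then copy the original lines unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_vectors} {dimensions}\n")
        for line in src:
            dst.write(line)
    return num_vectors, dimensions


# Tiny made-up vectors, only to show the header being added.
workdir = tempfile.mkdtemp()
glove_file = os.path.join(workdir, "toy.glove.txt")
w2v_file = os.path.join(workdir, "toy.w2vformat.txt")
with open(glove_file, "w", encoding="utf-8") as f:
    f.write("dog 0.1 0.2 0.3\ncat 0.2 0.1 0.3\n")

shape = glove_to_word2vec(glove_file, w2v_file)
with open(w2v_file, encoding="utf-8") as f:
    header = f.readline().strip()
print(shape, header)
```

A file converted this way (or with the gensim script above) should then load with gensim.models.KeyedVectors.load_word2vec_format(path, binary=False), since it is plain text rather than the binary format used for the GoogleNews vectors.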
MEric
Updated on June 04, 2022
Comments
-
MEric almost 2 years
I am using the Gensim Python package to learn a neural language model, and I know that you can provide a training corpus to learn the model. However, there already exist many precomputed word vectors available in text format (e.g. http://www-nlp.stanford.edu/projects/glove/). Is there some way to initialize a Gensim Word2Vec model that just makes use of some precomputed vectors, rather than having to learn the vectors from scratch?
Thanks!
-
Michael Douma about 6 years
Worked on MacOS 10.13
-
Piyush Rumao over 5 years
As stated by @Chaitanya Shivade above, loading the .txt format from the GloVe dataset gives an error in Gensim, so we can convert it into the format required by Gensim, e.g. w2vformat.txt or even glove.840B.300d.txt.word2vec
-
Amir Imani almost 5 years
This is the right answer to load GloVe pre-trained models