Difference between Fasttext .vec and .bin file
Solution 1
The .vec files contain only the aggregated word vectors, in plain text. The .bin files additionally contain the model parameters and, crucially, the vectors for all the n-grams.
So if you want to encode words that were not in the training data by using those n-grams (FastText's famous "subword information"), you need an API that can handle FastText .bin files (most only support the .vec files, however...).
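To make the "subword information" concrete, here is a minimal sketch of how a word can be broken into character n-grams. It assumes the word is wrapped in the boundary markers < and > and that n-gram lengths run from 3 to 6; the function name char_ngrams is hypothetical, and the real library additionally hashes each n-gram into a bucket, which is not reproduced here.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of a word wrapped in boundary markers.

    minn and maxn are assumed n-gram lengths, not values read from a model.
    """
    wrapped = "<" + word + ">"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of length n over the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

A word that never appeared in training still shares many of these n-grams with known words, and the .bin file stores a vector for each of them. The .vec file does not, which is why it cannot produce vectors for unseen words.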
Solution 2
As the documentation says,
model.vec is a text file containing the word vectors, one per line.
model.bin is a binary file containing the parameters of the model along with the dictionary and all hyper parameters.
In other words, a .vec file is a plain-text file, just like a .txt file, so you can use it in other applications (for example, to exchange data between your FastText model and your Word2Vec model, since the .vec file is similar to the .txt file generated by Word2Vec). The .bin file is the one to use if you want to continue training the vectors or restart the optimization.
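Because the .vec format is plain text, it can be read without any FastText tooling. The sketch below assumes the usual word2vec text layout: a header line with the vocabulary size and the dimension, then one word per line followed by its vector components. The function name load_vec and the sample data are made up for illustration.

```python
import io


def load_vec(stream):
    """Parse a word2vec-style text stream into (count, dim, {word: [floats]})."""
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the vector; everything before is the word.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return count, dim, vectors


# Tiny in-memory stand-in for a real .vec file.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld -0.4 0.5 0.6\n")
count, dim, vectors = load_vec(sample)
print(count, dim, vectors["world"])
```

With a real file, the same function could be called on open("wiki.en.vec", encoding="utf-8"). Note that the full English file is large, so loading every vector into a dict of Python lists takes a lot of memory.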
Bhushan Pant
Love building exciting products. Currently working as Research Scientist-Natural Language Processing in a Fintech Startup.
Updated on June 04, 2022
Comments
-
Bhushan Pant almost 2 years
I recently downloaded the fastText pretrained model for English. I got two files:
- wiki.en.vec
- wiki.en.bin
What is the difference between the two files?
-
Akash Kandpal almost 6 years
I agree with the highlighted part, but we can also further train .vec files on other data or corpora.
-
information_interchange almost 4 years
How do you actually work with the .bin file? I've tried open(FILENAME, "rb") but I'm not sure how to access the weight matrix after that.
Bob van Luijt almost 4 years
@information_interchange did you find the answer to this question?
-
fnl almost 4 years
Please take a look at the official documentation for the Python API: github.com/facebookresearch/fastText/tree/master/…
-
dapperdan over 3 years
@information_interchange you can do model = fasttext.load_model("embedding.bin") to load a model object.
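Building on that last comment, here is a hedged sketch of reaching the weight matrix once the model is loaded. It assumes the official fasttext Python package is installed; the helper name load_bin_vectors is hypothetical, and the path is whatever .bin file you downloaded.

```python
def load_bin_vectors(path, word):
    """Load a fastText .bin model; return one word vector and the matrix shape."""
    # Imported lazily so this sketch can be read without fastText installed.
    import fasttext

    model = fasttext.load_model(path)
    # Works for out-of-vocabulary words too, since the .bin file keeps the n-gram vectors.
    vector = model.get_word_vector(word)
    # The input matrix holds the word rows followed by the n-gram bucket rows.
    input_matrix = model.get_input_matrix()
    return vector, input_matrix.shape
```

For example, load_bin_vectors("wiki.en.bin", "hello") would return a NumPy vector for "hello" together with the shape of the full input matrix.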