This repo contains three popular word embedding models (the two word2vec architectures and GloVe) implemented in PyTorch. Implemented models:
- Skipgram
- Continuous Bag of Words (CBOW)
- Global Vectors (GloVe)
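
The three models differ mainly in what they extract from a context window. The sketch below is not taken from this repo's code; it is a minimal pure-Python illustration, with hypothetical function names and a toy sentence, of the training examples each model would typically consume: (center, context) pairs for Skipgram, (context words, target) examples for CBOW, and co-occurrence counts for GloVe. The GloVe counts here are unweighted, which is a simplification.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs: Skipgram predicts each context word from the center word."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def cbow_examples(tokens, window=2):
    """Yield (context_words, target) examples: CBOW predicts the center word from its context."""
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:
            yield context, target


def cooccurrence_counts(tokens, window=2):
    """Count how often each (word, context) pair appears within the window (GloVe-style input)."""
    return Counter(skipgram_pairs(tokens, window))


# Toy sentence, chosen only for illustration.
tokens = "the quick brown fox".split()
sg = list(skipgram_pairs(tokens, window=1))
cb = list(cbow_examples(tokens, window=1))
counts = cooccurrence_counts(tokens, window=1)

print(sg)
print(cb)
print(counts[("quick", "brown")])
```

In a PyTorch implementation, the words in these examples would usually be mapped to integer ids and fed to embedding layers; that step is omitted here.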
We tried to train the models on 10% of the latest Wikipedia dump but could not process it with the computational resources available, so we trained them on a small dataset included in the repo. For the Wikipedia attempt, we downloaded the Wiki XML file and converted it to a .txt file using the extractor script.
cd src
python3 main.py
The word vectors are evaluated on the SimLex-999 dataset, as you can see in the notebook.
The word vectors are also generated using this notebook.
The word vector visualization can be seen here
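
Word vectors are commonly scored on SimLex-999 by computing the Spearman rank correlation between the model's cosine similarities and the human similarity ratings. The sketch below shows that idea in pure Python; it is an assumption about the procedure, not a copy of the notebook. The function names are hypothetical, and the vectors and ratings are invented toy values, not real SimLex-999 entries or results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vectors, scored_pairs):
    """Correlate model similarities with human scores, skipping out-of-vocabulary pairs."""
    model, human = [], []
    for w1, w2, score in scored_pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human)


# Invented toy data for illustration only.
vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.2, 0.8]}
scored_pairs = [("cat", "dog", 9.0), ("car", "bus", 8.0), ("cat", "car", 1.0), ("cat", "tree", 2.0)]
rho = evaluate(vectors, scored_pairs)
print(rho)
```

Pairs containing a word missing from the vocabulary (such as "tree" above) are skipped, which matters when training on a small dataset because many evaluation words may be absent.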
References: