This repository contains PyTorch implementations of Word2vec trained on an NLTK corpus.
Word2vec is a popular natural language processing technique for learning word embeddings. This project implements both the Continuous Bag of Words (CBOW) and SkipGram architectures in PyTorch and trains them on an NLTK corpus to produce high-quality word embeddings.
To train and test the models, refer to the provided Jupyter notebooks. Further details, including trained model weights and the NLTK corpus used, will be added soon.
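As a rough illustration of the SkipGram side of the project (the exact layer names and hyperparameters here are placeholders, not the repository's actual code), a minimal PyTorch model predicts a distribution over context words from a target word:

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    """Minimal SkipGram sketch: score context words given a target word."""
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)  # target-word embeddings
        self.out = nn.Linear(embed_dim, vocab_size)          # scores over the vocabulary

    def forward(self, target_ids):
        # Logits over the vocabulary for each input target word.
        return self.out(self.in_embed(target_ids))

# Toy usage: a vocabulary of 10 words and 8-dimensional embeddings.
model = SkipGram(vocab_size=10, embed_dim=8)
logits = model(torch.tensor([3, 7]))
print(logits.shape)  # torch.Size([2, 10])
```

Training minimizes cross-entropy between these logits and the observed context words; the CBOW variant inverts the setup, averaging context embeddings to predict the target.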
- Keras
- PyTorch
- NumPy
- Pandas
- Matplotlib
- NLTK
- Regular Expressions (`re`)
Below are some sample outputs from the SkipGram model after 30 epochs of training:
Epoch: 30/30 Loss: 0.06470464915037155
Target Word | Predicted Context Words / Similar Words
--- | ---
open | impossible, png, radio, fast, don’t
error | started, was, strict, existing, rather
option | accesskeys, realplayer, which, progress, visible
to | whenever, searching, encryption, brings, secure
should | malformed, responding, profile, 4, logging
firebird | users, column, more, speed, return
javascript | talkback, extension, install, leaving, sends
button | checked, horizontal, child, something, every
hit | protocol, follow, overflow, focus, true
servers | offer, highlighted, sites, responding, way
such | few, weird, changes, instances, fonts
separate | take, renaming, moving, tar, scroll
case | browse, fullscreen, addressbar, creating, dragged
dragged | case, brings, transparent, created, unexpected
handler | compile, closes, skin, having, breaks
example | filename, your, numbers, advanced, macos
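Similar-word lists like those above can be produced by ranking the vocabulary by cosine similarity to a target word's embedding. A small sketch (the `vocab` list and random embedding matrix below are illustrative stand-ins for the trained model's weights):

```python
import torch
import torch.nn.functional as F

# Hypothetical trained embeddings: 5 vocabulary words, 4 dimensions.
vocab = ["open", "error", "option", "button", "hit"]
embeddings = torch.randn(len(vocab), 4)

def most_similar(word, k=3):
    """Return the k words whose vectors are closest (by cosine) to `word`."""
    query = embeddings[vocab.index(word)]
    sims = F.cosine_similarity(query.unsqueeze(0), embeddings)
    top = sims.argsort(descending=True)[1:k + 1]  # skip the word itself
    return [vocab[i] for i in top]

print(most_similar("open"))  # e.g. ['button', 'hit', 'error']
```

With real trained weights, the ranking reflects co-occurrence patterns learned from the corpus rather than random similarity.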
Contributions are welcome. If you have ideas for improvements or new features, feel free to open an issue or submit a pull request.
Special thanks to the NLTK contributors for providing the corpus used in this project.