Train Odia word embeddings using word2vec.
See the dependencies in requirements.txt.
The code has been tested with Python 3.6.
Check out this blog post to get an illustrated guide 📙 to word2vec.
- First download Odia text data.
mkdir data
cd data
wget https://storage.googleapis.com/ai4bharat-public-indic-nlp-corpora/data/monolingual/indicnlp_v1/sentence/or.txt.gz
tar -zxvf or.txt.gz
head or
- To train word embeddings, see the notebook word2vec.ipynb.
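As a rough illustration of what word2vec trains on, the sketch below builds a vocabulary and generates skip-gram (center, context) pairs from tokenized sentences. It is not the notebook's code: the function names `build_vocab` and `skipgram_pairs`, the whitespace tokenization, and the toy sentences are all assumptions made for the example, and real implementations add steps such as subsampling and negative sampling.

```python
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Count whitespace-separated tokens; keep those seen at least min_count times."""
    counts = Counter(token for sentence in sentences for token in sentence.split())
    return {word: n for word, n in counts.items() if n >= min_count}


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every token within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


# Toy input (hypothetical); in practice these would be lines from the corpus.
sentences = ["ମୁଁ ଓଡ଼ିଆ ଭାଷା ଶିଖୁଛି", "ଓଡ଼ିଆ ଭାଷା ସୁନ୍ଦର"]
vocab = build_vocab(sentences, min_count=2)
pairs = list(skipgram_pairs(sentences[1].split(), window=1))
```

With `window=1`, each token is paired only with its immediate neighbours, so a three-token sentence yields four pairs.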
- Clone the repo
- Put the embeddings.txt you have trained inside the src/model/ directory.
- Create a virtual environment. If you have Anaconda installed, you can try:
$ conda create -n word_embeddings python=3.6
- Yes, this project requires Python 3.6.
- Install all the Python dependencies with the following command:
$ pip install -r requirements.txt
- Run the following command to run the server:
$ gunicorn app:app -b 0.0.0.0:31137
- Now you can see the web app running in your browser at http://127.0.0.1:31137/word2vec
- If you encounter an error like the one below, set the environment variable PYTHONIOENCODING to utf-8:
UnicodeEncodeError: 'ascii' codec can't encode character '\u2771' in position 1659: ordinal not in range(128) *** You may need to add PYTHONIOENCODING=utf-8 to your environment ***
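To see why this error appears, the short sketch below reproduces the underlying problem in isolation: the character from the error message cannot be encoded as ASCII, but encodes fine as UTF-8. It only demonstrates the encoding behaviour and makes no assumptions about the app's own code.

```python
import sys

ch = "\u2771"  # the character named in the error message above

# Under an ASCII-only output encoding this character cannot be encoded,
# which is what fails when the program tries to print it.
try:
    ch.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# PYTHONIOENCODING=utf-8 makes stdin/stdout/stderr use UTF-8,
# where the same character encodes to three bytes.
utf8_bytes = ch.encode("utf-8")

# Shows whether ASCII worked, the UTF-8 bytes, and the current stdout encoding.
print(ascii_ok, utf8_bytes, sys.stdout.encoding)
```

If the last value printed is an ASCII variant rather than utf-8, the environment variable has not taken effect in that shell.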
- Run Docker. Type docker in your command prompt/terminal to check that the command works.
- Go to the project root folder, i.e. word-embeddings.
- Use the following command to build the image from the Dockerfile:
docker build -t word_embeddings:latest .
- Then run the Docker image with the following command:
docker run --rm -it -p 31137:31137 word_embeddings:latest
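Independently of the web app, a trained embeddings.txt can be sanity-checked with a few lines of Python. The sketch below assumes the file uses the common word2vec text format (an optional "vocab_size dimension" header, then one "word v1 v2 ..." line per word); that format, the helper names, and the tiny sample data are assumptions for illustration, not something this repository confirms.

```python
import io
import math


def load_embeddings(handle):
    """Parse word2vec-style text: optional header, then 'word v1 v2 ...' lines."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:  # header line or blank line
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up contents standing in for src/model/embeddings.txt.
sample = io.StringIO("3 2\nking 1.0 0.0\nqueen 0.9 0.1\napple 0.0 1.0\n")
vectors = load_embeddings(sample)
nearest = most_similar(vectors, "king", topn=1)
```

For a real file, pass a handle opened with `encoding="utf-8"` to `load_embeddings` so that Odia words are read correctly.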