Odia Word Embeddings

Train Odia word embeddings using word2vec.

Dependencies

See the dependencies in requirements.txt. The code has been tested with Python 3.6.

Overview

Check out this blog post to get an illustrated guide 📙 to word2vec.

  • First, download the Odia text data.
mkdir data
cd data
wget https://storage.googleapis.com/ai4bharat-public-indic-nlp-corpora/data/monolingual/indicnlp_v1/sentence/or.txt.gz
tar -zxvf or.txt.gz
head or
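
Once the corpus is extracted, word2vec training usually consumes it as a stream of tokenized sentences. The sketch below is only an illustration, assuming the extracted file holds one sentence per line and that whitespace splitting is an adequate tokenizer. The function name `iter_sentences` is hypothetical and not part of this repo.

```python
from typing import Iterator, List


def iter_sentences(path: str, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield each non-empty line of the corpus as a list of whitespace-separated tokens."""
    # Assumption: the corpus is UTF-8 text with one sentence per line.
    with open(path, encoding=encoding) as corpus:
        for line in corpus:
            tokens = line.split()
            if tokens:  # skip blank lines
                yield tokens
```

For example, `next(iter_sentences("or"))` would return the tokens of the first sentence, assuming the extracted file is named `or` as in the `head` command above. Streaming line by line avoids loading the whole corpus into memory.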

Setup

Train

Web App Local Conda setup

  • Clone the repo
  • Put the embeddings.txt you have trained inside src/model/ directory.
  • Create a virtual environment. If you have Anaconda installed, you can try:
    $ conda create -n word_embeddings python=3.6
    $ conda activate word_embeddings
  • Python 3.6 is required for this.
  • Install all the Python dependencies with the following command:
    $ pip install -r requirements.txt
  • Start the server with the following command:
    $ gunicorn app:app -b 0.0.0.0:31137
  • The web app is now available in your browser at http://127.0.0.1:31137/word2vec
  • If you encounter an error like the one below, set the environment variable PYTHONIOENCODING to utf-8:
    UnicodeEncodeError: 'ascii' codec can't encode character '\u2771' in position 1659: ordinal not in range(128)
    *** You may need to add PYTHONIOENCODING=utf-8 to your environment ***
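
The error above occurs when Python writes a non-ASCII character to an output stream that uses the ASCII codec. The following sketch reproduces the behaviour in isolation, using an in-memory stream as a stand-in for standard output. The helper `write_like_stdout` is hypothetical and only serves the illustration.

```python
import io


def write_like_stdout(text: str, encoding: str) -> bytes:
    """Write text through a text stream with the given encoding and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


sample = "\u2771"  # the character named in the error message above

try:
    write_like_stdout(sample, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False  # an ASCII stream cannot represent this character

# With UTF-8 the same character is encoded without error.
utf8_bytes = write_like_stdout(sample, "utf-8")
print(ascii_ok, utf8_bytes)
```

On a POSIX shell, the variable can be set for the current session with `export PYTHONIOENCODING=utf-8` before starting the server.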

Web App Docker setup

  1. Install Docker Desktop (for Mac or Windows) on your system.

  2. Start Docker. Type docker in your command prompt/terminal to check that the command works.

  3. Go to the project root folder i.e. word-embeddings.

  4. Use the following command to build the image from the Dockerfile:

    docker build -t word_embeddings:latest .
  5. Then run the Docker image with the following command:

    docker run --rm -it -p 31137:31137 word_embeddings:latest
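
To confirm that the container is accepting connections, a quick TCP check can help before opening the browser. This is a minimal sketch, assuming the port is published on localhost as in the `docker run` command above. The function `is_port_open` is a hypothetical helper, not part of this repo.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 31137, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `is_port_open()` with the defaults returns `True` once the server is listening. It only checks that the port accepts connections, not that the app responds correctly.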

Snapshot of web app

LICENSE