Auto-Tagger


An Artificial Intelligence tool that uses Transformer models and NER (Named Entity Recognition) techniques to detect proper names in a text.

This repo contains:

  • The Auto-Tagger Web App
  • The Auto-Tagger Discord bot

A video demo can be found here: https://www.youtube.com/watch?v=3XF4hOLtU1o








Key Features • Installation • Calling the API • Using Flask • Docker image • Data • Training a new model • Contributing



Our Auto-Tagger Web Application

Our Auto-Tagger Discord Bot

Key Features

  • Using Transformer models (BERT in this case) and NER (Named Entity Recognition) techniques.
  • Building a training pipeline.
  • Implementing and training the model (using Google Colab).
  • Building an inference pipeline.
  • Serving the model using BentoML.
  • Creating a Web Application to visualize the Auto-Tagger features.
  • Creating a Discord bot that implements the Auto-Tagger features.

Installation

  • All the code required to get started

Clone

  • Clone this repo to your local machine using https://github.com/MLH-Fellowship/Auto-Tagger.git

Setup

To get the model files in place and start the server, follow the steps below:

  1. Download the model from this drive: https://drive.google.com/file/d/1TyuIoMO42CHHvQVlOpw6Ynco39rQbc6t/view?usp=sharing

  2. Place it in /results/ and rename it to model.bin, so that the final path is /results/model.bin

  3. Download the BERT uncased model from here: https://www.kaggle.com/abhishek/bert-base-uncased

  4. Unzip the files into /model/

  5. Run python serving.py inside /src/

  6. Execute the command bentoml serve PyTorchModel:latest

The model will be served on http://127.0.0.1:5000/
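Before running steps 5 and 6, it can help to confirm that the downloaded files ended up where the steps above put them. The following is a minimal sketch, not part of the repository: `check_setup` is a hypothetical helper, and it assumes that /results/ and /model/ are directories at the root of your local clone.

```python
from pathlib import Path


def check_setup(repo_root):
    """Return a list of problems; an empty list means the files look in place.

    Assumption: results/ and model/ sit directly under the repository root.
    """
    root = Path(repo_root)
    problems = []

    # Step 2: the fine-tuned weights, renamed to model.bin
    weights = root / "results" / "model.bin"
    if not weights.is_file():
        problems.append(f"missing fine-tuned weights: {weights}")

    # Step 4: the unzipped BERT uncased files (exact file names are not checked)
    bert_dir = root / "model"
    if not bert_dir.is_dir() or not any(bert_dir.iterdir()):
        problems.append(f"BERT files not unzipped into: {bert_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_setup("."):
        print(problem)
```

Run from the root of the clone, the script prints nothing when both checks pass.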


Calling the API

To get predictions, send a POST request with a JSON body containing a "sentence" field to the /predict endpoint:

curl -i --header "Content-Type: application/json" \
        --request POST \
        --data '{"sentence": "John used to play for The Beatles"}' \
        http://127.0.0.1:5000/predict

Example:

#request
{ 
  "sentence": "Jack and James went to the university and they met Emily"
}

The response is a single string containing all detected names, separated by commas. In this example it will be:

#response
"jack,james,emily"
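The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names (`build_request`, `parse_names`, `predict`) are hypothetical, and the parser assumes the body is either a JSON-encoded string or the raw comma-separated text, since the exact wire format is not stated here.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/predict"  # endpoint taken from the curl example


def build_request(sentence, url=API_URL):
    """Build a POST request carrying the sentence as a JSON body."""
    payload = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_names(body):
    """Split the comma-separated response text into a list of names.

    Assumption: the body is a JSON string ("a,b") or the bare text (a,b).
    """
    try:
        decoded = json.loads(body)
    except json.JSONDecodeError:
        decoded = body
    if not isinstance(decoded, str):
        decoded = str(decoded)
    return [name.strip() for name in decoded.split(",") if name.strip()]


def predict(sentence, url=API_URL):
    """Send the sentence to a running server and return the detected names."""
    with urllib.request.urlopen(build_request(sentence, url)) as response:
        return parse_names(response.read().decode("utf-8"))
```

With the server from the Setup section running, calling `predict("Jack and James went to the university and they met Emily")` sends the example request shown above and returns the names as a Python list.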

Using Flask

After completing step 5 of Setup, run the following from the /src/ directory:

export FLASK_APP=front.py
export FLASK_DEBUG=1 # For debugging
flask run

Note: Be sure to set the LOAD_PATH variable in front.py to the location of your latest saved BentoML model.
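If several versions of the model have been saved, one way to find the latest one is to pick the most recently modified sub-directory of the folder where BentoML stores them. This is only a sketch: `latest_bundle_dir` is a hypothetical helper, and the storage folder (for example something like ~/bentoml/repository/PyTorchModel) is an assumption that depends on your BentoML version and configuration, so check where your installation actually saves bundles.

```python
from pathlib import Path


def latest_bundle_dir(model_root):
    """Return the most recently modified sub-directory of model_root, or None.

    Assumption: each saved model version lives in its own sub-directory.
    """
    root = Path(model_root).expanduser()
    if not root.is_dir():
        return None
    versions = [path for path in root.iterdir() if path.is_dir()]
    if not versions:
        return None
    # Newest modification time wins
    return max(versions, key=lambda path: path.stat().st_mtime)
```

The returned path can then be copied into LOAD_PATH in front.py.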


Creating and running a Docker image and deploying it on Heroku

This process is explained in detail on the wiki page of this repository.


Creating and running the discord bot

Documentation is available at the wiki page of this repository.


Data

We used the Annotated Corpus for Named Entity Recognition dataset from Kaggle: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus

It is an extract from the GMB (Groningen Meaning Bank) corpus, tagged and annotated specifically for training classifiers to predict named entities such as names and locations.

This dataset contains 47958 sentences with 948241 words.
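To give an idea of how such a corpus is consumed, the sketch below groups word-level rows into sentences. It assumes a CSV layout with the columns "Sentence #", "Word", "POS" and "Tag", where the sentence column is filled only on the first word of each sentence. That layout is an assumption about the Kaggle file, so verify it against your download. The sample rows and the helper names are made up for illustration and are not taken from the dataset or from this repository.

```python
import csv
import io


def read_rows(handle):
    """Yield (sentence_id, word, tag) tuples from a CSV file handle."""
    for record in csv.DictReader(handle):
        yield record["Sentence #"].strip(), record["Word"], record["Tag"]


def group_sentences(rows):
    """Group rows into (words, tags) pairs, one pair per sentence.

    An empty sentence_id means "same sentence as the previous row".
    """
    sentences = []
    for sentence_id, word, tag in rows:
        if sentence_id or not sentences:
            sentences.append(([], []))
        words, tags = sentences[-1]
        words.append(word)
        tags.append(tag)
    return sentences


# Illustrative rows only, written in the assumed column layout
SAMPLE = """Sentence #,Word,POS,Tag
Sentence: 1,Emily,NNP,B-per
,visited,VBD,O
,London,NNP,B-geo
Sentence: 2,James,NNP,B-per
,left,VBD,O
"""

sentences = group_sentences(read_rows(io.StringIO(SAMPLE)))
for words, tags in sentences:
    print(list(zip(words, tags)))
```

For the real file, replace `io.StringIO(SAMPLE)` with an opened file handle. The encoding may need to be set explicitly when opening it.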


Training a new model

You can train your own model with the train.py script. Edit config.py to set the parameters you want, then run the following command:

python train.py

This writes the trained model to the location given by config.MODEL_PATH, as model.bin.


Contributing

To get started...

Step 1

  • Option 1

    • 🍴 Fork this repo!
  • Option 2

    • 👯 Clone this repo to your local machine using https://github.com/MLH-Fellowship/Auto-Tagger.git

Step 2

  • HACK AWAY! 🔨🔨🔨

Step 3

  • 🔃 Create a new pull request with your changes.


License

This project is licensed under the Apache License, Version 2.0.
