Reddit-Flair-Detection

This repo illustrates how to build a machine learning classifier that predicts the flairs of posts on r/india.

Go to r/india and open a post


Copy its URL and paste it into the app.


Live web app is here: Website

Requirements

The installation below has been tested on macOS 10.13.6 and Ubuntu 16.04.

This project requires Python 3 and the Python libraries listed in requirements.txt (plus a few others, depending on the task):

  1. Clone the repo
git clone https://github.com/gauravchopracg/Reddit-Flair-Detection.git
cd Reddit-Flair-Detection/
  2. Install dependencies
pip install -r requirements.txt

In this part, I collected two datasets:

  1. 1-year dataset: posts from 1 January 2019 to 1 January 2020, with the title, flair and body of each post, collected using Pushshift's API.
  2. Balanced dataset: 100 posts from each of 9 flairs, collected using the PRAW module.
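A minimal sketch of how a balanced dataset like this could be gathered with PRAW. It assumes an authenticated `praw.Reddit` instance and Reddit's `flair:"..."` search syntax; the helper name `collect_balanced`, the column names and the exact-match filter on `link_flair_text` are hypothetical choices, not the repository's code.

```python
from typing import Dict, Iterable, List


def collect_balanced(reddit, flairs: Iterable[str], per_flair: int = 100,
                     subreddit: str = "india") -> List[Dict[str, str]]:
    """Collect up to `per_flair` posts for each flair (hypothetical helper).

    `reddit` is assumed to be an authenticated praw.Reddit instance, e.g.
    praw.Reddit(client_id=..., client_secret=..., user_agent=...).
    """
    rows: List[Dict[str, str]] = []
    sub = reddit.subreddit(subreddit)
    for flair in flairs:
        # Search the subreddit for posts tagged with this flair;
        # `limit` caps how many results PRAW fetches.
        for post in sub.search(f'flair:"{flair}"', limit=per_flair):
            # Search can return loosely matching posts, so keep exact matches only.
            if post.link_flair_text != flair:
                continue
            rows.append({
                "flair": flair,
                "title": post.title,
                "body": post.selftext,  # empty string for link posts
                "url": post.url,
            })
    return rows
```

Because of the exact-match filter, a flair can end up with fewer than `per_flair` rows; the flair names you pass in are placeholders to be replaced with the subreddit's actual flairs.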

The two datasets, a balanced subset and a full year of data, were collected to test different machine learning algorithms and deep learning models, and were later used as the training and test sets.

For detailed notes, please look here.

In this part, we try to understand the data, build intuition about it and find insights in it. It consists of:

  1. Univariate Analysis
  2. Bivariate Analysis
  3. Feature Engineering
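As an illustration of the first two steps, a minimal standard-library sketch: a univariate look at how many posts carry each flair, and a bivariate look at average title length per flair. It assumes each post is a dict with hypothetical `flair` and `title` keys and uses toy data; the original analysis may well have used pandas and plots instead.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List


def flair_counts(posts: List[Dict[str, str]]) -> Counter:
    """Univariate: number of posts per flair."""
    return Counter(p["flair"] for p in posts)


def mean_title_words_by_flair(posts: List[Dict[str, str]]) -> Dict[str, float]:
    """Bivariate: average title length (in words) for each flair."""
    lengths = defaultdict(list)
    for p in posts:
        lengths[p["flair"]].append(len(p["title"].split()))
    return {flair: mean(vals) for flair, vals in lengths.items()}


# Toy data for illustration only.
posts = [
    {"flair": "Politics", "title": "Election results announced today"},
    {"flair": "Politics", "title": "New bill passed"},
    {"flair": "Sports", "title": "India wins the series"},
]
print(flair_counts(posts).most_common())
# [('Politics', 2), ('Sports', 1)]
print(mean_title_words_by_flair(posts))
# {'Politics': 3.5, 'Sports': 4}
```

A skewed count in the first output is what motivates collecting a separate balanced dataset.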

This part includes:

  1. Data Preprocessing
  2. Hyperparameter Optimization
  3. Choosing a Validation Strategy
  4. Trying both machine learning and deep learning frameworks
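For the preprocessing step, a minimal sketch of a typical cleaning function for Reddit titles and bodies: lowercasing, stripping URLs, removing punctuation and dropping stopwords. The tiny `STOPWORDS` set is a hypothetical placeholder (a fuller list such as NLTK's would be more usual), and the project's exact cleaning steps may differ.

```python
import re
from typing import FrozenSet

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS: FrozenSet[str] = frozenset({"a", "an", "the", "is", "of", "in", "to", "and"})

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(text: str) -> str:
    """Lowercase, strip URLs and punctuation, and drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove links
    text = NON_ALNUM_RE.sub(" ", text)  # keep only ASCII letters, digits, whitespace
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("The Budget of 2019 is OUT! Details: https://example.com/budget"))
# budget 2019 out details
```

Note that `[^a-z0-9\s]` also removes non-ASCII text such as Devanagari, which may or may not be desirable for r/india posts.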

Results:

| Model | Train Accuracy | Validation Accuracy | Test Accuracy |
|---|---|---|---|
| Logistic Regression (Title only) | 0.615 | 0.623 | 0.402 |
| Logistic Regression (Title only + Preprocessing) | 0.546 | 0.493 | 0.621 |
| BERT (Title + Body + Preprocessing) | 0.671 | 0.546 | 0.651 |
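A title-only logistic regression like the one in the first two rows can be sketched as a scikit-learn pipeline. This assumes TF-IDF features, a grid search over the regularisation strength `C` and stratified k-fold cross-validation; the vectoriser settings, grid values, fold count and the helper name `train_title_classifier` are assumptions, not the settings behind the numbers above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline


def train_title_classifier(titles, flairs, n_splits=2):
    """Fit TF-IDF + logistic regression, tuning C by cross-validated accuracy."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    search = GridSearchCV(
        pipeline,
        param_grid={"clf__C": [0.1, 1.0, 10.0]},
        cv=StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0),
        scoring="accuracy",
    )
    search.fit(titles, flairs)  # refits the best model on all data afterwards
    return search


# Toy training data for illustration only.
titles = [
    "parliament passes new bill", "election results announced",
    "minister resigns over scandal", "opposition protests budget",
    "india wins cricket series", "hockey team qualifies for final",
    "kohli scores century", "football league kicks off",
]
flairs = ["Politics"] * 4 + ["Sports"] * 4
search = train_title_classifier(titles, flairs)
print(search.best_params_, round(search.best_score_, 3))
print(search.predict(["cricket final tonight"]))
```

With real data a larger `n_splits` (for example 5) is more typical, and the separate test set would be scored once at the end with `search.score(test_titles, test_flairs)`.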

Building a Web Application

The web application was developed with Python and the Flask framework, following the Flask Mega-Tutorial for Python 3.6.

To run the app on your computer:

  1. Clone the repo
$ git clone https://github.com/gauravchopracg/Reddit-Flair-Detection.git
$ cd "Reddit-Flair-Detection/Web Application"
  2. Install dependencies
$ pip install -r requirements.txt
  3. Tell Flask which application module to run
$ export FLASK_APP=rfd.py

If you are using the Microsoft Windows Command Prompt, use set instead of export in the command above (set FLASK_APP=rfd.py).

  4. Run the app
$ flask run
 * Serving Flask app "rfd"
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Deployment

The web application is deployed on the Heroku cloud platform. A developer API, also built with Flask, returns a JSON object in which each key is a post URL and each value is the predicted flair.

The API can be accessed by sending a POST request that uploads a file (test.txt in the example below):
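A minimal sketch of what such an endpoint could look like in Flask, assuming the uploaded file holds one post URL per line. `predict_flair` is a hypothetical stand-in for the trained model (it returns a fixed label here); the route path and the `upload_file` field name match the request example in this section, but the handler body is not the deployed code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_flair(url: str) -> str:
    """Hypothetical placeholder: the real app would fetch the post and run the model."""
    return "Unknown"


@app.route("/automated_testing", methods=["POST"])
def automated_testing():
    uploaded = request.files.get("upload_file")
    if uploaded is None:
        return jsonify({"error": "missing 'upload_file'"}), 400
    # Assumed file format: one post URL per line; blank lines are ignored.
    lines = uploaded.read().decode("utf-8").splitlines()
    urls = [line.strip() for line in lines]
    results = {url: predict_flair(url) for url in urls if url}
    return jsonify(results)
```

Keying the response by URL lets a client match each prediction back to its input without relying on order.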

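import requests

# Upload test.txt to the developer API; the file is closed once the request is sent
with open('test.txt', 'rb') as f:
    files = {'upload_file': f}
    r = requests.post("http://rdflair.herokuapp.com/automated_testing", files=files)

print(r.json())  # dict mapping each post URL to its predicted flair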