This web application classifies news articles into four categories: Entertainment, Politics, Technology, and Business.
The application uses NLP techniques and machine learning models to assign each article to its category. Because it works with text data (news articles), the preprocessing differs slightly from the usual.
You can check out the app here
- Clean the text
- Tokenize the text
- Apply lemmatization (WordNet)
- Remove stopwords
- Apply TF-IDF vectorization
- Split the processed data into train and test sets.
- Train two ML algorithms: Random Forest and Logistic Regression.
- Compare the results of the two algorithms and choose the better one (in this case, Logistic Regression).
- Save the model to a pickle file.
- Use the Flask web framework.
- Use the Google Search API to fetch the news.
- Use the saved model to classify the fetched news.
- Deploy the web application on the Heroku platform.
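
The text-preparation steps in the list (clean, tokenize, remove stopwords, TF-IDF) can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the `STOPWORDS` set, the `preprocess` and `tfidf` helpers, and the sample articles are all hypothetical, and the hand-written TF-IDF stands in for whatever library vectorizer the project uses. Lemmatization is left out because it needs NLTK's WordNet data.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "for"}


def preprocess(text):
    """Lowercase, replace non-letters with spaces, split on whitespace, drop stopwords."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / len(doc) and a smoothed idf = ln((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


# Made-up sample articles, for illustration only.
articles = [
    "The new phone is faster and the camera is better.",
    "Parliament is set to vote on the budget.",
    "The phone maker reported record profit.",
]
tokens = [preprocess(a) for a in articles]
vectors = tfidf(tokens)
```

In this sketch a term that appears in several articles (such as `phone`) gets a lower weight than one that appears in a single article (such as `camera`), which is the property that makes TF-IDF features useful as classifier input.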