Inshorts-NLP

Analysed the syntax and semantics of a corpus of text documents retrieved by web scraping news articles from Inshorts, following the standard NLP workflow of the CRISP-DM model.

Credits

📒 Index

Index
About
Usage
Installation
- Commands
File Structure
Brief Description
Info Gallery
Guidelines
Resources
Present Contributors
License

🔰 About

An NLP-based project that scrapes news articles from three main categories:

Technology
Sports
World

from Inshorts using the website URLs. After numerous preprocessing steps, such as text wrangling, removing accented characters, removing HTML tags, lemmatization and stemming, a text normalizer is built to create a dataset for sentiment analysis.
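The normalization steps above can be sketched as a chain of small functions. This is a minimal sketch using only the Python standard library: `html.parser` stands in for Beautiful Soup when stripping tags, and the function names (`strip_html_tags`, `remove_accented_chars`, `normalize_text`) are hypothetical, not taken from NLP_main.ipynb. Stemming, lemmatization and contraction expansion are left out; the package list suggests the project relies on NLTK, spaCy and contractions.py for those.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_tags(text):
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    # Join with spaces so text from adjacent block tags does not run together.
    return " ".join(parser.parts)


def remove_accented_chars(text):
    # Decompose characters (e.g. "é" -> "e" + combining accent), then drop non-ASCII marks.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def remove_special_chars(text):
    # Keep letters, digits and whitespace only.
    return re.sub(r"[^a-zA-Z0-9\s]", "", text)


def normalize_text(text):
    text = strip_html_tags(text)
    text = remove_accented_chars(text)
    text = text.lower()
    text = remove_special_chars(text)
    # Collapse runs of whitespace left behind by the earlier steps.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("<p>Café prices <b>rose</b> 5%!</p>"))  # -> cafe prices rose 5
```

Keeping each step as its own function makes it easy to switch individual steps on or off, which matters because aggressive cleaning (for example, dropping punctuation) can remove signal that some sentiment lexicons use.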

Sentiment analysis is perhaps one of the most popular applications of NLP.

The key aspect of sentiment analysis is to analyze a body of text to understand the opinion it expresses. Typically, this sentiment is quantified with a positive or negative value called polarity.
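As a worked example of polarity, a lexicon-based scorer in the style of AFINN looks up each word in a table of integer valences (AFINN's range from -5 to +5) and sums them; the sign of the total gives the label. The lexicon below is a tiny hypothetical stand-in, not the real AFINN word list, and treating a score of exactly 0 as "neutral" is an assumption of this sketch.

```python
# Hypothetical toy lexicon in the AFINN style: word -> integer valence.
TOY_LEXICON = {
    "win": 4, "great": 3, "good": 3, "growth": 2,
    "loss": -3, "bad": -3, "crash": -2,
}


def polarity_score(text, lexicon=TOY_LEXICON):
    # Sum the valence of every token found in the lexicon; unknown words count as 0.
    return sum(lexicon.get(token, 0) for token in text.lower().split())


def polarity_label(score):
    # Map the signed score onto three sentiment classes.
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for headline in ["India win a great match", "Market crash causes huge loss", "Parliament meets today"]:
    score = polarity_score(headline)
    print(headline, score, polarity_label(score))
# -> India win a great match 7 positive
# -> Market crash causes huge loss -5 negative
# -> Parliament meets today 0 neutral
```

Summing keeps the score easy to interpret, but longer texts can accumulate larger magnitudes; normalizing by text length is one common variation.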

Usage

This project can be used to build the following key features:

Building a text summarizer using RNNs and LSTMs.
Filtering for a particular sentiment, whether positive or negative.
Emojifier: suggesting appropriate reaction emojis from the extracted sentiments.
Building a tone detector like the one Grammarly (Beta) provides.

Build this project to learn the nuances of handling text data in NLP.

🔌 Installation

📦 Commands

Packages that need to be imported:

pandas
NumPy
seaborn
NLTK
AFINN
TextBlob
Beautiful Soup
requests
spaCy language models

Note: installing spaCy can produce many errors, so make sure it is installed properly. For further details, refer to requirements.txt.

To run the project on your local machine, make sure you install all the packages listed in requirements.txt.

Clone the repository

$ git clone https://github.com/codekhal/Inshorts-NLP

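Move into the project directory and install the dependencies (inside the appropriate conda environment).

$ cd Inshorts-NLP
$ pip install -r requirements.txt

Then, from the same terminal, launch Jupyter (or any other preferred editor) and open NLP_main.ipynb.

$ jupyter notebook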

📂 File Structure

File structure with the basic details about files and directories.

Inshorts-NLP/
├── contractions.py
├── img
│   ├── scraping.png
│   ├── Sentiment_Score_News_Category.png
│   ├── sentiments.png
│   ├── stemming.png
│   ├── Visualizing_Sentiments_Box_Plot.png
│   └── workflow.png
├── LICENSE
├── news.csv
├── NLP_main.ipynb
├── __pycache__
│   └── contractions.cpython-35.pyc
├── README.md
└── requirements.txt

2 directories, 13 files

Brief Description

Built a web scraper that collects news articles from Inshorts website URLs, then cleaned the data for further processing using numerous text-preprocessing techniques. Finally, sentiment analysis was performed on the cleaned data. Several popular lexicons are available for sentiment analysis, including the following.

AFINN lexicon
Bing Liu’s lexicon
MPQA subjectivity lexicon
SentiWordNet
VADER lexicon
TextBlob lexicon

The NLTK, AFINN and TextBlob libraries were used, and the results are presented with both data visualization tools and pandas DataFrame techniques.
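The scraping step described above can be sketched as follows. The project lists Beautiful Soup and requests for this; the sketch instead uses the standard library's `html.parser` on an inline sample so it runs offline. The markup and the class names `news-card`, `headline` and `article-body` are hypothetical placeholders, not Inshorts' real page structure, and `NewsCardParser` and `scrape_news` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names are placeholders, not Inshorts' real ones.
SAMPLE_PAGE = """
<div class="news-card">
  <span class="headline">Team wins the final</span>
  <div class="article-body">A great win for the home side.</div>
</div>
<div class="news-card">
  <span class="headline">Stocks fall</span>
  <div class="article-body">Markets saw a sharp loss today.</div>
</div>
"""


class NewsCardParser(HTMLParser):
    """Collects headline and article text from elements with the placeholder classes.

    Assumes headlines and article bodies contain plain text with no nested tags.
    """

    def __init__(self):
        super().__init__()
        self.current_field = None  # "headline" or "article-body" while inside one
        self.headlines = []
        self.articles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "headline" in classes:
            self.current_field = "headline"
        elif "article-body" in classes:
            self.current_field = "article-body"

    def handle_endtag(self, tag):
        self.current_field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current_field is None:
            return
        if self.current_field == "headline":
            self.headlines.append(text)
        else:
            self.articles.append(text)


def scrape_news(html, category):
    parser = NewsCardParser()
    parser.feed(html)
    parser.close()
    # Pair headlines with article bodies in document order and tag them with the category.
    return [
        {"headline": h, "article": a, "category": category}
        for h, a in zip(parser.headlines, parser.articles)
    ]


# In practice the HTML would come from something like requests.get(url).text.
rows = scrape_news(SAMPLE_PAGE, "sports")
print(rows)
```

Tagging each row with its category at scrape time is what later allows the sentiment scores to be compared across Technology, Sports and World.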

📷 Info Gallery

The sentiment scores for the different news categories are shown in the following plots.

Lastly, the counts of the three sentiment classes in each news category are depicted with a factor (bar) plot.
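The per-category summaries behind these plots can be sketched without any plotting code. The rows below are hypothetical values, not the project's data, and the sketch uses the standard library; in the notebook this kind of aggregation would more likely be done with pandas, which the README mentions only as "pandas dataframe techniques".

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical scored rows: (news_category, sentiment_score, sentiment_label).
scored = [
    ("technology", 2.0, "positive"),
    ("technology", -1.0, "negative"),
    ("sports", 3.0, "positive"),
    ("sports", 0.0, "neutral"),
    ("world", -4.0, "negative"),
    ("world", -2.0, "negative"),
]


def summarize_by_category(rows):
    scores = defaultdict(list)
    label_counts = defaultdict(Counter)
    for category, score, label in rows:
        scores[category].append(score)
        label_counts[category][label] += 1
    # Mean score per category summarizes the score plots;
    # the per-label counts are what the bar plot displays.
    return {
        category: {"mean_score": mean(values), "counts": dict(label_counts[category])}
        for category, values in scores.items()
    }


summary = summarize_by_category(scored)
for category, stats in summary.items():
    print(category, stats)
```

A box plot shows more than the mean (spread and outliers), so the mean here is only one summary of what the score plots convey.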

📜 Guidelines

Contribution Guidelines

Future work that could be done:

Flask App Deployment - Deploy the app so that it can be used efficiently.
Use of Deep Learning - One could try deep learning to build a text summarizer and tone detector.

Kindly follow the Contribution Guidelines before creating any pull requests or issues. That said, feel free to contribute in any form.
Open Source <3

📄 Resources

🌟 Present Contributors

Want to share your ideas

Feel free to reach out to me

🔒 License

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
img		img
.gitignore		.gitignore
LICENSE		LICENSE
NLP_main.ipynb		NLP_main.ipynb
README.md		README.md
contractions.py		contractions.py
sentiments_dataset.csv		sentiments_dataset.csv

License

codekhal/Inshorts-NLP

Folders and files

Latest commit

History

Repository files navigation

Inshorts-NLP

📒 Index

🔰 About

🔌 Installation

📦 Commands

Packages which should be imported:

📂 File Structure

- Brief Description

📷 Info Gallery

📜 Guidelines

📄 Resources

🌟 Present Contributors

Want to share your ideas

🔒 License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages