The Press Information Bureau (PIB) automated feedback system uses web crawlers to create a dataset of news articles, Optical Character Recognition (OCR) technology to extract content from e-papers, and a public Application Programming Interface (API) to analyze YouTube videos. It then utilizes advanced Natural Language Processing (NLP) techniques to classify news articles into relevant government departments and evaluate their sentiment.
The system's primary functions are to send timely notifications for negative articles and to provide a user-friendly dashboard for data visualization. Additionally, a separate Chrome extension performs real-time fake news detection.
Data Acquisition
- Asynchronous Web Scraping: Used the BeautifulSoup library together with the asynchronous libraries aiohttp and asyncio to efficiently scrape articles from various national and regional media websites.
- Text Extraction & Language Translation: Implemented Google's Tesseract OCR engine (via the Pytesseract wrapper) to extract text from scanned or image-based regional newspaper articles, and integrated the Google Translate API to render the extracted text in English, enabling cross-language analysis.
- Video Content Breakdown: Leveraged the OpenAI Whisper API to transcribe and analyze YouTube videos from selected news channels, extending media monitoring to video content.
🗂️ Processed data is automatically stored as JSON with well-defined key-value pairs, making it straightforward to consume from the frontend and from other applications.
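The scraping-and-storage flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual crawler: the URLs, the `<h1>`/`<p>` selectors, and the `articles.json` output path are all hypothetical placeholders, and real news sites generally need per-site selectors.

```python
import asyncio
import json

from bs4 import BeautifulSoup


def extract_article(html: str) -> dict:
    # Pull the headline and body text out of a downloaded page.
    # The <h1>/<p> selectors are generic placeholders; real sites
    # usually need per-site selectors.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1")
    body = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
    return {"title": title.get_text(strip=True) if title else "", "body": body}


async def fetch(session, url: str) -> dict:
    # Download one page and parse it; a single ClientSession is shared
    # across all requests for connection pooling.
    async with session.get(url) as resp:
        return {"url": url, **extract_article(await resp.text())}


async def crawl(urls: list[str]) -> list[dict]:
    # aiohttp is imported lazily so the offline parsing helper above
    # works even without the dependency installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        return list(await asyncio.gather(*(fetch(session, u) for u in urls)))


if __name__ == "__main__":
    # Live crawl (requires aiohttp and reachable sites):
    #   articles = asyncio.run(crawl(["https://example.com/news/1"]))
    # Offline demo of the parsing and JSON-storage steps:
    demo = extract_article("<h1>PM opens metro line</h1><p>Inaugurated today.</p>")
    with open("articles.json", "w", encoding="utf-8") as f:
        json.dump([demo], f, ensure_ascii=False, indent=2)
```

Running many `fetch` coroutines under `asyncio.gather` lets slow sites overlap rather than serialize, which is the point of using aiohttp over plain `requests` here.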
Data Analysis
- Department Categorization: Developed a machine learning model using the Support Vector Machine (SVM) algorithm, combined with NLP techniques such as text lemmatization and Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, to classify news articles into the relevant government departments. The model achieved a test accuracy of ~95%.
- Sentiment Analysis: Trained a Bidirectional Encoder Representations from Transformers (BERT) model in the PyTorch framework on a dataset of articles labeled positive, neutral, or negative. The model achieved a test accuracy of ~81%.
📊 The Matplotlib library is used to automatically generate graphs that visualize the relationship between government departments and the sentiment expressed in news articles, making it easier to spot trends, patterns, and areas of concern.
Data Presentation
- Cross-Platform User Interface: Designed a website using React and Bootstrap, integrating Python's smtplib and the Twilio API to send real-time notifications to government officials about negative articles, improving their ability to monitor and respond proactively.
- Chrome Extension: Built a separate Chrome extension that performs real-time fake news detection while the user browses.
📦 Hosted the website on GoDaddy, with the frontend sending POST requests via the Axios library and the Flask backend processing them securely using the Flask-CORS extension.
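The email half of the notification flow can be sketched with Python's standard-library smtplib, as below. The addresses, SMTP host, and article fields are placeholders for illustration; the Twilio SMS side follows the same pattern with Twilio's own client:

```python
import smtplib
from email.message import EmailMessage


def build_alert(article: dict) -> EmailMessage:
    # Compose a plain-text alert for a negative article.
    # Addresses here are illustrative placeholders.
    msg = EmailMessage()
    msg["Subject"] = f"Negative coverage: {article['title']}"
    msg["From"] = "alerts@example.com"
    msg["To"] = "official@example.gov"
    msg.set_content(
        f"Department: {article['department']}\n"
        f"Sentiment: {article['sentiment']}\n"
        f"Link: {article['url']}\n"
    )
    return msg


def send_alert(msg: EmailMessage, host: str = "smtp.example.com") -> None:
    # Actual delivery: requires a reachable SMTP server and, in
    # practice, STARTTLS plus authentication.
    with smtplib.SMTP(host) as server:
        server.send_message(msg)


if __name__ == "__main__":
    article = {
        "title": "Train derailment sparks criticism",
        "department": "Railways",
        "sentiment": "negative",
        "url": "https://example.com/news/123",
    }
    msg = build_alert(article)
    print(msg["Subject"])
    # send_alert(msg)  # uncomment with a real SMTP host configured
```

Separating message construction (`build_alert`) from delivery (`send_alert`) keeps the formatting testable without a live mail server.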
Follow these steps to set up and run the Insight Ink software on your local machine, or watch the demo video.
- Clone the repository to your local machine:
git clone https://github.com/areebahmeddd/Insight-Ink.git
- Navigate to the project directory:
cd Insight-Ink
- Create a virtual environment (optional but recommended):
python -m venv .venv
- Activate the virtual environment:
- Windows:
.venv\Scripts\activate
- macOS and Linux:
source .venv/bin/activate
- Install the project dependencies:
pip install -r requirements.txt
npm install
- Run the application and start the development server:
python app.py
npm start
- Access the application in your web browser by navigating to http://localhost:3000
This project is licensed under the Apache License 2.0
Areeb Ahmed, Shivansh Karan, Nandini Sharma, Ravikant Saraf, Mohit Nagaraj