This repository has been archived by the owner on Aug 27, 2024. It is now read-only.

Insight Ink

The Press Information Bureau (PIB) automated feedback system uses web crawlers to build a dataset of news articles, Optical Character Recognition (OCR) to extract content from e-papers, and a public Application Programming Interface (API) to analyze YouTube videos. It then applies Natural Language Processing (NLP) techniques to classify news articles by the relevant government department and evaluate their sentiment.

The system's primary function is to send timely notifications about negative articles and to provide a user-friendly dashboard for data visualization. A separate Chrome extension performs real-time fake news detection.


Overview

Data Acquisition

  • Asynchronous Web Scraping: Used the BeautifulSoup library together with the asynchronous libraries aiohttp and asyncio to scrape articles efficiently from national and regional media websites.

  • Text Extraction & Language Translation: Used Pytesseract, a Python wrapper for the Tesseract OCR engine, to extract text from scanned or image-based regional newspaper articles, and integrated the Google Translator API to translate the extracted text into English for cross-language analysis.

  • Video Content Breakdown: Used the OpenAI Whisper API to analyze closed captioning in YouTube videos from selected news channels, extending media monitoring to video content.
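
The asynchronous scraping pattern from the first bullet above can be sketched roughly as follows. This is a minimal illustration, not the project's actual crawler: the `fetch` coroutine is passed in so the example runs offline (in the project it would presumably wrap an aiohttp request), and the standard library's `html.parser` stands in for BeautifulSoup to keep the sketch dependency-free. The names `HeadlineParser`, `extract_headlines`, `scrape_all`, and `fake_fetch`, and the choice of `<h1>` as the headline tag, are all hypothetical.

```python
import asyncio
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text inside <h1> tags (a stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headlines[-1] += data


def extract_headlines(html):
    """Return the stripped, non-empty <h1> texts found in an HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines if h.strip()]


async def scrape_all(urls, fetch, max_concurrency=5):
    """Fetch all URLs concurrently and return {url: [headlines]}."""
    semaphore = asyncio.Semaphore(max_concurrency)  # caps simultaneous requests

    async def scrape_one(url):
        async with semaphore:
            html = await fetch(url)
        return url, extract_headlines(html)

    results = await asyncio.gather(*(scrape_one(u) for u in urls))
    return dict(results)


async def fake_fetch(url):
    # Hypothetical stand-in for a network request, so the sketch runs offline.
    await asyncio.sleep(0)
    return f"<html><body><h1>Headline from {url}</h1></body></html>"


if __name__ == "__main__":
    pages = asyncio.run(
        scrape_all(["https://example.com/a", "https://example.com/b"], fake_fetch)
    )
    print(pages)
```

The semaphore is one way to keep the number of simultaneous requests per run bounded; the limit of 5 is an arbitrary illustrative value.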

🗂️ Processed data is stored automatically in JSON format with well-defined key-value pairs, ensuring compatibility for frontend integration and wider accessibility across various applications.
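
As a rough illustration of what such a JSON record could look like, the sketch below serializes and reloads a list of articles. The README does not list the actual keys, so every field name in `ArticleRecord` is an assumption made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ArticleRecord:
    # Field names are hypothetical; the project's real keys are not documented here.
    title: str
    source: str
    url: str
    department: str
    sentiment: str
    text: str


def to_json(records):
    """Serialize records as a JSON array.

    ensure_ascii=False keeps regional-language text readable in the file.
    """
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=2)


def from_json(payload):
    """Rebuild ArticleRecord objects from a JSON array produced by to_json."""
    return [ArticleRecord(**item) for item in json.loads(payload)]


if __name__ == "__main__":
    record = ArticleRecord(
        title="Sample headline",
        source="Example Daily",
        url="https://example.com/article",
        department="Health",
        sentiment="negative",
        text="Article body goes here.",
    )
    print(to_json([record]))
```

A flat list of objects with fixed keys like this is straightforward for a frontend to consume, which matches the compatibility goal stated above.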


Data Analysis

📊 Matplotlib automatically generates graphs that relate government departments to the sentiment expressed in news articles, making it easier to identify trends, patterns, and areas of concern.
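
One way such a department-versus-sentiment graph could be produced is sketched below: count articles per department and sentiment label, then draw a stacked bar chart. The record keys (`department`, `sentiment`), the three sentiment labels, and the output file name are assumptions for illustration; the project's actual charts may differ. Matplotlib is imported inside `plot_counts` so the counting step can be used on its own.

```python
from collections import Counter, defaultdict

SENTIMENTS = ("positive", "neutral", "negative")  # assumed label set


def count_by_department(articles):
    """Return {department: {sentiment: count}} with departments sorted by name."""
    counts = defaultdict(Counter)
    for article in articles:
        counts[article["department"]][article["sentiment"]] += 1
    return {
        dept: {s: per_dept[s] for s in SENTIMENTS}
        for dept, per_dept in sorted(counts.items())
    }


def plot_counts(counts, path="sentiment_by_department.png"):
    """Save a stacked bar chart of sentiment counts per department."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    departments = list(counts)
    fig, ax = plt.subplots()
    bottoms = [0] * len(departments)
    for sentiment in SENTIMENTS:
        heights = [counts[d][sentiment] for d in departments]
        ax.bar(departments, heights, bottom=bottoms, label=sentiment)
        bottoms = [b + h for b, h in zip(bottoms, heights)]
    ax.set_xlabel("Department")
    ax.set_ylabel("Number of articles")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    sample = [
        {"department": "Health", "sentiment": "negative"},
        {"department": "Health", "sentiment": "positive"},
        {"department": "Finance", "sentiment": "negative"},
    ]
    plot_counts(count_by_department(sample))
```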


Data Presentation

  • Cross-Platform User Interface: Built a website with React and Bootstrap, and integrated Python's smtplib library and the Twilio API to send real-time notifications about negative articles to government officials, so they can monitor coverage and respond proactively.

  • Chrome Extension: A separate extension for real-time fake news detection.
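
The email side of the notifications mentioned in the user interface bullet above might look roughly like this sketch, which uses only Python's standard `smtplib` and `email` modules. The article keys, subject line, and the helper names `build_alert`, `alerts_for`, and `send_alert` are hypothetical, and the SMTP host, port, and credentials are placeholders the caller must supply. The Twilio SMS path is not shown.

```python
import smtplib
from email.message import EmailMessage


def build_alert(article, sender, recipient):
    """Build an email describing one negative article."""
    msg = EmailMessage()
    msg["Subject"] = f"Negative coverage: {article['department']}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Title: {article['title']}\n"
        f"Source: {article['source']}\n"
        f"Link: {article['url']}\n"
    )
    return msg


def alerts_for(articles, sender, recipient):
    """Return one alert message per article labeled 'negative'."""
    return [
        build_alert(a, sender, recipient)
        for a in articles
        if a["sentiment"] == "negative"
    ]


def send_alert(msg, host, port, username, password):
    """Send a message over SMTP with STARTTLS (server details are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

Separating message construction from sending keeps the filtering logic testable without a mail server.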

📦 Hosted the website on GoDaddy, configured the frontend to send POST requests with the Axios library, and configured the Flask backend to accept those cross-origin requests through a CORS extension for Flask.
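
A backend route that accepts such POST requests could be sketched as below. This is only an illustration under stated assumptions: it assumes the Flask-CORS package (`flask_cors`) is the CORS extension in use, and the route `/api/articles`, the `department` field, and the function names are hypothetical. The allowed origin defaults to the local development address `http://localhost:3000`. On the frontend, an Axios call would POST a JSON body to this route.

```python
def summarize_request(payload):
    """Validate the JSON body of a hypothetical POST /api/articles request.

    Returns a (response_body, http_status) pair.
    """
    if not isinstance(payload, dict) or not payload.get("department"):
        return {"error": "department is required"}, 400
    return {"department": payload["department"], "status": "accepted"}, 200


def create_app(allowed_origin="http://localhost:3000"):
    """Create a Flask app whose /api/* routes accept requests from one origin."""
    # Imported here so the validation logic above works without Flask installed.
    from flask import Flask, jsonify, request
    from flask_cors import CORS  # assumption: the Flask-CORS package

    app = Flask(__name__)
    # Only the named frontend origin may call the API routes from a browser.
    CORS(app, resources={r"/api/*": {"origins": [allowed_origin]}})

    @app.post("/api/articles")
    def articles():
        # silent=True yields None instead of raising on a missing or invalid body.
        body, status = summarize_request(request.get_json(silent=True))
        return jsonify(body), status

    return app


if __name__ == "__main__":
    create_app().run(debug=True)
```

Restricting `origins` to the known frontend address, instead of allowing every origin, limits which sites can call the API from a browser.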

Getting Started

Follow these steps to set up and run Insight Ink on your local machine. Alternatively, watch the demo video.

Prerequisites

Git, Python 3 (with pip), and Node.js (with npm), as used by the commands in the sections that follow.

Installation

  1. Clone the repository to your local machine:
    git clone https://github.com/areebahmeddd/Insight-Ink.git
  2. Navigate to the project directory:
    cd Insight-Ink
  3. Create a virtual environment (optional but recommended):
    python -m venv .venv
  4. Activate the virtual environment:
  • Windows:
    .venv\Scripts\activate
  • macOS and Linux:
    source .venv/bin/activate
  5. Install the project dependencies:
    pip install -r requirements.txt
    npm install

Usage

  1. Run the backend application and start the frontend development server, each in its own terminal:
    python app.py
    npm start
  2. Access the application in your web browser by navigating to http://localhost:3000

License

This project is licensed under the Apache License 2.0.

Authors

Areeb Ahmed, Shivansh Karan, Nandini Sharma, Ravikant Saraf, Mohit Nagaraj