
Sam3420/Web-Scpapping-tool

🌐 Web Scraper & AI Summarizer

A Python tool that scrapes websites, extracts their key information, and generates concise AI-powered summaries using Meta's Llama 3 model served through the Groq API.
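
The scrape-and-extract step can be illustrated with a minimal sketch that uses only Python's standard library. This is not the project's actual code (the project is described as being built with LangChain), and `TextAndLinkExtractor` and `extract` are hypothetical names. It drops `<script>`/`<style>`/`<head>` content, joins the remaining visible text, and collects same-host links, which a crawler could follow as related sub-pages.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class TextAndLinkExtractor(HTMLParser):
    """Collects visible text and hyperlinks from one HTML page (hypothetical helper)."""

    # Elements whose contents are not readable page text.
    SKIP_TAGS = {"script", "style", "noscript", "head"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self.text_parts: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links ("/about") and drop "#fragment" parts.
                self.links.append(urldefrag(urljoin(self.base_url, href)).url)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str, base_url: str) -> tuple[str, list[str]]:
    """Return (clean_text, same_site_links) for one page."""
    parser = TextAndLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    # Same-host links are the candidate sub-pages a crawler would visit next.
    same_site = [u for u in parser.links if urlparse(u).netloc == host]
    # dict.fromkeys() removes duplicates while keeping the original order.
    return " ".join(parser.text_parts), list(dict.fromkeys(same_site))
```

In practice the HTML would be downloaded first (for example with `urllib.request` or the `requests` package), and a crawler would call `extract` on each same-site link, up to a small page limit.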

✨ Features

  • Intelligent Web Scraping - Extracts clean text content from web pages
  • Multi-Page Crawling - Follows and analyzes related sub-pages
  • AI-Powered Summarization - Uses Llama3-70b (served through the Groq API) for high-quality summaries
  • Content Refinement - Combines multiple summaries into cohesive output
  • Bring Your Own Key - Scraping runs on your machine, and summaries are generated through the Groq API with your own API key
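
The summarization and refinement features follow a common map-reduce pattern: summarize each chunk of scraped text, then ask the model to merge the partial summaries into one. The sketch below assumes a generic `summarize_fn` callable that sends a prompt to the model and returns its text (in this tool that would be Llama 3 on Groq); the function names, prompts and chunk sizes are hypothetical. Character-based chunking is a simplification, since token-based splitting tracks a model's context limit more precisely, and LangChain ships text splitters and summarization chains for the same purpose.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> List[str]:
    """Split text into overlapping windows small enough for one model call."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back so text at a chunk boundary appears in both chunks.
        start = end - overlap
    return chunks


def summarize_pages(
    pages: List[str],
    summarize_fn: Callable[[str], str],
    max_chars: int = 4000,
    overlap: int = 200,
) -> str:
    """Map step: summarize every chunk. Reduce step: merge the partial summaries."""
    partial = [
        summarize_fn("Summarize the key points of this text:\n\n" + chunk)
        for page in pages
        for chunk in chunk_text(page, max_chars, overlap)
    ]
    if not partial:
        return ""
    if len(partial) == 1:
        return partial[0]  # nothing to combine
    bullets = "\n".join(f"- {s}" for s in partial)
    return summarize_fn(
        "Combine these partial summaries into one cohesive summary:\n\n" + bullets
    )
```

Keeping the model call behind `summarize_fn` makes the pipeline easy to test with a fake function and leaves the choice of client library open.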

🛠️ Installation

  1. Clone the repository:
    git clone https://github.com/yourusername/web-scraper-summarizer.git

  2. Move into the web-scraper-summarizer folder:
    cd web-scraper-summarizer

  3. Install the requirements:
    pip install -r requirements.txt

  4. Create a .env file in the same web-scraper-summarizer folder:
    touch .env

    Then add your Groq API key to it:
    GROQ_API_KEY=your_actual_key_here

  5. Run webScrapper.py:
    python webScrapper.py

  6. Enter the URL of the website you want to summarize and press Enter.
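
The .env and URL steps above can be illustrated with a minimal sketch of how a script like webScrapper.py might read the key and the user's link. This is an assumption, not the project's code: many projects call `load_dotenv()` from the python-dotenv package for this, and `load_env_file`, `get_groq_key` and `normalize_url` below are hypothetical standard-library stand-ins.

```python
import os
from urllib.parse import urlparse


def load_env_file(path: str = ".env") -> None:
    """Minimal .env reader: copies KEY=value lines into os.environ.

    Hypothetical stand-in for python-dotenv; existing variables are not overwritten.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_groq_key() -> str:
    """Load .env and fail early if the key is missing or still the placeholder."""
    load_env_file()
    key = os.environ.get("GROQ_API_KEY", "")
    if not key or key == "your_actual_key_here":
        raise RuntimeError("Set GROQ_API_KEY in the .env file before running the script.")
    return key


def normalize_url(raw: str) -> str:
    """Accept 'example.com' or 'https://example.com' and return a full http(s) URL."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid web address: {raw!r}")
    return raw
```

Checking for the placeholder value up front gives a clear message instead of a later authentication error from the API.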

🎉 Hurray! You've Got the Data!

Limitations:

  1. Dynamic website content ❌ Doesn't work with JavaScript-rendered content.

  2. Anti-scraping protections 🛑 May fail when:
    • Websites block bots (Cloudflare, Distil Networks)
    • Rate limiting is triggered
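
For the rate-limiting case, a common mitigation is to retry transient errors (HTTP 429 and 5xx) with exponential backoff and to stop immediately on hard blocks such as 403. The sketch below is a hypothetical helper, not part of the project; the User-Agent string and delay values are placeholders. It does not get past Cloudflare-style bot protection and does not render JavaScript; JavaScript-heavy pages generally need a headless browser such as Playwright or Selenium.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Status codes that usually mean "slow down" or "temporarily unavailable".
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def default_fetch(url: str) -> str:
    """Download a page with a descriptive User-Agent (placeholder value)."""
    req = urllib.request.Request(url, headers={"User-Agent": "web-scraper-summarizer/0.1"})
    with urllib.request.urlopen(req, timeout=15) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_with_retry(
    url: str,
    fetch: Callable[[str], str] = default_fetch,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry transient HTTP errors, waiting base_delay * 2**attempt seconds each time."""
    attempt = 0
    while True:
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUS or attempt >= max_retries:
                raise  # e.g. 403 from a bot wall: retrying will not help
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without real network calls or real waiting.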

About

Built a web scraping tool using LangChain.
