A powerful Python tool that scrapes websites, extracts key information, and generates concise AI-powered summaries using Groq's Llama model.
- Intelligent Web Scraping - Extracts clean text content from web pages
- Multi-Page Crawling - Follows and analyzes related sub-pages
- AI-Powered Summarization - Uses Groq's Llama3-70b for high-quality summaries
- Content Refinement - Combines multiple summaries into cohesive output
- Privacy Focused - Local processing with your own API keys
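The scraping and crawling features above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual implementation (which may well use a library such as BeautifulSoup); the names `TextAndLinkExtractor` and `extract` are hypothetical. It shows one way to drop script and style content, keep the visible text, and collect same-site links as candidate sub-pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class TextAndLinkExtractor(HTMLParser):
    """Collect visible text and same-site links from one HTML page (hypothetical helper)."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                absolute = urljoin(self.base_url, href)
                # Keep only links on the same host as the starting page.
                same_host = urlparse(absolute).netloc == urlparse(self.base_url).netloc
                if same_host and absolute not in self.links:
                    self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only chunks and anything inside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    """Return (clean_text, same_site_links) for a single page."""
    parser = TextAndLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

Calling `extract(page_html, "https://example.com/")` returns the page text as one string plus a de-duplicated list of absolute links on the same host, which a crawler could then visit and summarize in turn.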
1. Clone the repository:
git clone https://github.com/yourusername/web-scraper-summarizer.git
2. Navigate into the web-scraper-summarizer folder:
cd web-scraper-summarizer
3. Install the requirements:
pip install -r requirements.txt
4. Create a .env file in the web-scraper-summarizer folder using this command:
touch .env
Then add the following line to the .env file: GROQ_API_KEY=your_actual_key_here
5. Finally, run webScrapper.py:
python webScrapper.py
6. Enter the URL of the website you want to summarize and press Enter.
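For reference, here is a rough sketch of how a script could pick up the key stored in the .env file from step 4. The project itself may rely on a package such as python-dotenv instead; the helpers `load_env_file` and `get_groq_api_key` are hypothetical and use only the standard library.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and return them as a dict (hypothetical helper)."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_api_key(path=".env"):
    """Prefer an exported environment variable, then fall back to the .env file."""
    key = os.environ.get("GROQ_API_KEY")
    if not key and os.path.exists(path):
        key = load_env_file(path).get("GROQ_API_KEY")
    # Treat the README placeholder as "not configured".
    if not key or key == "your_actual_key_here":
        raise RuntimeError("GROQ_API_KEY is missing; add it to the .env file")
    return key
```

Failing early with a clear message when the key is missing or still set to the placeholder tends to be easier to debug than an authentication error from the API later on.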
Limitations:
1. Dynamic Website Content - Doesn't work with JavaScript-rendered content.
2. Anti-Scraping Protections - May fail when:
   - Websites block bots (Cloudflare, Distil Networks)
   - Rate limiting is triggered
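The rate-limiting case can sometimes be softened by retrying with exponential backoff. The sketch below is an assumption about how that could look, not something the tool is known to do: `fetch_html` and `fetch_with_backoff` are hypothetical names, and the `User-Agent` string is a placeholder. It retries only on HTTP 429 and 503; other errors, such as a 403 from a bot-protection service, are re-raised immediately since retrying usually does not help there.

```python
import time
import urllib.error
import urllib.request


def fetch_html(url, timeout=10):
    """Download one page and decode it to text (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": "web-scraper-summarizer"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_with_backoff(url, fetch=fetch_html, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on HTTP 429/503 with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as error:
            retryable = error.code in (429, 503)
            # Give up on non-retryable errors or once the retry budget is spent.
            if not retryable or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
```

The `fetch` and `sleep` parameters are injectable so the retry logic can be exercised without network access or real waiting.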