A super cool web scraper that grabs book info from an online bookstore. Think of it as your personal digital librarian that works 24/7!
- Book titles, prices, ratings
- Clean, organized data
- Export to CSV, Excel, JSON
- Error handling & retries
- Fast & respectful scraping
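To make the "error handling & retries" bullet concrete, here is a minimal sketch of retrying with exponential backoff. It is not taken from the scraper itself: `fetch_with_retries` and its parameters are hypothetical names, and the real project may handle retries differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is used up.

    Hypothetical helper for illustration; not part of the project's API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

In practice `fetch` could be something like `lambda: requests.get(url, timeout=10)`, assuming `requests` is among the installed dependencies. The `sleep` argument is only there so the waiting can be swapped out in tests.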
# 1. Clone it
git clone <your-repo-url>
cd Book-scraping
# 2. Install stuff
pip install -r requirements.txt
# 3. Run it!
jupyter notebook book_scraper_optimized.ipynb
That's it!
jupyter notebook book_scraper_optimized.ipynb
Perfect for learning and experimenting
py book_scraper.py --max-pages 5
For the terminal warriors
from book_scraper import BookScraper
scraper = BookScraper()
data = scraper.scrape_all_books(max_pages=10)
For the code ninjas
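Once you have the scraped data in hand, exporting it takes only a few lines. The sketch below assumes `data` is a list of dicts with one dict per book, which this README does not confirm. `export_books` is a hypothetical helper built on the standard library, not a function from the project.

```python
import csv
import json
from pathlib import Path


def export_books(books, csv_path, json_path):
    """Write a list of book dicts to a CSV file and a JSON file.

    Assumes every dict has the same keys as the first one.
    """
    if not books:
        raise ValueError("no books to export")
    fieldnames = list(books[0].keys())
    with Path(csv_path).open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(books)
    # ensure_ascii=False keeps characters such as the pound sign readable.
    Path(json_path).write_text(
        json.dumps(books, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Usage would look like `export_books(data, "books.csv", "books.json")`, where both file names are placeholders.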
Total books: 1000
Price range: £10.00 - £59.99
Rating Distribution:
  Three: 250 books
  Four: 200 books
  Five: 180 books
Availability: 950 in stock, 50 out of stock
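A summary like the one above can be computed with a few lines of standard-library Python. This is only a sketch: the keys `price`, `rating` and `in_stock` are assumed field names, and `summarize` is a hypothetical helper, not the project's own reporting code.

```python
from collections import Counter


def summarize(books):
    """Return totals, price range, rating counts and stock counts."""
    if not books:
        raise ValueError("no books to summarize")
    prices = [b["price"] for b in books]
    in_stock = sum(1 for b in books if b["in_stock"])
    return {
        "total": len(books),
        "min_price": min(prices),
        "max_price": max(prices),
        "ratings": dict(Counter(b["rating"] for b in books)),
        "in_stock": in_stock,
        "out_of_stock": len(books) - in_stock,
    }


# Tiny made-up sample, just to show the shape of the input.
sample = [
    {"price": 10.00, "rating": "Three", "in_stock": True},
    {"price": 59.99, "rating": "Five", "in_stock": False},
]
print(summarize(sample))
```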
Book-scraping/
├── book_scraper_optimized.ipynb   # Main notebook
├── book_scraper.py                # CLI script
├── config.py                      # Settings
├── utils.py                       # Helpers
├── test_scraper.py                # Tests
└── requirements.txt               # Dependencies
Problem: ModuleNotFoundError: No module named 'bs4'
Solution: pip install -r requirements.txt
Problem: Website not responding
Solution: Check your internet connection
Problem: Rate limited
Solution: Increase delay: --delay 2
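How the script applies `--delay` internally is not shown here, but the idea behind the fix is to space requests out. Below is one possible sketch of that idea. `Throttle` is a hypothetical class, and the `clock` and `sleep` parameters exist only so it can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` right before each page request. With `Throttle(2)`, requests end up at least two seconds apart, which is roughly what `--delay 2` suggests.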
- Fork it
- Create a branch
- Make changes
- Submit PR
Ideas welcome!
For educational purposes only! Always be respectful when scraping websites. Use delays and follow robots.txt!
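If you want to check robots.txt rules in code, Python's standard library has `urllib.robotparser`. The sketch below parses a made-up rules file so that it runs without network access. The sample rules and the `example.com` URLs are placeholders, not the bookstore's real policy.

```python
from urllib.robotparser import RobotFileParser

# Example rules, invented for illustration; a real site serves its own
# file at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)
allowed = rules.can_fetch("*", "https://example.com/catalogue/page-1.html")
blocked = rules.can_fetch("*", "https://example.com/private/notes.html")
delay = rules.crawl_delay("*")  # seconds requested by the site, or None
print(allowed, blocked, delay)
```

Against a live site you would call `parser.set_url(...)` with the site's robots.txt address followed by `parser.read()` instead of `parse()`, then skip any URL where `can_fetch` returns `False`.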