jonathanrao99/Book-scraping

📚 Book Scraper

Scrape books like a boss! 🚀

Extract book data from books.toscrape.com with style


🎯 What's This?

A super cool web scraper that grabs book info from an online bookstore. Think of it as your personal digital librarian that works 24/7! 📖

✨ What You Get

  • 📚 Book titles, prices, ratings
  • 📊 Clean, organized data
  • 💾 Export to CSV, Excel, JSON
  • 🎯 Error handling & retries
  • ⚡ Fast & respectful scraping
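
To make "clean, organized data" a bit more concrete, here is a minimal sketch of normalizing raw scraped strings. It assumes prices arrive as text like `£51.77` and ratings as words like `Three` (the format the sample output suggests). The helper names `parse_price` and `parse_rating` are hypothetical and not taken from the repository.

```python
import re

# Assumed mapping from rating words to integers (hypothetical helper data).
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(raw: str) -> float:
    """Pull the numeric part out of a price string such as '£51.77'."""
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no price found in {raw!r}")
    return float(match.group())


def parse_rating(raw: str) -> int:
    """Convert a rating word such as 'Three' to an integer from 1 to 5."""
    try:
        return RATING_WORDS[raw.strip().title()]
    except KeyError:
        raise ValueError(f"unknown rating {raw!r}") from None
```

Searching for the digits rather than stripping a fixed `£` prefix means a mis-decoded currency symbol does not break the parse.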

🚀 Quick Start

# 1. Clone it
git clone <your-repo-url>
cd Book-scraping

# 2. Install stuff
pip install -r requirements.txt

# 3. Run it!
jupyter notebook book_scraper_optimized.ipynb

That's it! 🎉


🎮 How to Use

Option 1: Jupyter Notebook (Recommended)

jupyter notebook book_scraper_optimized.ipynb

Perfect for learning and experimenting

Option 2: Command Line

python book_scraper.py --max-pages 5

For the terminal warriors
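
The script's actual argument parser isn't shown in this README, so the following is only a sketch of how flags like `--max-pages` and `--delay` (the two flags mentioned here) might be declared with `argparse`. The function name `build_parser`, the defaults, and the help text are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags mentioned in this README."""
    parser = argparse.ArgumentParser(description="Scrape books.toscrape.com")
    # Defaults below are illustrative assumptions, not the script's real defaults.
    parser.add_argument("--max-pages", type=int, default=5,
                        help="maximum number of catalogue pages to scrape")
    parser.add_argument("--delay", type=float, default=1.0,
                        help="seconds to wait between requests")
    return parser


# Parse an explicit argument list instead of sys.argv so the demo is repeatable.
args = build_parser().parse_args(["--max-pages", "5", "--delay", "2"])
print(f"Would scrape up to {args.max_pages} pages, waiting {args.delay}s between requests")
```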

Option 3: As a Module

from book_scraper import BookScraper
scraper = BookScraper()
data = scraper.scrape_all_books(max_pages=10)

For the code ninjas
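
The return type of `scrape_all_books` isn't documented in this README. Assuming it gives back a list of dictionaries, one per book, exporting to CSV and JSON with the standard library could look roughly like this. `export_books` and the output file names are hypothetical. Excel export would need a third-party library and is left out.

```python
import csv
import json
from pathlib import Path


def export_books(books: list[dict], out_dir: str = ".") -> tuple[Path, Path]:
    """Write the same records to books.csv and books.json inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path, json_path = out / "books.csv", out / "books.json"

    # Use the keys of the first record as the CSV header (assumes uniform records).
    fieldnames = list(books[0].keys()) if books else []
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(books)

    # ensure_ascii=False keeps characters such as the pound sign readable.
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(books, f, ensure_ascii=False, indent=2)

    return csv_path, json_path
```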


📊 Sample Output

📚 Total books: 1000
💰 Price range: £10.00 - £59.99
⭐ Rating Distribution:
  Three: 250 books
  Four: 200 books
  Five: 180 books
📦 Availability: 950 in stock, 50 out of stock
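
A summary like the one above can be derived from the scraped records in a few lines. This sketch assumes each book is a dictionary with `price` (float), `rating` (word), and `in_stock` (bool) keys. Those field names and the three sample records are made up for illustration and are not the project's confirmed schema or data.

```python
from collections import Counter


def summarize(books: list[dict]) -> dict:
    """Compute totals, price range, rating counts, and stock counts."""
    prices = [b["price"] for b in books]
    in_stock = sum(1 for b in books if b["in_stock"])
    return {
        "total": len(books),
        "price_range": (min(prices), max(prices)) if prices else None,
        "ratings": dict(Counter(b["rating"] for b in books)),
        "in_stock": in_stock,
        "out_of_stock": len(books) - in_stock,
    }


# Made-up records, only to exercise the function.
sample = [
    {"price": 10.00, "rating": "Three", "in_stock": True},
    {"price": 59.99, "rating": "Five", "in_stock": False},
    {"price": 25.50, "rating": "Three", "in_stock": True},
]
print(summarize(sample))
```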

🛠️ What's Inside

Book-scraping/
├── 📄 book_scraper_optimized.ipynb  # Main notebook
├── 🐍 book_scraper.py               # CLI script
├── ⚙️ config.py                     # Settings
├── 🔧 utils.py                      # Helpers
├── 🧪 test_scraper.py               # Tests
└── 📋 requirements.txt              # Dependencies

🐛 Troubleshooting

Problem: ModuleNotFoundError: No module named 'bs4'
Solution: pip install -r requirements.txt

Problem: Website not responding
Solution: Check your internet connection

Problem: Rate limited
Solution: Increase delay: --delay 2
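
For flaky connections or rate limiting, retrying with a growing pause between attempts is a common approach. The sketch below is not the project's implementation. `fetch_with_retries` is a hypothetical helper that wraps any fetch function you pass in, so it can sit around `requests.get`, a `urllib` call, or a test stub.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(fetch: Callable[[str], T], url: str,
                       retries: int = 3, delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(url), retrying on any exception with a doubling delay."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2  # back off: each wait is twice the previous one
    raise ValueError("retries must be at least 1")
```

Passing `sleep` as a parameter is a design choice that lets tests swap in a no-op instead of actually waiting.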


🤝 Contributing

  1. Fork it 🍴
  2. Create a branch 🌿
  3. Make changes ✏️
  4. Submit PR 🚀

Ideas welcome! 💡


⚠️ Disclaimer

For educational purposes only! Always be respectful when scraping websites. Use delays and follow robots.txt! 🤖
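
Checking robots.txt can be automated with the standard library's `urllib.robotparser`. The rules and the `book-scraper` user-agent name below are made up so the example runs offline. Against a real site you would call `set_url(...)` and `read()` on the parser instead of `parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; not the rules of any real site.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
""".splitlines()


def is_allowed(url: str, user_agent: str = "book-scraper") -> bool:
    """Return True if the example rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(EXAMPLE_RULES)
    return parser.can_fetch(user_agent, url)
```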


🌟 Star the Repository

If you find this project helpful, please give it a ⭐ on GitHub!


Made with ❤️ and ☕ by Jonathan Thota

Scraping the web, one book at a time! 📖
