Simple web scraping demo: titles, prices, stock & URLs from Books to Scrape. CLI flags (pages, delay, max-price). Exports CSV/XLSX.

mircothibes/Web-Scraping-Project


Simple Scraper (Books to Scrape)

A tiny, beginner-friendly web scraping project built with requests and BeautifulSoup. It targets books.toscrape.com, a demo website made for scraping practice.


Preview (Excel)

CSV opened in Excel — sample output


Features

  • Scrapes book title, price (raw + numeric), stock text, URL
  • Pagination (choose number of pages)
  • Optional price filter (--max-price 25)
  • CSV output
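As a rough illustration of the fields listed above, here is a minimal sketch of how one listing page could be parsed with BeautifulSoup. It is not taken from scrape_books.py: the function name parse_books, the sample HTML, and the CSS selectors are assumptions about the site's markup, and it uses the built-in html.parser so it runs without lxml.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical snippet imitating one product card on a listing page.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="a-light-in-the-attic_1000/index.html"
         title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="instock availability">
    In stock
  </p>
</article>
"""


def parse_books(html: str, page_url: str) -> list[dict]:
    """Extract title, raw price, numeric price, stock text and URL per book."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("article.product_pod"):  # assumed selector
        link = card.select_one("h3 a")
        price_raw = card.select_one("p.price_color").get_text(strip=True)
        stock = card.select_one("p.availability").get_text(strip=True)
        rows.append(
            {
                "title": link["title"],
                "price_raw": price_raw,
                "price_value": float(price_raw.lstrip("£")),
                "stock": stock,
                # Links on the page are relative, so resolve them against the page URL.
                "url": urljoin(page_url, link["href"]),
            }
        )
    return rows


rows = parse_books(SAMPLE_HTML, "https://books.toscrape.com/catalogue/page-1.html")
print(rows[0])
```

For a live page, the HTML would come from something like requests.get(url, timeout=10).text instead of the sample string.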

Requirements

  • Python 3.10+ (works with 3.8/3.9 if you replace float | None with Optional[float])
  • requests, beautifulsoup4, lxml
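To make the Python 3.8/3.9 note concrete, the sketch below shows a price-parsing helper annotated with Optional[float] instead of float | None. The function name parse_price and the regular expression are illustrative assumptions, not the script's actual code.

```python
import re
from typing import Optional


def parse_price(price_raw: str) -> Optional[float]:  # 3.10+ could write: float | None
    """Return the numeric part of a price string, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", price_raw)
    return float(match.group()) if match else None


print(parse_price("£51.77"))  # 51.77
print(parse_price("N/A"))     # None
```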

Install

python -m venv .venv
# Windows
.\.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

pip install -r requirements.txt

Usage

python scrape_books.py


Options

# --pages      how many pages to scrape (default: 3)
# --max-price  keep only books <= £25 (omit to disable)
# --out        output filename (default: books.csv)
# --delay      seconds between pages (default: 1.0)
python scrape_books.py \
  --pages 3 \
  --max-price 25 \
  --out cheap_books.csv \
  --delay 1.0 \
  --start-url https://books.toscrape.com/catalogue/page-1.html
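For readers curious how such flags are typically wired up, here is a minimal argparse sketch. The defaults for --pages, --out and --delay come from this README; the default for --start-url and the function name build_parser are assumptions, and the real script may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags described in this README."""
    parser = argparse.ArgumentParser(description="Scrape Books to Scrape")
    parser.add_argument("--pages", type=int, default=3,
                        help="how many pages to scrape")
    parser.add_argument("--max-price", type=float, default=None,
                        help="keep only books at or below this price")
    parser.add_argument("--out", default="books.csv",
                        help="output filename")
    parser.add_argument("--delay", type=float, default=1.0,
                        help="seconds between pages")
    # Assumed default; the README only shows this URL as an example value.
    parser.add_argument("--start-url",
                        default="https://books.toscrape.com/catalogue/page-1.html",
                        help="first page to fetch")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--pages", "2", "--max-price", "25"])
print(args.pages, args.max_price, args.out, args.delay)  # 2 25.0 books.csv 1.0
```

argparse converts dashes to underscores, so --max-price is read back as args.max_price.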

Output

CSV with columns: title, price_raw, price_value, stock, url
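A file with these columns can be produced with the standard csv module. The sketch below is one possible approach, not necessarily what scrape_books.py does. It writes to an in-memory buffer so it runs anywhere, and the helper name write_rows and the sample row are hypothetical.

```python
import csv
import io

FIELDNAMES = ["title", "price_raw", "price_value", "stock", "url"]


def write_rows(rows, handle) -> int:
    """Write a header plus one line per row; return the number of rows written."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


sample = [
    {
        "title": "A Light in the Attic",
        "price_raw": "£51.77",
        "price_value": 51.77,
        "stock": "In stock",
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    }
]

buffer = io.StringIO()
count = write_rows(sample, buffer)
print(buffer.getvalue())
```

When writing to disk, opening the file with newline="" avoids blank lines on Windows, and encoding="utf-8-sig" usually helps Excel display the £ sign correctly.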

Example console output (3 pages, no price filter):

Page 1: 20 rows
Page 2: 20 rows
Page 3: 20 rows
Saved 60 rows to books.csv
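A page loop that prints progress lines like these could look roughly as follows. This is a sketch under assumptions: scrape_pages, fetch_page and fake_fetch are hypothetical names, the fetcher is a stub returning made-up rows instead of hitting the network, and whether the real script counts rows before or after the price filter is not stated here.

```python
import time
from typing import Callable, Optional


def scrape_pages(
    fetch_page: Callable[[int], list],
    pages: int,
    max_price: Optional[float] = None,
    delay: float = 0.0,
) -> list:
    """Fetch pages 1..pages, optionally filter by price, and pause between pages."""
    all_rows = []
    for page in range(1, pages + 1):
        rows = fetch_page(page)
        if max_price is not None:
            # Keep only rows with a known price at or below the limit.
            rows = [
                r for r in rows
                if r["price_value"] is not None and r["price_value"] <= max_price
            ]
        print(f"Page {page}: {len(rows)} rows")
        all_rows.extend(rows)
        if page < pages:
            time.sleep(delay)  # be polite between requests
    return all_rows


def fake_fetch(page: int) -> list:
    """Hypothetical stand-in for a real page fetch: four books priced 10 to 40."""
    return [
        {"title": f"Book {page}-{i}", "price_value": 10.0 * i}
        for i in range(1, 5)
    ]


rows = scrape_pages(fake_fetch, pages=3, max_price=25)
print(f"Saved {len(rows)} rows")  # Saved 6 rows
```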

Notes (ethics & safety)

  • This project targets a demo site explicitly built for scraping practice.
  • Always be polite: keep a small delay between requests (--delay).
  • Do not scrape personal data or violate sites’ robots.txt / Terms of Service.
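One way to honor robots.txt programmatically is the standard library's urllib.robotparser. The sketch below checks URLs against a made-up robots.txt held in a string, so it needs no network access. The helper name allowed, the user agent string, and the example.com rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not copied from any real site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(SAMPLE_ROBOTS, "simple-scraper",
              "https://example.com/catalogue/page-1.html"))  # True
print(allowed(SAMPLE_ROBOTS, "simple-scraper",
              "https://example.com/private/data.html"))      # False
```

Against a live site, set_url() followed by read() would load the real robots.txt before calling can_fetch(). Terms of Service still need to be read by a human.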

License

This project is released under the MIT License.


Author

Developed by Marcos Vinicius Thibes Kemer

