Ceratec-PDP-Scraper

A Python-based web scraping project to extract Product Detail Page (PDP) data from Ceratec Surfaces product pages.

This project is built specifically around the Ceratec PDP structure (example: Alchemy series) and demonstrates how to reliably collect structured product data for e-commerce, cataloging, or analytics use cases.

🔍 Example Product Page

https://www.ceratec.com/fr/RSS-2048-Alchemy-2

📌 Data Points Extracted

The scraper follows the navigation and field mapping documented in Navigatin_document.pptx and the screenshots SS_1.PNG and SS_2.PNG.

#	Field
1	Product URL
2	Product Title
3	Breadcrumbs
4	Product Description
5	Image URLs (all variants)
6	Color of Variant
7	Color Grouping
8	Surface Finish / Material
9	Dimension / Size Grouping
10	Technical Documents (PDF links)
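
One way to picture a single output row is as a small record type with one attribute per field in the table above. This is only a sketch: the class name `ProductRecord`, the attribute names, the `" | "` separator, and the sample breadcrumb values are assumptions made for illustration, not names taken from pdp_ceratec.py.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ProductRecord:
    """Hypothetical container for the ten extracted fields."""
    product_url: str
    product_title: str = ""
    breadcrumbs: List[str] = field(default_factory=list)
    description: str = ""
    image_urls: List[str] = field(default_factory=list)
    variant_color: str = ""
    color_grouping: str = ""
    material: str = ""
    size_grouping: str = ""
    technical_documents: List[str] = field(default_factory=list)

    def to_row(self):
        """Flatten list fields into delimited strings for a flat CSV/Excel row."""
        row = asdict(self)
        for key, value in row.items():
            if isinstance(value, list):
                row[key] = " | ".join(value)
        return row


# Placeholder values only; nothing here was scraped from the real page.
record = ProductRecord(
    product_url="https://www.ceratec.com/fr/RSS-2048-Alchemy-2",
    breadcrumbs=["Level 1", "Level 2"],
)
row = record.to_row()
```

Flattening lists into a single delimited string keeps one product per row, which is convenient when the same data is written to both CSV and Excel.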

📂 Project Structure

ceratec-pdp-scraper/
│
├── pdp_ceratec.py          # Main scraper logic
├── constants.py           # Headers & cookies
├── Navigatin_document.pptx # Navigation & field mapping reference
├── Pending_todo_Website.txt
├── product_data_*.csv     # Output (CSV)
├── product_data_*.xlsx    # Output (Excel)
└── README.md

⚙️ Tech Stack

Python 3.9+
requests
beautifulsoup4
pandas

🚀 How to Run

1️⃣ Install Dependencies

pip install requests beautifulsoup4 pandas

2️⃣ Run the Scraper

python pdp_ceratec.py

3️⃣ Output

The script automatically generates timestamped files:

product_data_YYYYMMDD_HHMMSS.csv
product_data_YYYYMMDD_HHMMSS.xlsx
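
The naming pattern above can be reproduced with the standard library alone. The helper name `output_paths` and its parameters are hypothetical; the actual script may build its file names differently.

```python
from datetime import datetime


def output_paths(now=None, prefix="product_data"):
    """Return (csv_path, xlsx_path) sharing one YYYYMMDD_HHMMSS timestamp."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}.csv", f"{prefix}_{stamp}.xlsx"


# A fixed datetime is passed here so the result is reproducible.
csv_path, xlsx_path = output_paths(datetime(2026, 1, 18, 21, 47, 24))
```

With pandas, the two paths would typically be passed to `DataFrame.to_csv(csv_path, index=False)` and `DataFrame.to_excel(xlsx_path, index=False)`. Note that `to_excel` needs an Excel engine such as openpyxl to be installed; whether this project relies on openpyxl is an assumption.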

🧠 How It Works

Sends a browser-like request using realistic headers
Parses the HTML with BeautifulSoup
Extracts variant attributes using label-based matching
Collects technical document download links
Stores structured output using Pandas
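
The label-based matching step can be sketched as follows. This is a minimal illustration rather than the project's actual code: the sample markup, the class names, the French labels, and the helper name `value_for_label` are all assumptions, and the real Ceratec page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a product attribute block.
SAMPLE_HTML = """
<div class="specs">
  <div class="spec"><span class="label">Couleur</span><span class="value">Copper</span></div>
  <div class="spec"><span class="label">Fini</span><span class="value">Mat</span></div>
  <div class="spec"><span class="label">Format</span><span class="value">24 x 48</span></div>
</div>
"""


def value_for_label(soup, label):
    """Find a <span> whose text equals the label and return the next <span>'s text."""
    wanted = label.strip().lower()
    label_tag = soup.find(
        lambda tag: tag.name == "span" and tag.get_text(strip=True).lower() == wanted
    )
    if label_tag is None:
        return None  # label not present on this page
    value_tag = label_tag.find_next_sibling("span")
    return value_tag.get_text(strip=True) if value_tag else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
attributes = {
    label: value_for_label(soup, label)
    for label in ("Couleur", "Fini", "Format", "Origine")
}
```

Matching on the visible label text instead of on position or generated class names tends to survive small layout changes, and a missing label simply yields `None` instead of raising an error.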

🧩 Key Functions

get_product_title()
get_breadcrumbs()
get_description()
get_image_urls()
get_color_grouping_by_tag()
get_size_grouping()
get_material()
get_technical_document()
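
As a rough idea of what a function such as get_technical_document() might involve, the sketch below collects PDF links from a page. The function name `find_pdf_links`, the sample HTML, and the document paths are hypothetical; the author's implementation is not shown in this README and may work differently.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_pdf_links(html, base_url):
    """Return unique absolute URLs of links whose path ends in .pdf."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"].strip()
        # Ignore any query string when checking the extension.
        if href.split("?", 1)[0].lower().endswith(".pdf"):
            absolute = urljoin(base_url, href)
            if absolute not in links:
                links.append(absolute)
    return links


# Hypothetical markup; these paths do not come from the real site.
SAMPLE_HTML = """
<a href="/docs/alchemy-spec.pdf">Fiche technique</a>
<a href="/docs/alchemy-spec.pdf">Fiche technique (duplicate)</a>
<a href="https://example.com/care.PDF?v=2">Entretien</a>
<a href="/fr/contact">Contact</a>
"""

pdf_links = find_pdf_links(
    SAMPLE_HTML, "https://www.ceratec.com/fr/RSS-2048-Alchemy-2"
)
```

Resolving each `href` against the page URL with `urljoin` means relative and absolute links end up in the same form in the output.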

📎 Notes

Cookies (defined in constants.py) are optional but included for stability
The script retries failed requests automatically
Can be extended to process multiple product URLs
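
Automatic retries and multi-URL runs could look roughly like the sketch below. It uses the retry support that ships with requests and urllib3; the function names `build_session` and `scrape_many`, the retry count, the backoff factor, and the status codes are assumptions, not values read from the project.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(headers=None, cookies=None, retries=3):
    """Create a session that retries transient failures with backoff."""
    retry = Retry(
        total=retries,
        backoff_factor=0.5,  # assumed value; increases the wait between attempts
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update(headers or {})
    session.cookies.update(cookies or {})
    return session


def scrape_many(urls, scrape_one):
    """Apply scrape_one(url) to each URL; record request errors instead of stopping."""
    rows = []
    for url in urls:
        try:
            rows.append(scrape_one(url))
        except requests.RequestException as exc:
            rows.append({"product_url": url, "error": str(exc)})
    return rows
```

Here `scrape_one` stands for whatever function fetches and parses a single product page. Passing it in as an argument keeps the loop independent of the parsing code and makes it easy to test without network access.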

📈 Use Cases

E-commerce catalog scraping
Product data enrichment
Variant image & color extraction
Market research & analytics

👤 Author

Shoeb Ahmed
Web Scraping | Python | Data Extraction

⚠️ Disclaimer

This project is for educational and portfolio purposes only. Please respect website terms of service before scraping.

⭐ If this helped you, consider starring the repository!
