
🕸️ Web-Scraping

This repository demonstrates two different approaches to web scraping in Python:

  1. Static Scraping – Using the requests and BeautifulSoup libraries to parse HTML and extract data embedded in JavaScript variables.
  2. Spider-Based Scraping – Using Scrapy spiders to crawl pages and extract structured data embedded inside <script> tags.

📁 Project Structure

Web-Scraping/
├── static_scraping/     # Static HTML scraping with JS data extraction
├── spider_scraping/     # Scrapy spider for JS-embedded JSON extraction
├── requirements.txt     # List of dependencies
├── LICENSE              # MIT License
└── README.md            # Project overview

🧰 Technologies Used

  • requests – for sending HTTP requests (used in static scraping)
  • beautifulsoup4 – for HTML parsing
  • re – for regex extraction of data inside <script> tags (Python standard library)
  • json – for parsing the JavaScript-embedded JSON (Python standard library)
  • scrapy – for spider-based crawling and scraping

🔎 Scraping Methods

1. 📦 Static Scraping (static_scraping/)

Technique:
HTML Parsing + Embedded JavaScript Data Extraction

  • Loads the page with requests
  • Parses the HTML using BeautifulSoup
  • Locates the <script> tag containing the window.PAGE_MODEL = {...} JavaScript object
  • Extracts the embedded JSON and converts it into structured Python data

👉 Best for websites where JSON data is embedded directly in the HTML source.
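Below is a minimal sketch of this technique, assuming the page embeds its data in window.PAGE_MODEL as described above; the URL and request headers are illustrative placeholders, not the exact values used by scrape_static.py:

import json
import re

import requests
from bs4 import BeautifulSoup

# Placeholder URL; scrape_static.py targets its own page.
URL = "https://example.com/property/123"

response = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Find the <script> tag whose text assigns the page model to window.PAGE_MODEL.
script = soup.find("script", string=re.compile(r"window\.PAGE_MODEL"))

# Capture the JSON object assigned to the variable and parse it into Python data.
match = re.search(r"window\.PAGE_MODEL\s*=\s*(\{.*\})", script.string, re.DOTALL)
data = json.loads(match.group(1))

print(list(data.keys()))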


2. 🕷️ Spider-Based Scraping (spider_scraping/)

Technique:
Spider-Based HTML Parsing with Embedded JavaScript Data Extraction

  • Uses Scrapy to crawl pages
  • Extracts JavaScript-embedded data from <script> tags
  • Processes and stores it using Scrapy's item pipeline or file output

👉 Ideal for scraping multiple pages or when you want scalable crawling logic.
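A minimal spider sketch under the same window.PAGE_MODEL assumption is shown below; the spider name matches the scrapy crawl property_spider command in the Usage section, while the start URL and the yielded field are illustrative placeholders rather than the repository's actual values:

import json
import re

import scrapy


class PropertySpider(scrapy.Spider):
    # The name matches the `scrapy crawl property_spider` command shown in Usage.
    name = "property_spider"

    # Placeholder start URL; the repository's spider defines its own.
    start_urls = ["https://example.com/listings"]

    def parse(self, response):
        # Grab the text of the <script> tag that holds the embedded page model.
        script = response.xpath(
            "//script[contains(text(), 'window.PAGE_MODEL')]/text()"
        ).get()
        if not script:
            return

        match = re.search(r"window\.PAGE_MODEL\s*=\s*(\{.*\})", script, re.DOTALL)
        if not match:
            return

        data = json.loads(match.group(1))

        # Yield a plain dict; an item pipeline or feed export handles storage.
        yield {"page_model_keys": list(data.keys())}

Scrapy's built-in feed export can write the yielded items to a file, for example scrapy crawl property_spider -O output.json (the filename here is only an example).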


📦 Installation

  1. Clone this repository:

    git clone https://github.com/SAKTHIVINASH2/Web-Scraping.git
    cd Web-Scraping
  2. Install dependencies:

    pip install -r requirements.txt

▶️ Usage

Static Scraping

Navigate to the static_scraping directory and run the script:

cd static_scraping
python scrape_static.py

Spider-Based Scraping (using Scrapy)

Navigate to the spider_scraping directory and run:

cd spider_scraping
scrapy crawl property_spider

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.


🙌 Author


⭐️ Give a Star!

If you found this repository useful, consider giving it a ⭐ to support the project!
