This repository demonstrates two different approaches to web scraping in Python:
- **Static Scraping** – Using standard libraries like `requests` and `BeautifulSoup` to parse HTML and extract data embedded in JavaScript variables.
- **Spider-Based Scraping** – Using Scrapy spiders to crawl pages and extract structured data embedded inside `<script>` tags.
```
Web-Scraping/
├── static_scraping/     # Static HTML scraping with JS data extraction
├── spider_scraping/     # Scrapy spider for JS-embedded JSON extraction
├── requirements.txt     # List of dependencies
├── LICENSE              # MIT License
└── README.md            # Project overview
```
- `requests` – for sending HTTP requests (in static scraping)
- `beautifulsoup4` – for HTML parsing
- `re` – for regex matching on `<script>` tags
- `json` – to parse JavaScript-embedded JSON
- `scrapy` – for spider-based web scraping
Technique: **HTML Parsing + Embedded JavaScript Data Extraction**
- Loads the page with `requests`
- Parses the HTML using `BeautifulSoup`
- Locates the `<script>` tag containing the `window.PAGE_MODEL = {...}` JavaScript object
- Extracts and converts the embedded JSON into structured Python data
👉 Best for websites where JSON data is embedded directly in the HTML source.
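The steps above can be sketched as follows. This is a minimal illustration of the technique, not necessarily identical to the repository's `scrape_static.py`; the `window.PAGE_MODEL` variable name comes from this README, while the function names and greedy-regex approach are illustrative assumptions:

```python
import json
import re

import requests
from bs4 import BeautifulSoup


def extract_page_model(html: str) -> dict:
    """Locate the <script> tag containing `window.PAGE_MODEL = {...}`
    and parse the embedded object as JSON."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""
        # Greedy match up to the last closing brace in the tag; good enough
        # when the script contains only the assignment.
        match = re.search(r"window\.PAGE_MODEL\s*=\s*(\{.*\})", text, re.DOTALL)
        if match:
            return json.loads(match.group(1))
    raise ValueError("window.PAGE_MODEL not found in any <script> tag")


def scrape(url: str) -> dict:
    """Fetch the page and return the structured data embedded in it."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract_page_model(response.text)
```

If the embedded object contains nested braces mixed with other code in the same tag, a stricter parser (or `json.JSONDecoder().raw_decode`) is safer than a single regex.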
Technique: **Spider-Based HTML Parsing with Embedded JavaScript Data Extraction**
- Uses Scrapy to crawl pages
- Extracts JavaScript-embedded data from `<script>` tags
- Processes and stores it using Scrapy's item pipeline or file output
👉 Ideal for scraping multiple pages or when you want scalable crawling logic.
1. Clone this repository:

   ```bash
   git clone https://github.com/SAKTHIVINASH2/Web-Scraping.git
   cd Web-Scraping
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
Navigate to the `static_scraping` directory and run the script:

```bash
cd static_scraping
python scrape_static.py
```

Navigate to the `spider_scraping` directory and run:

```bash
cd spider_scraping
scrapy crawl property_spider
```

This project is licensed under the MIT License. See the LICENSE file for details.
If you found this repository useful, consider giving it a ⭐ to support the project!