This repository contains a sample web scraping script developed as part of my academic capstone project at Deakin University, Australia.
The project aimed to explore browser automation, data extraction techniques, and ethical scraping practices in a controlled, academic environment. This repository is for demonstration purposes only and is not intended for reuse, deployment, or execution.
Project highlights:
- Designed to simulate human browsing behavior to access publicly visible product data
- Focused on techniques such as session rotation, CAPTCHA handling, and API parsing
- Data storage via MongoDB and export to structured formats
- All scraping flows were tested within the bounds of ethical research and responsible automation
Tech stack:
- Python
- Selenium Wire & Undetected Chromedriver
- Smartproxy (residential, rotating)
- MongoDB
- Chrome WebDriver
This repository is made publicly visible only as a portfolio showcase of my technical and academic experience.
It must not be cloned, executed, or reused for scraping any real-world websites.
By viewing this repository, you agree to the following:
- The author does not condone or promote illegal scraping or violation of any website's terms of service
- This code is not provided as a tool or framework for others to scrape websites
- The repository excludes configuration files, credentials, and execution dependencies
- The author is not liable for any misuse or unauthorized use of the code
Retail-Web-Scraper/
├── scraper_coles.py   # Main academic scraper script
└── .gitignore         # Excludes sensitive/confidential files
Shishir Dhakal
Melbourne, Australia
Postgraduate Student, IT Management
shishirdhakal.com
LinkedIn
For a full walkthrough and explanation, see the blog:
https://shishirdhakal.com/coles-web-scraper-project/