Welcome to the Web Scraping Projects repository! This collection features various Python-based web scraping projects designed to extract and process data from different websites.
Serial No. | Project Name | Description | GitHub Link |
---|---|---|---|
1 | Chrono24 Watch Scraper | Scrapy spider for extracting watch details from chrono24.com in JSON format. | |
2 | The Infatuation Scraper | Web scraping script using Beautiful Soup and Scrapy to extract restaurant data from www.theinfatuation.com. | |
3 | Houzz Scraper | Scrapy and Beautiful Soup script that extracts contact information from business websites listed on www.houzz.com and stores the data in a CSV file. | |
4 | Yellow Pages Scraper | Python tool to extract business information from Yellow Pages, offering command-line and Streamlit interfaces for seamless data retrieval. | |
5 | NextInsurance Agent Scraper | Scraper designed to extract contact information for agents from NextInsurance.com. | |
6 | Pages24 Scraper | Web scraper that extracts data from the Pages24 website and returns it in a structured format. | |
7 | QBCC Local Contractor Scraper | Python script to scrape local contractor information from the QBCC (Queensland Building and Construction Commission) website. | |
Explore these projects and use the provided code as a reference or starting point for your web scraping endeavors.
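As a rough orientation, most of the projects above follow the same pattern: parse a page, pull out a few fields, and store them in a structured format such as CSV or JSON. The sketch below illustrates that pattern using only the Python standard library. It is not taken from any project in this repository (those use Scrapy and Beautiful Soup), and the sample markup, the `listing`, `name`, and `phone` class names, and the `ListingParser`, `extract_listings`, and `to_csv` names are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup for illustration only; real sites will differ.
SAMPLE_HTML = """
<ul>
  <li class="listing"><span class="name">Acme Plumbing</span><span class="phone">555-0100</span></li>
  <li class="listing"><span class="name">Bolt Electric</span><span class="phone">555-0101</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collect the text of the name and phone spans inside each listing item."""

    FIELDS = ("name", "phone")

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # dict for the listing currently being parsed
        self._field = None    # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "listing":
            self._current = {}
        elif tag == "span" and css_class in self.FIELDS and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside one of the tracked spans.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def extract_listings(html):
    """Return one dict per listing found in the HTML string."""
    parser = ListingParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Serialize the extracted rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ListingParser.FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_listings(SAMPLE_HTML)))
```

A real scraper would fetch pages over the network and would typically lean on Scrapy selectors or Beautiful Soup rather than a hand-written parser; the standard-library version is used here only to keep the example self-contained.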
We welcome contributions to enhance and expand this repository. If you have ideas for new projects, improvements to existing ones, or bug fixes, please follow the standard GitHub workflow:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Implement your changes.
- Test thoroughly.
- Submit a pull request with a clear description of your changes.
Your contributions are highly valued, and together we can make this repository even more robust.
This repository is licensed under the MIT License - see the LICENSE file for details.
Thank you for visiting the Web Scraping Projects repository. We hope you find the projects useful. For any questions or concerns, please reach out to the repository maintainers.