A Python web scraper for extracting profile data from paginated web directories. This tool handles automatic pagination, structured data extraction, and outputs results in both CSV and JSON formats. Ideal for aggregating business directories, vendor listings, and other web-based profile data.
Couchtr26/Web_Scraper
# Generic_Profile_Scraper
## Description
A Python-based, generic web scraper designed to extract structured profile data from paginated web directories. It demonstrates a complete scraping workflow: pagination handling, HTML parsing, data extraction, and multi-format file output (CSV and JSON).
**Adaptable for:**
- Business directories
- Vendor listings
- Service provider profiles
- CRM data extraction
- General web data aggregation
---
## Features
- Automatically scrapes multi-page directories
- Extracts profile-level data from individual pages
- Outputs results in both CSV and JSON formats
- Written in Python using `requests`, BeautifulSoup (`beautifulsoup4`), and the standard-library `csv` and `json` modules
- Clear data extraction logic for easy customization
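The multi-page crawl can be sketched as a loop that keeps following a "next" link. This is a minimal illustration, not the repository's code: `iter_directory_pages`, `fetch`, and `find_next_href` are hypothetical names, and it assumes each listing page exposes a "next" link. Passing the fetcher and the link finder in as functions keeps the loop testable without network access.

```python
from typing import Callable, Iterator, Optional, Set, Tuple
from urllib.parse import urljoin


def iter_directory_pages(
    start_url: str,
    fetch: Callable[[str], str],
    find_next_href: Callable[[str], Optional[str]],
    max_pages: int = 50,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) for each listing page, following "next" links.

    fetch(url) returns a page's HTML; find_next_href(html) returns the href of
    the next-page link (possibly relative), or None on the last page.
    """
    url: Optional[str] = start_url
    seen: Set[str] = set()
    # Stop on the last page, on a link back to a visited page, or at the cap.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        href = find_next_href(html)
        # Resolve relative links such as "?page=2" against the current page.
        url = urljoin(url, href) if href else None
```

In practice `fetch` might be `lambda u: requests.get(u, timeout=10).text`, and `find_next_href` might wrap a BeautifulSoup lookup such as `soup.select_one("a.next")` followed by `.get("href")`; the `a.next` selector is an assumption that varies by site. The `seen` set and the `max_pages` cap stop the loop if a site's "next" link points back to a page already visited.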
---
## Technologies Used
- Python 3.x
- requests
- beautifulsoup4
- csv / json (Python standard library)
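Writing the same records to both output formats needs only the standard-library modules listed above. A minimal sketch, assuming each profile is a flat dict of strings; `save_profiles` and its parameters are hypothetical names, not the repository's API.

```python
import csv
import json
from typing import Dict, List, Sequence


def save_profiles(
    profiles: List[Dict[str, str]],
    fieldnames: Sequence[str],
    csv_path: str,
    json_path: str,
) -> None:
    """Write the same profile records to a CSV file and a JSON file."""
    # The csv docs ask for newline=""; UTF-8 keeps non-ASCII text intact.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # restval="" leaves a blank cell for any field a profile is missing.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(profiles)

    with open(json_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes characters like "ö" as-is instead of \u escapes.
        json.dump(profiles, f, indent=2, ensure_ascii=False)
```

Because `csv.DictWriter` keeps its default `extrasaction="raise"`, a record with a key that is not in `fieldnames` raises `ValueError`, which surfaces a mistyped field name early rather than silently dropping data.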
---
## Installation
1. Clone or download this repository.
2. Ensure Python 3 is installed.
3. Install required packages:
```bash
pip install beautifulsoup4 requests
```
4. Run the scraper:
```bash
python Generic_Scraper.py
```
5. Enter the starting URL of the site you wish to scrape when prompted.
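Because the starting URL is typed interactively, a small validation step can catch typos before any requests are sent. This is a sketch under the assumption that only `http`/`https` directories are targeted; `normalize_start_url` and `prompt_for_start_url` are hypothetical helpers, not functions from `Generic_Scraper.py`.

```python
from urllib.parse import urlparse


def normalize_start_url(raw: str) -> str:
    """Validate a user-entered URL and return it in a usable form."""
    url = raw.strip()
    # Accept "example.com/directory" by assuming https when no scheme is given.
    if "://" not in url:
        url = "https://" + url
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not a valid http(s) URL: {raw!r}")
    return url


def prompt_for_start_url() -> str:
    """Keep asking until the user enters a usable URL (hypothetical helper)."""
    while True:
        try:
            return normalize_start_url(input("Enter the starting URL: "))
        except ValueError as err:
            print(err)
```

For example, `normalize_start_url("example.com/dir")` returns `"https://example.com/dir"`, while `"ftp://example.com"` is rejected with a `ValueError`.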
---
## Data Extracted
- Profile Name
- Location
- Profile Attribute 1
- Profile Attribute 2
- Profile Attribute 3
- Profile Attribute 4
- Service Rate (Local & Remote)
- Deposit Terms
- Cancellation Terms
- Contact Information
*Note: These fields are intentionally generalized for flexibility across multiple use cases.*
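One way to map a profile page onto the fields above is a dataclass plus a small parser. This sketch is hypothetical throughout: the `Profile` field names stand in for the generalized fields, and it assumes the page marks each value with a `data-field` attribute, which real directories rarely do. It also swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages; with BeautifulSoup, each lookup would typically be a site-specific `soup.select_one(...)` call.

```python
from dataclasses import dataclass, fields
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Profile:
    # Hypothetical names standing in for the generalized fields in this README.
    name: str = ""
    location: str = ""
    attribute_1: str = ""
    attribute_2: str = ""
    attribute_3: str = ""
    attribute_4: str = ""
    rate_local: str = ""
    rate_remote: str = ""
    deposit_terms: str = ""
    cancellation_terms: str = ""
    contact: str = ""


class FieldCollector(HTMLParser):
    """Collect the text of elements carrying a (hypothetical) data-field attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, str] = {}
        self._field: Optional[str] = None
        self._depth = 0
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag in VOID_TAGS:
                self._parts.append(" ")  # treat <br> and similar as a word break
            else:
                self._depth += 1
            return
        field = dict(attrs).get("data-field")
        if field and tag not in VOID_TAGS:
            self._field, self._depth, self._parts = field, 1, []

    def handle_endtag(self, tag):
        if self._field is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the page's formatting.
            self.values[self._field] = " ".join("".join(self._parts).split())
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._parts.append(data)


def parse_profile(html: str) -> Profile:
    collector = FieldCollector()
    collector.feed(html)
    collector.close()
    known = {f.name for f in fields(Profile)}
    # Drop marked elements that do not correspond to a Profile field.
    return Profile(**{k: v for k, v in collector.values.items() if k in known})
```

For instance, `parse_profile('<h2 data-field="name">Acme <b>Plumbing</b></h2>').name` gives `'Acme Plumbing'`; any field the page does not mark stays an empty string.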
---
## License
Copyright (c) 2025 Couchtr26
MIT License