A Python web scraper for extracting profile data from paginated web directories. This tool handles automatic pagination, structured data extraction, and outputs results in both CSV and JSON formats. Ideal for aggregating business directories, vendor listings, and other web-based profile data.

Couchtr26/Web_Scraper

# Generic_Profile_Scraper

## 🔧 Description

A generic, Python-based web scraper that extracts structured profile data from paginated web directories. It demonstrates a full scraping workflow: pagination handling, HTML parsing, data extraction, and multi-format file output (CSV and JSON).

**Adaptable for:**
- Business directories
- Vendor listings
- Service provider profiles
- CRM data extraction
- General web data aggregation

---

## 💻 Features

- Automatically scrapes multi-page directories
- Extracts profile-level data from individual pages
- Outputs results in both CSV and JSON formats
- Written in Python (`requests`, `BeautifulSoup4`, `csv`, `json`)
- Clear data extraction logic for easy customization
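
The multi-page crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in `Generic_Scraper.py`: the CSS selectors `a.profile-link` and `a.next`, the function names, and the `max_pages` limit are all assumptions chosen for the example, and real directories will need their own selectors.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Hypothetical selectors: adjust to the markup of the directory being scraped.
PROFILE_LINK_SELECTOR = "a.profile-link"
NEXT_PAGE_SELECTOR = "a.next"


def fetch_html(url):
    """Download one page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_listing(html, page_url):
    """Return (profile_urls, next_page_url or None) for one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    # Resolve relative hrefs against the URL of the page they came from.
    profile_urls = [
        urljoin(page_url, a["href"])
        for a in soup.select(PROFILE_LINK_SELECTOR)
        if a.get("href")
    ]
    next_link = soup.select_one(NEXT_PAGE_SELECTOR)
    has_next = next_link is not None and next_link.get("href")
    next_url = urljoin(page_url, next_link["href"]) if has_next else None
    return profile_urls, next_url


def collect_profile_urls(start_url, fetch=fetch_html, max_pages=50):
    """Follow "next" links from start_url and gather every profile URL."""
    seen_pages = set()  # guards against pagination loops
    profile_urls = []
    page_url = start_url
    while page_url and page_url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(page_url)
        found, page_url = parse_listing(fetch(page_url), page_url)
        profile_urls.extend(found)
    return profile_urls
```

Passing `fetch` as a parameter keeps the loop testable without network access; in normal use, `collect_profile_urls(start_url)` falls back to `requests`.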

---

## 🛠 Technologies Used

- Python 3.x
- requests
- beautifulsoup4
- csv / json (Python standard library)

---

## 🚀 Installation

1. Clone or download this repository.
2. Ensure Python 3 is installed.
3. Install required packages:
    ```bash
    pip install beautifulsoup4 requests
    ```
4. Run the scraper:
    ```bash
    python Generic_Scraper.py
    ```
5. Enter the starting URL of the site you wish to scrape when prompted.
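
For illustration, the prompt in step 5 could be paired with a basic sanity check on the entered URL. The sketch below is an assumption about how such a check might look, not a description of what `Generic_Scraper.py` does; the function names and the prompt text are hypothetical.

```python
from urllib.parse import urlparse


def is_valid_start_url(url):
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def prompt_for_start_url():
    """Keep asking until the user enters a usable starting URL."""
    while True:
        url = input("Enter the starting URL: ").strip()
        if is_valid_start_url(url):
            return url
        print("Please enter a full http(s) URL, e.g. https://example.com/directory")
```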

---

## 🎯 Data Extracted

- Profile Name
- Location
- Profile Attribute 1
- Profile Attribute 2
- Profile Attribute 3
- Profile Attribute 4
- Service Rate (Local & Remote)
- Deposit Terms
- Cancellation Terms
- Contact Information

*Note: These fields are intentionally generalized for flexibility across multiple use cases.*
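
As a rough sketch of how records with these fields might be written to both output formats, the example below uses only the standard library. The column names, the split of the service rate into separate local and remote columns, and the helper names are assumptions made for illustration; the actual script may name and order its columns differently.

```python
import csv
import json

# Hypothetical column names mirroring the generalized field list above.
FIELDNAMES = [
    "profile_name",
    "location",
    "attribute_1",
    "attribute_2",
    "attribute_3",
    "attribute_4",
    "service_rate_local",
    "service_rate_remote",
    "deposit_terms",
    "cancellation_terms",
    "contact_information",
]


def normalize(record):
    """Return a record with every expected field, using "" for missing ones."""
    return {name: record.get(name, "") for name in FIELDNAMES}


def write_csv(records, path):
    """Write records to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(normalize(r) for r in records)


def write_json(records, path):
    """Write the same records to a JSON file as a list of objects."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([normalize(r) for r in records], f, indent=2, ensure_ascii=False)
```

Normalizing each record first means a profile page that lacks a field still produces a complete row, so the CSV and JSON outputs stay aligned.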

---

## 📄 License

Copyright (c) 2025 Couchtr26  
MIT License
