
A versatile Python-based web scraper that extracts content from single URLs or entire sitemaps, organizing data into structured text files. Features include sitemap parsing, content grouping by URL structure, and an easy-to-use command-line interface. Ideal for data extraction, content analysis, and web research tasks.


danhilse/web-scraper


Web Scraper


A command-line web scraper that extracts content from websites and saves it to organized text files.

Web Scraper Demo

Features

  • Scrape content from a single URL or an entire sitemap
  • Group scraped content into separate files based on URL structure
  • Output content to multiple text files, organized by website sections
  • Executable file for easy use without Python installation
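
The grouping feature above can be pictured with a small sketch. It is not taken from web_scraper.py. It assumes that "URL structure" means the first path segment of each URL, and the function name `group_urls_by_section` and the `"root"` label are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_urls_by_section(urls):
    """Group URLs by their first path segment (assumed meaning of "URL structure")."""
    groups = defaultdict(list)
    for url in urls:
        # Strip leading/trailing slashes so "/blog/post/" becomes "blog/post".
        path = urlparse(url).path.strip("/")
        # URLs with no path (the home page) go into a hypothetical "root" group.
        section = path.split("/")[0] if path else "root"
        groups[section].append(url)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "https://example.com/",
        "https://example.com/blog/first-post",
        "https://example.com/docs/setup",
    ]
    for section, members in group_urls_by_section(sample).items():
        print(section, members)
```

Each group could then be written to its own text file, for example one named after the section. The real tool may choose file names differently.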

Installation

  1. Clone this repository: git clone https://github.com/danhilse/web-scraper.git

  2. Install the required dependencies: pip install -r requirements.txt

Usage

To scrape a single URL: python web_scraper.py https://example.com

To scrape an entire sitemap: python web_scraper.py https://example.com --sitemap
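
For readers curious how a command line like the two above might be handled, here is a minimal sketch using only the standard library. It is an illustration under assumptions, not the code in web_scraper.py: the helper names `build_parser` and `extract_sitemap_urls` are hypothetical, and the sketch assumes the sitemap follows the standard sitemaps.org `urlset` format. How the real script locates and downloads the sitemap is not documented here, so fetching is left out.

```python
import argparse
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def build_parser():
    """Mirror the documented interface: a positional URL and a --sitemap flag."""
    parser = argparse.ArgumentParser(description="Scrape a URL or a sitemap.")
    parser.add_argument("url", help="page URL, or site URL when --sitemap is given")
    parser.add_argument("--sitemap", action="store_true",
                        help="treat the URL as a site and scrape its sitemap")
    return parser


def extract_sitemap_urls(xml_text):
    """Return every <loc> value found under <url> entries of a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


if __name__ == "__main__":
    args = build_parser().parse_args()
    mode = "sitemap" if args.sitemap else "single URL"
    print(f"Would scrape {args.url} in {mode} mode")
```

Nested sitemap index files (a `sitemapindex` root pointing at further sitemaps) are not handled in this sketch.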

Project Structure

  • web_scraper.py: Main script containing the web scraper logic
  • requirements.txt: List of Python dependencies

Executable

A pre-built executable is available in the dist folder. You can download and run it directly without needing to install Python or any dependencies.

License

This project is open source and available under the MIT License.
