An almost generic web crawler built using Scrapy and Python 3.7 to recursively crawl entire websites.
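A recursive Scrapy crawler typically pairs a CrawlSpider with a LinkExtractor rule so every in-domain link is followed automatically. The sketch below shows that pattern; the spider name, start URL, domain, and depth limit are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch of a recursive Scrapy spider; names and URLs are hypothetical.
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class SiteSpider(CrawlSpider):
    name = "site"                         # hypothetical spider name
    start_urls = ["https://example.com"]  # hypothetical start URL
    allowed_domains = ["example.com"]     # keep the crawl on one site

    # Follow every in-domain link and hand each response to parse_page.
    rules = (Rule(LinkExtractor(), callback="parse_page", follow=True),)

    def parse_page(self, response):
        # Yield the URL and title of every page visited.
        yield {
            "url": response.url,
            "title": response.css("title::text").get(),
        }


if __name__ == "__main__":
    process = CrawlerProcess(settings={"DEPTH_LIMIT": 3})  # bound the recursion
    process.crawl(SiteSpider)
    process.start()
```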
Web Crawler is a Node.js application that crawls web pages, saves them locally, and extracts hyperlinks from the page body. It provides a simple command-line interface where you enter the starting URL and specify the maximum number of crawls. The crawler follows the hyperlinks recursively and saves the fetched pages in a specified directory; a sketch of this loop follows.
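The project's own Node.js source isn't shown here, so this is a language-neutral sketch of the same crawl-save-extract loop written in Python; the requests and beautifulsoup4 packages, the output directory, and all parameter names are assumptions.

```python
# Sketch of a bounded recursive crawl: fetch, save to disk, extract links.
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(start_url: str, max_crawls: int, out_dir: str = "pages") -> None:
    os.makedirs(out_dir, exist_ok=True)
    queue, seen = [start_url], set()
    while queue and len(seen) < max_crawls:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        resp = requests.get(url, timeout=10)
        # Save the page under a filesystem-safe name derived from the URL.
        name = urlparse(url).netloc + urlparse(url).path.replace("/", "_")
        with open(os.path.join(out_dir, name or "index"), "w") as f:
            f.write(resp.text)
        # Extract hyperlinks from the body and enqueue them for crawling.
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            queue.append(urljoin(url, a["href"]))


if __name__ == "__main__":
    crawl(input("Start URL: "), int(input("Max crawls: ")))
```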
A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖
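Mozilla Readability is a JavaScript library, so this project wires it into its Node.js pipeline; as a rough sketch of the same idea in Python, the readability-lxml package offers a comparable Document API. The URL below is hypothetical.

```python
# Sketch of readability-style content extraction using readability-lxml,
# a rough Python analogue of Mozilla Readability (not the project's stack).
import requests
from readability import Document

html = requests.get("https://example.com/article", timeout=10).text
doc = Document(html)
print(doc.title())    # the article's detected title
print(doc.summary())  # cleaned HTML containing only the readable body
```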