A fast, asset-complete website cloner built with Node.js, Puppeteer, Cheerio, and Axios. It crawls a website and downloads all HTML, CSS, JS, fonts, images, and media to a local folder, which makes it suitable for archiving or offline analysis.
- Asset-complete crawl: CSS, JS, images, fonts, videos, audio, etc.
- Recursive link following within the root domain
- CSS parsing: handles `url()`, `@import`, and asset references
- Query and fragment stripping for clean local files
- Uses Puppeteer (headless Chrome) for reliable page rendering
- Download retry mechanism for network resilience
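
As a rough illustration of the CSS parsing feature, the sketch below pulls `url()` and `@import` references out of a stylesheet and resolves them against the stylesheet's own URL. It is not taken from the project's source: `extractCssUrls` is a hypothetical name, and the regex-based approach is an assumption about how such parsing could work.

```javascript
// Hypothetical helper (not from the project): collect asset URLs referenced
// by a stylesheet. A regex-based sketch; a full CSS parser would be more robust.
function extractCssUrls(cssText, baseUrl) {
  const found = new Set();
  // url(...) with optional single or double quotes around the reference.
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/gi;
  // @import "file.css"; the @import url(...) form is caught by urlPattern.
  const importPattern = /@import\s+(['"])([^'"]+)\1/gi;

  for (const pattern of [urlPattern, importPattern]) {
    for (const match of cssText.matchAll(pattern)) {
      const ref = match[2].trim();
      // Inline data: URIs carry their own payload, so there is nothing to download.
      if (ref.startsWith('data:')) continue;
      try {
        // Resolve relative references against the stylesheet's URL.
        found.add(new URL(ref, baseUrl).href);
      } catch {
        // Skip references that cannot be resolved to a valid URL.
      }
    }
  }
  return [...found];
}
```

Each returned URL can then be queued for download like any other asset.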
git clone https://github.com/NeaByteLab/Website-Cloner.git
cd Website-Cloner
npm install

Usage:
node index.js <website_url> [output_folder]
- `<website_url>`: Root URL to clone (e.g. `https://example.com`)
- `[output_folder]`: (Optional) Output directory (default: `./output`)
Example:
node index.js https://example.com ./my-archive

All files are saved with original folder structure in the output folder. Querystrings/fragments are stripped from asset references for clean offline usage.
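
To make the query and fragment stripping concrete, here is a minimal sketch of how a remote URL might be mapped to a path inside the output folder. The names `toLocalPath` and `stripQueryAndFragment` are hypothetical, and the `index.html` fallback for directory-style URLs is an assumption rather than documented behavior.

```javascript
// Hypothetical helper (not from the project): map a remote URL to a relative
// path inside the output folder, preserving the original folder structure.
function toLocalPath(rawUrl) {
  const url = new URL(rawUrl);
  // url.pathname never includes the query string or the fragment.
  let pathname = url.pathname;
  // Assumption: directory-style URLs are stored as index.html.
  if (pathname.endsWith('/')) pathname += 'index.html';
  // Drop leading slashes so the result is relative to the output folder.
  return pathname.replace(/^\/+/, '');
}

// Hypothetical helper: clean a reference as it appears inside HTML or CSS,
// so that it points at the stripped local file name.
function stripQueryAndFragment(ref) {
  return ref.split(/[?#]/)[0];
}

console.log(toLocalPath('https://example.com/css/site.css?v=3#top')); // css/site.css
console.log(stripQueryAndFragment('../img/bg.png?v=2'));              // ../img/bg.png
```

Under these assumptions, the root URL itself would be saved as `index.html` in the output folder.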
- Only follows links within the provided root domain.
- Ignores mailto links and anchor jumps.
- All CSS `url()` and `@import` asset links are also downloaded.
- Minimal error output; retries up to 3 times for assets/pages.
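
The link rules and retry behavior in the notes above could look roughly like the following. This is a sketch, not the project's code: `shouldFollow` and `withRetry` are hypothetical names, "root domain" is assumed to mean an exact hostname match, and "up to 3 times" is assumed to mean three attempts in total with a short fixed delay.

```javascript
// Hypothetical helper (not from the project): decide whether a link found on
// a page should be crawled.
function shouldFollow(href, pageUrl, rootUrl) {
  // Anchor jumps stay on the same page, so there is nothing new to fetch.
  if (!href || href.startsWith('#')) return false;
  let target;
  try {
    target = new URL(href, pageUrl); // resolves relative links
  } catch {
    return false; // malformed href
  }
  // Rejects mailto: and any other non-HTTP scheme.
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return false;
  // Assumption: "within the root domain" means the same hostname.
  return target.hostname === new URL(rootUrl).hostname;
}

// Hypothetical helper: run an async task, retrying on failure.
// Assumption: three attempts in total, with a fixed delay between them.
async function withRetry(task, attempts = 3, delayMs = 500) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A download could then be wrapped as `withRetry(() => axios.get(assetUrl, { responseType: 'arraybuffer' }))`, where `assetUrl` is a placeholder for the asset being fetched.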
MIT License © 2025 NeaByteLab