# webArchive

Crawls websites and saves found URLs to a file.

## Usage

Install Node.js and run `npm install` in `./crawler`.
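For example:

```
cd ./crawler
npm install
```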

There are two required CLI arguments:

- First argument: the domain to crawl
- Second argument: the path to the file where the found URLs should be saved

And two optional CLI arguments:

- Third argument: connection count limit. Defaults to 15.
- Fourth argument: redirect count limit. Defaults to 15.

For example, to crawl example.com and save the found URLs to `./test.txt`, run:

```
node ./index.js example.com test.txt
```
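To override the optional limits as well, append them in the same order. A hypothetical invocation with a connection count limit of 30 and a redirect count limit of 5 (values chosen only for illustration):

```
node ./index.js example.com test.txt 30 5
```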

## Download websites in WARC format after a crawl

Use Wget, pointing `--input-file` at the file of URLs produced by the crawl:

```
wget --input-file=CHANGE_THIS --warc-file="warc" --force-directories --tries=10
```

Replace `CHANGE_THIS` with the path to the URL file from the crawl step.
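For instance, assuming the crawl above wrote its URLs to `test.txt`:

```
wget --input-file=test.txt --warc-file="warc" --force-directories --tries=10
```

This retries each listed URL up to 10 times on failure, mirrors the site's directory structure on disk, and records the traffic to a WARC archive named `warc.warc.gz` (Wget compresses WARC output by default).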