This is a text-file-to-WARC pipeline for grab-site (and, technically, ArchiveBot).
Prototype was coded on Windows and requires Python, 7-Zip & Docker. Untested on other platforms.
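
To confirm the three requirements are reachable before starting, a small check along these lines may help. It is only a sketch: the executable names are assumptions (7-Zip in particular may be installed as `7z`, `7za` or `7zz`, and on Windows it is often not on `PATH` at all), and `find_tools` is a hypothetical helper, not part of this repo.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = {
    "Python": ["python", "python3"],
    "7-Zip": ["7z", "7za", "7zz"],
    "Docker": ["docker"],
}


def find_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Map each tool name to the first executable path found on PATH, or None."""
    found = {}
    for name, candidates in required.items():
        # Try every candidate name and keep the first one that resolves.
        found[name] = next((p for p in map(which, candidates) if p), None)
    return found


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

A tool reported as not found may still be installed; it only means none of the assumed names resolved on `PATH`.
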
- Download and install Docker.
- Grab the Dockerfile from Nold360/docker-grab-site and place it in a folder in a root directory (e.g. `D:\grab-site-data`). This will become the data folder for the Docker containers.
- Build the image with `docker build -t grab-site .` (the Docker image is around 500 MB).
- Spin the container up with `docker run -d --rm -p29000:29000 -v DATA_FOLDER:/data --name grab-site-container grab-site` (set `DATA_FOLDER` to the path of the above directory).
- Create a text file containing the IDs you want the script to archive.
- Open a terminal in this repo directory.
- Run `python . DATA_FOLDER TEXTFILE ITEM_TYPE` (with `DATA_FOLDER` being the directory above, `TEXTFILE` being the text file and `ITEM_TYPE` being what type the items in the text file are).
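
The steps above can be sketched as a small Python helper that writes the ID file and assembles the three commands in order. This is an illustration under several assumptions, not part of the repo: `write_id_file`, `build_commands` and `run_steps` are hypothetical names, the one-ID-per-line layout is a guess (the expected format of the text file is not specified here), and the image is assumed to be built from inside the data folder because that is where the Dockerfile was placed.

```python
import subprocess
from pathlib import Path


def write_id_file(textfile, ids):
    """Write IDs to the text file and return how many were written.

    Assumption: one ID per line. Blank entries are skipped.
    """
    cleaned = [str(i).strip() for i in ids if str(i).strip()]
    Path(textfile).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


def build_commands(data_folder, textfile, item_type, repo_dir="."):
    """Return (command, working_directory) pairs in the order of the steps."""
    data_folder = str(data_folder)
    return [
        # Assumption: build where the Dockerfile lives (the data folder).
        (["docker", "build", "-t", "grab-site", "."], data_folder),
        (["docker", "run", "-d", "--rm", "-p29000:29000",
          "-v", f"{data_folder}:/data",
          "--name", "grab-site-container", "grab-site"], data_folder),
        # The pipeline itself is run from the repo directory.
        (["python", ".", data_folder, str(textfile), item_type], str(repo_dir)),
    ]


def run_steps(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd, cwd in commands:
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_steps(build_commands(DATA_FOLDER, TEXTFILE, ITEM_TYPE))` with your own values only prints the commands; pass `dry_run=False` to actually execute them. Valid `ITEM_TYPE` values are defined by the script itself and are not guessed at here.
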