Small CLI application to save webpages from a sitemap on the Internet Archive

iyaki/web-archiver

web-archiver

Utility to save the pages listed in a sitemap to the Web Archive

Usage

web-archiver <sitemap URI> [<date>]

This program needs two environment variables:

  • WAYBACK_S3_ACCESS_KEY
  • WAYBACK_S3_SECRET_KEY

Both keys are obtained from the Web Archive S3-like API.
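
As a rough illustration of how the two variables might be consumed, the sketch below reads them from the environment and builds an `Authorization` header in the `LOW <access key>:<secret key>` form used by the Internet Archive's S3-like API. The helper name `build_auth_header` is hypothetical, and whether web-archiver builds its requests this way is an assumption.

```python
import os


def build_auth_header(env=os.environ):
    """Build an Authorization header from the two required variables.

    Hypothetical helper: the header layout is an assumption about how the
    keys are used, not a description of web-archiver's own code.
    """
    try:
        access_key = env["WAYBACK_S3_ACCESS_KEY"]
        secret_key = env["WAYBACK_S3_SECRET_KEY"]
    except KeyError as missing:
        # Stop early with a readable message when a variable is not set.
        raise SystemExit(f"Missing environment variable: {missing.args[0]}")
    return {"Authorization": f"LOW {access_key}:{secret_key}"}
```

Failing early when either variable is missing avoids sending unauthenticated requests for every URL in the sitemap.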

The program will save all the entries present in the sitemap with a lastMod property newer than the provided date.

If no date is provided, all entries will be saved.
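
The selection rule can be sketched in Python as follows. This is only an illustration, not the program's actual code: the function name `urls_to_save` is hypothetical, the comparison is assumed to be strictly "newer than", and entries without a `<lastmod>` element are assumed to be skipped when a date is given.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_to_save(sitemap_xml, since=None):
    """Return the <loc> values selected from a sitemap document.

    With `since` set, only entries whose <lastmod> date is strictly newer
    are returned (assumption). Without it, every entry is returned.
    """
    selected = []
    for entry in ET.fromstring(sitemap_xml).findall("sm:url", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue
        if since is None:
            selected.append(loc.strip())
            continue
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            # Assumption: entries without a date are skipped when filtering.
            continue
        # <lastmod> may carry a time part; only the YYYY-MM-DD prefix is compared.
        if date.fromisoformat(lastmod.strip()[:10]) > since:
            selected.append(loc.strip())
    return selected
```

Calling `urls_to_save(xml_text, date(2024, 5, 1))` mirrors the first example below, and `urls_to_save(xml_text)` mirrors the second.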

Examples

# Save only URLs with `lastMod` newer than 2024-05-01
web-archiver https://example.com/sitemap.xml 2024-05-01

Only URLs modified since 2024-05-01 will be saved.

# Save all the URLs present in the sitemap
web-archiver https://example.com/sitemap.xml
