    Repositories list

    • Zeno

      Public
      State-of-the-art web crawler 🔱
      Go
      45316264 Updated Sep 4, 2025
    • gowarc

      Public
      Read and write WARC files in Go
      Go
      634122 Updated Sep 4, 2025
    • TypeScript
      0001 Updated Sep 4, 2025
    • TypeScript
      18221 Updated Sep 4, 2025
    • TypeScript
      2602 Updated Sep 3, 2025
    • Internet Archive histogram-date-range picker
      TypeScript
      0102 Updated Sep 3, 2025
    • One webpage for every book ever published!
      Python
      1.6k5.9k782127 Updated Sep 3, 2025
    • HTML
      2710 Updated Sep 3, 2025
    • iaux

      Public
      Monorepo for Archive.org UX development and prototyping.
      TypeScript
      877189150 Updated Sep 3, 2025
    • The Internet Archive BookReader
      JavaScript
      4431.1k131104 Updated Sep 2, 2025
    • A repository of cleanup bots implementing the openlibrary-client
      Python
      5573279 Updated Sep 1, 2025
    • Google Summer of Code (GSoC) 2025 Wayback Machine Seed URL Classification and Prioritization project
      Python
      0100 Updated Sep 1, 2025
    • TypeScript
      01111 Updated Sep 1, 2025
    • PHP
      3414001 Updated Sep 1, 2025
    • displays notifications and automatically clears them
      TypeScript
      01112 Updated Sep 1, 2025
    • IAUX Typescript WebComponent Template
      JavaScript
      31038 Updated Aug 31, 2025
    • Web component for displaying and editing Internet Archive reviews
      TypeScript
      0116 Updated Aug 31, 2025
    • infogami

      Public
      Python
      454794 Updated Aug 29, 2025
    • heritrix3

      Public
      Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
      Java
      7693k314 Updated Aug 29, 2025
    • iiif

      Public
      The official Internet Archive IIIF service
      JavaScript
      624160 Updated Aug 28, 2025
    • brozzler

      Public
      brozzler - distributed browser-based web crawler
      Python
      1067383615 Updated Aug 28, 2025
    • Fast PDF generation and compression. Deals with millions of pages daily.
      Python
      16122351 Updated Aug 27, 2025
    • TypeScript
      0000 Updated Aug 26, 2025
    • rclone

      Public
      [vault fork] of "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files
      Go
      4.7k400 Updated Aug 26, 2025
    • Python script to create CDX index files of WARC data
      Arc
      152042 Updated Aug 26, 2025
    • nomad

      Public
      CI/CD code to manage and deploy to Nomad clusters. CI/CD uses a GitHub Actions reusable workflow; deploy phase sends just built containers to a nomad cluster. Contains helpful aliases for devs, including "hot sync" of code into deploys
      Shell
      2700 Updated Aug 25, 2025
    • A Modal Manager WebComponent
      TypeScript
      13114 Updated Aug 25, 2025
    • tracey

      Public
      Tracey Jaquith, Internet Archive 🏛️, talks and slides
      HTML
      0200 Updated Aug 21, 2025
    • Kotlin
      51920 Updated Aug 21, 2025
    • warcprox

      Public
      WARC writing MITM HTTP/S proxy
      Python
      62418193 Updated Aug 20, 2025