Note: This is only a first draft/brainstorm. I will try to organize the list into more useful sections in the future.
Feel free to contribute!
- Download utilities
- Backup
- Compression
- Network
- File systems
- File conversion
- Utility Scripts
- Content sharing
- Data curation
- APIs & Online tools
- Hardware / Monitoring
- Data recovery
- Local Media
- Long-term data archiving
- ArchiveBox: The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
- Browsertrix Crawler: A simplified (Chrome) browser-based, high-fidelity crawling system designed to run a complex, customizable crawl in a single Docker container
- Collect: A server to collect & archive websites that also supports video downloads
- grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
- Heritrix: Extensible, web-scale, archival-quality web crawler
- HTTrack: Download a website from the Internet to a local directory
- wail: Web Archiving Integration Layer: One-Click User Instigated Preservation
- webrecorder: An integrated platform for creating high-fidelity, ISO-compliant web archives in a user-friendly interface, providing access to archived content, and sharing collections
- wikiteam: Set of tools for archiving wikis
- annie: YouTube-DL alternative written in Golang
- aria2: A lightweight multi-protocol & multi-source command-line download utility
- CrowLeer: Powerful C++ web crawler based on libcurl
- curl: Tool and library for transferring data with URL syntax, supporting many protocols
- Horahora: Video hosting website and video archival manager for Niconico, Bilibili, and YouTube
- httpie: A user-friendly tool similar to curl and wget. Useful for web scraping with shell scripts, but be aware that using it adds a dependency.
- news-crawl: Crawler for news feeds based on StormCrawler that produces WARC files
- Plowshare: Command-line tool to manage file-sharing sites
- Rclone: A command line program to sync files and directories to and from various cloud storage providers
- rsync: An open source utility that provides fast incremental file transfer
- Suck-It: Recursively visit and download a website's content to your disk (multi-threaded)
- wget: Utility for non-interactive download of files from the Web
- wget2: Successor of GNU Wget, works multi-threaded
- wpull: Wget-compatible web downloader and crawler
- you-get: Dumb downloader that scrapes the web
- ytdl-sub: Automate downloading and metadata generation with YouTubeDL
- yt-dlp: A fork of youtube-dl that behaves better
- BBCSoundDownloader: Bulk downloader for BBC's Sound Effects library http://bbcsfx.acropolis.org.uk/
- ChanThreadWatch: Saves threads from *chan-style boards and checks for updates until the thread dies
- comics-downloader: Command-line tool to download comics and manga in pdf/epub/cbz/cbr from supported sites
- floatplane_ripper: Script to rip all videos from https://floatplane.rip/
- gallery-dl: Download image galleries and collections from pixiv, exhentai, danbooru and more
- Discord-Channel-Scraper: Discord server archival (JSON output, downloads attachments and emojis)
- dzi-dl: Deep Zoom Image Downloader
- FanFicFare: Tool for making eBooks from stories on fanfiction and other web sites
- FicSave: Online fanfiction downloader. Source code is available; the website, however, is now offline.
- flickr_download: Simple script to download a Flickr set
- Google Images Download: Python script for downloading images
- iiif-dl: Command-line tile downloader/assembler for IIIF endpoints/manifests
- imgbrd-grabber: Very customizable imageboard/booru downloader with powerful filenaming features
- instaloader: Download pictures (or videos) along with their captions and other metadata from Instagram
- InstaLooter: API-less Instagram pictures and videos downloader.
- Instagram Scraper: Command-line application written in Python that scrapes and downloads an Instagram user's photos and videos. Use responsibly.
- PyInstaLive: Instagram live stream downloader
- RedditDownloader: Scrapes Reddit to download media of your choice
- Scribd-Downloader: Allows downloading of Scribd documents
- snscrape: A social networking service scraper in Python
- RipMe: RipMe is an album ripper for various websites. Runs on your computer. Requires Java 8.
- Tube Archivist: Self-Hosted Docker container for automated/scheduled YouTube downloads of channels, playlists, etc.
- tumblr-utils: Utilities for dealing with Tumblr blogs, Tumblr backup
- yt-mango: YouTube metadata archiver
- Youtube-MA: YouTube metadata archiver
- bazarr: Companion application to Sonarr and Radarr for downloading subtitles
- FlexGet: Multipurpose automation tool for content like torrents, nzbs, podcasts, comics, series, movies, etc.
- Jackett: API support for torrent trackers (works with Sonarr, Radarr and others)
- Lidarr: Music collection manager for Usenet and BitTorrent users
- Mylar: An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents
- Sick-Beard: PVR for newsgroup users (with limited torrent support)
- Radarr: A fork of Sonarr to work with movies à la Couchpotato
- Sonarr: PVR for Usenet and BitTorrent users
- BorgBackup: Deduplicating archiver with compression and encryption
- 7-Zip: A file archiver with a high compression ratio
- KGB Archiver: Compression tool with an unbelievably high compression rate
- peazip: File archiver utility
- PIGZ: Multi-threaded gzip
- WinRAR: Can decompress RAR and zip files
- NetLimiter: Internet traffic control and monitoring tool for Windows
- httpdirfs: A filesystem which allows you to mount HTTP directory listings
- mergerfs: A featureful union filesystem
- NTFS drivers for macOS
- AAXtoMP3: convert AAX files to common MP3, M4A, M4B, flac and ogg formats through a basic bash script frontend to FFMPEG
- html2warc: Convert web resources to a single warc file
- warcat: Tool and library for handling Web ARChive (WARC) files
- Backblaze B2 sync backup script: Script to sync multiple directories with Backblaze B2
- flac2mp3_V0.py: Multi-threaded Python script to convert all FLAC files to MP3 V0 while keeping the directory structure
- Misc download scripts: Scripts for downloading content from various websites
- TheFrenchGhosty's Ultimate YouTube-DL Scripts Collection: Collection of YouTube-dl scripts to aid in YouTube channel archival
- rclone_dirsize: Get size of http directory listing with rclone
- rm_empty_subdir: Remove empty sub-directories on Windows
- void-cat-uploader: This script automatically uploads all files inside a directory to https://void.cat
- youtube-dl_soundcloud: Snippet for using youtube-dl to download SoundCloud playlists
- h5ai: HTTP web server index for Apache httpd, lighttpd, nginx and Cherokee
- ipfs: Protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system
- opds: Easy to use, Open & Decentralized Content Distribution
- Syncthing: An application that lets you synchronize your files across multiple devices
- baobab: Graphical disk usage analyzer
- beets: Music library manager and MusicBrainz tagger
- browsemonkey: Takes snapshots of file systems for offline browsing and searching.
- Calibre: Ebook manager
- DataCurator-Filetree: A unified filetree for all kinds of data, which should help in storing, categorising and retrieving
- DeepSort: AI powered image tagger backed by DeepDetect
- diskover: File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch
- Everything: Locate files and folders by name instantly (Windows)
- FileBot: FileBot is the ultimate tool for organizing and renaming your Movies, TV Shows and Anime
- fucking-weeb: A library manager for animu (and TV shows, and whatever).
- grepWin: A powerful and fast search tool using regular expressions (Windows)
- Hydrus: A desktop application for large media collections
- Kiwix: An offline reader for online content like Wikipedia, Project Gutenberg, or TED Talks
- jdupes: Powerful duplicate file finder
- MediaElch: Media manager for Kodi
- MediaInfo: Convenient unified display of the most relevant technical and tag data for video and audio files
- Mp3tag: Powerful and easy-to-use tool to edit metadata of audio files (Windows/Mac)
- phockup: Media sorting tool to organize photos and videos from your camera
- picard: MusicBrainz tagger
- TeraCopy: Copy your files faster and more securely
- tree: 'tree' command for Linux
- WinDirStat: Disk usage statistics viewer and cleanup tool for Windows
- WizTree: Finds the files and folders using the most disk space on your hard drive
- sist2: Lightning-fast file system indexer and search tool
- SyncToy: Microsoft tool for keeping files in sync across locations on Windows
- VisiPics: Automatically finds duplicated images
- CrystalDiskInfo: A HDD/SSD utility software which supports a part of USB, Intel RAID and NVMe
- GSmartControl: Easy to use Multi-OS S.M.A.R.T. utility with an easy to understand graphical interface
- Hard Drive Sentinel: Multi-OS SSD and HDD monitoring and analysis software
- smartmontools: Control and monitor storage systems using the (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks
- PhotoRec: Powerful FOSS GUI data recovery tool
- TestDisk: Another FOSS tool by the author of PhotoRec, but this one is for the CLI
- whipper: Python CD-DA ripper preferring accuracy over speed. Generates .flac, .cue, and .log by default and automatically fetches metadata from MusicBrainz. EAC log plugin is available.
- Exact Audio Copy: A freeware, Windows-only application similar to whipper that doesn't automatically fetch metadata by default, but EAC rips are preferred by most trackers
- MakeMKV: A cross-platform DVD ripper that supports recent Blu-ray discs. It's mostly open source, but the Blu-ray secret sauce is still hidden
- HandBrake: Open source DVD ripper and media transcoder. Has more options and features than MakeMKV, but it cannot rip Blu-ray discs
- CommonCrawl: Data collected over seven years (ongoing) which contains web page data, extracted metadata and text extractions.
- Blockyarchive: Archive with forward error correction and sector level recoverability
- par2cmdline: A PAR 2.0 compatible file verification and repair tool