ccan-archiver
is a command-line program to archive the content of ccan.de and the Clonk-Center Archive.
The focus of this program is not archiving entire sites, but only downloadable items (excluding comments, the forum etc.) and some metadata.
The program directly downloads to a zip file called result.zip
. In this file you can find a README.md
file that documents the structure.
Note: The file will be about 6.5GB in size (last checked 25.12.2018).
This program mainly depends on the go standard library. The only other package needed is golang.org/x/net/html
.
You can install it with go get golang.org/x/net/html
or go get ./...
.
You can download binaries from the releases section of this repository.
To compile this program, you need to have go
and go-bindata
installed.
Note: This was tested with go version go1.10.3 windows/amd64
and go-bindata 3.1.0
, but it should work on any platform supported by go.
First, you have to generate some assets using go generate:
go generate zipfactory/*.go
After this, you can build the executable:
go build
This is a brief documentation of the content of the resulting file called result.zip
, you can find more in the file README.md
in the archive.
The files are in the following schema:
site/username/name.ext
site
: The site name where the item was hosted. This is either "CCAN" or "Clonk-Center"username
: username of the user that uploaded the file to CCAN.dename.ext
: File to download
There are also metadata files according to this schema:
site/username/name.ext.json
Exception: README.md
and failed.json
(this file only exists if a download failed, you can find all metadata there)
The json file for all "CCAN" items contains the following entry:
{
"name": Display name of the object,
"date": Upload date,
"download_count": Number of downloads,
"author": Name of the uploader,
"votes": Number of votes,
"category": Category,
"engine": Engine for which this file was created,
"download_link": download link to ccan.de (Usually a redirect)
}
The json file for all "Clonk-Center" items contains the following entry:
{
"name": Display name of the object,
"date": Upload date,
"download_count": Number of downloads,
"author": Name of the uploader,
"engine": Engine for which this file was created,
"download_link": download link to cc-archive.lwrl.de,
"description": The description that can be found at the page for this item (as Markdown text)
}
Read more in the README.md
at the root of your archive after it has been downloaded.
MIT, see LICENSE