1 parent 7fd932f commit 6fa2e03
Showing 6 changed files with 2,138 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
FROM golang | ||
MAINTAINER Sawood Alam <ibnesayeed@gmail.com> | ||
|
||
COPY . /go/src/github.com/oduwsdl/memgator | ||
WORKDIR /go/src/github.com/oduwsdl/memgator | ||
RUN go install -v | ||
|
||
ENTRYPOINT ["memgator"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,99 @@ | ||
# memgator | ||
A Memento Aggregator CLI and Server in Go | ||
# MemGator | ||
|
||
A Memento Aggregator CLI and Server in [Go](https://golang.org/). | ||
|
||
## Features | ||
|
||
* The binary (available for various platforms) can be used as the CLI or run as a Web Service | ||
* Results available in three formats - Link/JSON/CDXJ | ||
* TimeMap, TimeGate, TimeNav, and Redirect endpoints | ||
* Good API parity with the [main Memento Aggregator service](http://timetravel.mementoweb.org/guide/api/) | ||
* Concurrent - Splits every session into subtasks for parallel execution | ||
* Parallel - Utilizes all the available CPUs | ||
* Custom archive list (a local JSON file or a remote URL) - a sample JSON is included in the repository | ||
* Probability based archive prioritization and limit | ||
* Three levels of customizable timeouts for greater control over remote requests | ||
* Customizable logging and profiling in CDXJ format | ||
* Customizable endpoint URLs - helpful in load-balancing | ||
* Customizable User-Agent to be sent to each archive | ||
* [CORS](http://www.w3.org/TR/cors/) support to make it easy to use from JavaScript clients | ||
* Memento count exposed in a response header that can be retrieved via a `HEAD` request | ||
* [Docker](https://www.docker.com/) friendly - An image available as [ibnesayeed/memgator](https://hub.docker.com/r/ibnesayeed/memgator/) | ||
* Sensible defaults - Batteries included, but replaceable | ||
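The "Concurrent" feature above can be illustrated with a minimal fan-out sketch: one goroutine per archive, with results merged under a mutex. This is not MemGator's actual code; `fetchFunc`, `aggregate`, and the fake fetcher are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchFunc is a hypothetical stand-in for a per-archive TimeMap request.
type fetchFunc func(archiveID string) []string

// aggregate starts one goroutine per archive, waits for all of them,
// and returns the merged results in sorted order.
func aggregate(archives []string, fetch fetchFunc) []string {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		merged []string
	)
	for _, id := range archives {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			res := fetch(id) // the slow remote call would happen here
			mu.Lock()
			merged = append(merged, res...)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	sort.Strings(merged)
	return merged
}

func main() {
	// Fake fetcher so the sketch runs without network access.
	fake := func(id string) []string {
		return []string{id + "/20150101000000", id + "/20140101000000"}
	}
	for _, m := range aggregate([]string{"ia", "loc"}, fake) {
		fmt.Println(m)
	}
}
```

A mutex-guarded slice keeps the sketch short; a channel-based collector would work equally well.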
|
||
## Usage | ||
|
||
### CLI | ||
|
||
The command line interface of MemGator retrieves a TimeMap or TimeGate response on `STDOUT` in any of the supported formats. Info/Profiling (in verbose mode) and Error output go to `STDERR` unless appropriate files are configured. For further details, see the full usage. | ||
|
||
``` | ||
$ memgator [options] {URI-R} # TimeMap CLI | ||
$ memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # TimeGate CLI | ||
``` | ||
|
||
### Server | ||
|
||
When run as a Web Service, MemGator exposes four customizable endpoints as follows: | ||
|
||
``` | ||
$ memgator [options] server | ||
TimeMap : http://localhost:1208/timemap/link|json|cdxj/{URI-R} | ||
TimeGate : http://localhost:1208/timegate/{URI-R} [Accept-Datetime] | ||
TimeNav : http://localhost:1208/timenav/link|json|cdxj/{YYYY[MM[DD[hh[mm[ss]]]]]}/{URI-R} | ||
Redirect : http://localhost:1208/redirect/{YYYY[MM[DD[hh[mm[ss]]]]]}/{URI-R} | ||
``` | ||
|
||
The `TimeMap` and `TimeGate` responses are in accordance with the [Memento RFC](http://tools.ietf.org/html/rfc7089). The TimeMap endpoint also supports some additional serialization formats. The `TimeNav` service is a URL-friendly way to expose, in the response body (in various formats), the same information that is available in the `Link` header of the `TimeGate` response, without the need for header-based datetime negotiation. The `Redirect` service resolves the datetime (full or partial) passed in the URL and redirects to the closest Memento. | ||
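As a rough illustration of how a partial `{YYYY[MM[DD[hh[mm[ss]]]]]}` datetime could be resolved, the sketch below pads it to 14 digits and formats it as an `Accept-Datetime` value. The padding policy (missing parts default to the earliest instant) is an assumption, not taken from MemGator's source, and `padDatetime`, `parseDatetime`, and `acceptDatetime` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// padDatetime expands a partial YYYY[MM[DD[hh[mm[ss]]]]] string to 14 digits.
// Assumption: missing parts default to the earliest instant (month 01, day 01, 00:00:00).
func padDatetime(partial string) (string, error) {
	const defaults = "00000101000000" // YYYYMMDDhhmmss defaults
	n := len(partial)
	if n < 4 || n > 14 || n%2 != 0 {
		return "", fmt.Errorf("invalid datetime length: %q", partial)
	}
	for _, r := range partial {
		if r < '0' || r > '9' {
			return "", fmt.Errorf("invalid datetime digits: %q", partial)
		}
	}
	return partial + defaults[n:], nil
}

// parseDatetime pads a partial datetime and parses it as a UTC time.
func parseDatetime(partial string) (time.Time, error) {
	full, err := padDatetime(partial)
	if err != nil {
		return time.Time{}, err
	}
	return time.Parse("20060102150405", full)
}

// acceptDatetime formats a time in the HTTP date format used by Accept-Datetime.
func acceptDatetime(ts time.Time) string {
	return ts.UTC().Format(http.TimeFormat)
}

func main() {
	ts, err := parseDatetime("201506")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Accept-Datetime:", acceptDatetime(ts))
}
```

A client could send the resulting string in the `Accept-Datetime` header of a TimeGate request, or place the padded 14-digit form in a TimeNav or Redirect URL.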
|
||
## Download and Install | ||
|
||
Depending on the machine and operating system, download the appropriate binary from the releases page. Change the mode of the file to executable with `chmod +x MemGator-BINARY`. Run it from the location of the downloaded binary, or rename it to `memgator` and move it into a directory that is in the `PATH` (such as `/usr/local/bin/`). | ||
|
||
## Running as a Docker Container | ||
|
||
``` | ||
$ docker run ibnesayeed/memgator -h | ||
$ docker run ibnesayeed/memgator [options] {URI-R} | ||
$ docker run ibnesayeed/memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} | ||
$ docker run ibnesayeed/memgator [options] server | ||
``` | ||
|
||
### Full Usage | ||
|
||
``` | ||
Usage: | ||
memgator [options] {URI-R} # TimeMap CLI | ||
memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # TimeGate CLI | ||
memgator [options] server # Run as a Web Service | ||
Options: | ||
-A, --agent=MemGator:1.0-beta <{CONTACT}> User-agent string sent to archives | ||
-a, --arcs=http://www.cs.odu.edu/~salam/archives.json Local/remote JSON file path/URL for list of archives | ||
-c, --contact=@WebSciDL Email/URL/Twitter handle - Used in the user-agent | ||
-f, --format=Link Output format - Link/JSON/CDXJ | ||
-g, --timegate=http://{SERVICE}/timegate TimeGate base URL - default based on service URL | ||
-H, --host=localhost Host name - only used in web service mode | ||
-k, --topk=-1 Aggregate only top k archives based on probability | ||
-l, --log= Log file location - Defaults to STDERR | ||
-m, --timemap=http://{SERVICE}/timemap TimeMap base URL - default based on service URL | ||
-P, --profile= Profile file location - Defaults to Logfile | ||
-p, --port=1208 Port number - only used in web service mode | ||
-r, --restimeout=20s Response timeout for each archive | ||
-s, --service=http://{HOST}[:{PORT}] Service base URL - default based on host & port | ||
-T, --hdrtimeout=15s Header timeout for each archive | ||
-t, --contimeout=5s Connection timeout for each archive | ||
-V, --verbose=false Show Info and Profiling messages on STDERR | ||
-v, --version=false Show name and version | ||
``` | ||
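The three timeout options (`--contimeout`, `--hdrtimeout`, `--restimeout`) map naturally onto Go's standard HTTP client. The sketch below is one plausible wiring, not necessarily how MemGator implements it; `newArchiveClient` is a hypothetical helper, and treating the response timeout as `http.Client.Timeout` (which bounds the whole exchange, including reading the body) is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newArchiveClient builds an HTTP client with three timeout levels:
// connection setup, response headers, and the complete response.
func newArchiveClient(conTimeout, hdrTimeout, resTimeout time.Duration) *http.Client {
	transport := &http.Transport{
		// Limits how long establishing the TCP connection may take.
		DialContext: (&net.Dialer{Timeout: conTimeout}).DialContext,
		// Limits the wait for response headers after the request is written.
		ResponseHeaderTimeout: hdrTimeout,
	}
	// Client.Timeout bounds the entire request, including the body read.
	return &http.Client{Transport: transport, Timeout: resTimeout}
}

func main() {
	// Defaults taken from the usage text above: 5s, 15s, 20s.
	client := newArchiveClient(5*time.Second, 15*time.Second, 20*time.Second)
	fmt.Println("overall timeout:", client.Timeout)
}
```

Ordering the values as connection < header < response, as the defaults do, lets a stalled archive fail at the earliest stage it gets stuck in.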
|
||
## Build | ||
|
||
Assuming that Git and Go are installed, the `GOPATH` environment variable is set to the Go Workspace directory as described in [How to Write Go Code](https://golang.org/doc/code.html), and `PATH` includes `$GOPATH/bin`, cloning, building, and running the code can be done using the following commands: | ||
|
||
``` | ||
$ cd $GOPATH | ||
$ go get github.com/oduwsdl/memgator | ||
$ go install github.com/oduwsdl/memgator | ||
$ memgator http://example.com/ | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
[ | ||
{ | ||
"id": "ia", | ||
"name": "Internet Archive", | ||
"timemap": "http://web.archive.org/web/timemap/link/", | ||
"timegate": "http://web.archive.org/web/", | ||
"probability": 0.5 | ||
}, | ||
{ | ||
"id": "proni", | ||
"name": "PRONI Web Archive", | ||
"timemap": "http://webarchive.proni.gov.uk/timemap/", | ||
"timegate": "http://webarchive.proni.gov.uk/timegate/", | ||
"probability": 0.001 | ||
}, | ||
{ | ||
"id": "pastpages", | ||
"name": "PastPages Web Archive", | ||
"timemap": "http://www.pastpages.org/timemap/link/", | ||
"timegate": "http://www.pastpages.org/timegate/", | ||
"probability": 0.001 | ||
}, | ||
{ | ||
"id": "ba", | ||
"name": "Bibliotheca Alexandrina Web Archive", | ||
"timemap": "http://web.archive.bibalex.org/web/timemap/link/", | ||
"timegate": "http://web.archive.bibalex.org/web/", | ||
"probability": 0.0, | ||
"ignore": true | ||
}, | ||
{ | ||
"id": "blarchive", | ||
"name": "UK Web Archive", | ||
"timemap": "http://www.webarchive.org.uk/wayback/archive/timemap/link/", | ||
"timegate": "http://www.webarchive.org.uk/wayback/archive/", | ||
"probability": 0.2 | ||
}, | ||
{ | ||
"id": "loc", | ||
"name": "Library of Congress", | ||
"timemap": "http://webarchive.loc.gov/all/timemap/link/", | ||
"timegate": "http://webarchive.loc.gov/all/", | ||
"probability": 0.05 | ||
}, | ||
{ | ||
"id": "archiveit", | ||
"name": "Archive-It", | ||
"timemap": "http://wayback.archive-it.org/all/timemap/link/", | ||
"timegate": "http://wayback.archive-it.org/all/", | ||
"probability": 0.1 | ||
}, | ||
{ | ||
"id": "ukparliament", | ||
"name": "UK Parliament Web Archive", | ||
"timemap": "http://webarchive.parliament.uk/timemap/", | ||
"timegate": "http://webarchive.parliament.uk/timegate/" | ||
}, | ||
{ | ||
"id": "uknationalarchives", | ||
"name": "UK National Archives Web Archive", | ||
"timemap": "http://webarchive.nationalarchives.gov.uk/timemap/", | ||
"timegate": "http://webarchive.nationalarchives.gov.uk/timegate/", | ||
"probability": 0.04 | ||
}, | ||
{ | ||
"id": "archive.is", | ||
"name": "archive.today", | ||
"timemap": "http://archive.today/timemap/", | ||
"timegate": "http://archive.today/timegate/", | ||
"probability": 0.07 | ||
}, | ||
{ | ||
"id": "is", | ||
"name": "Icelandic Web Archive", | ||
"timemap": "http://wayback.vefsafn.is/wayback/timemap/link/", | ||
"timegate": "http://wayback.vefsafn.is/wayback/", | ||
"probability": 0.06 | ||
}, | ||
{ | ||
"id": "swa", | ||
"name": "Stanford Web Archive", | ||
"timemap": "https://swap.stanford.edu/timemap/link/", | ||
"timegate": "https://swap.stanford.edu/", | ||
"probability": 0.001 | ||
} | ||
] |
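A minimal sketch of how an archive list in this shape could be loaded and limited to the top k entries by probability, in the spirit of the `--topk` option. The struct fields mirror the JSON keys above; the selection rules (skip entries with `"ignore": true`, sort by descending probability, treat a negative k as "no limit") are assumptions rather than MemGator's confirmed behavior, and `Archive` and `selectArchives` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Archive mirrors one entry of the archive list JSON.
type Archive struct {
	ID          string  `json:"id"`
	Name        string  `json:"name"`
	Timemap     string  `json:"timemap"`
	Timegate    string  `json:"timegate"`
	Probability float64 `json:"probability"` // zero when the key is absent
	Ignore      bool    `json:"ignore"`
}

// selectArchives parses the list, drops ignored entries, sorts by
// descending probability, and keeps at most k entries (all if k < 0).
func selectArchives(data []byte, k int) ([]Archive, error) {
	var all []Archive
	if err := json.Unmarshal(data, &all); err != nil {
		return nil, err
	}
	kept := make([]Archive, 0, len(all))
	for _, a := range all {
		if !a.Ignore {
			kept = append(kept, a)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Probability > kept[j].Probability
	})
	if k >= 0 && k < len(kept) {
		kept = kept[:k]
	}
	return kept, nil
}

func main() {
	sample := []byte(`[
		{"id": "ia", "probability": 0.5},
		{"id": "ba", "probability": 0.0, "ignore": true},
		{"id": "blarchive", "probability": 0.2},
		{"id": "ukparliament"}
	]`)
	top, err := selectArchives(sample, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range top {
		fmt.Println(a.ID, a.Probability)
	}
}
```

The stable sort keeps entries with equal probability in their original file order.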