Version-1.0-beta public release
ibnesayeed committed Sep 8, 2015
1 parent 7fd932f commit 6fa2e03
Showing 6 changed files with 2,138 additions and 2 deletions.
8 changes: 8 additions & 0 deletions Dockerfile
@@ -0,0 +1,8 @@
FROM golang
MAINTAINER Sawood Alam <ibnesayeed@gmail.com>

COPY . /go/src/github.com/oduwsdl/memgator
WORKDIR /go/src/github.com/oduwsdl/memgator
RUN go install -v

ENTRYPOINT ["memgator"]
101 changes: 99 additions & 2 deletions README.md
@@ -1,2 +1,99 @@
# memgator
A Memento Aggregator CLI and Server in Go
# MemGator

A Memento Aggregator CLI and Server in [Go](https://golang.org/).

## Features

* The binary (available for various platforms) can be used as the CLI or run as a Web Service
* Results available in three formats - Link/JSON/CDXJ
* TimeMap, TimeGate, TimeNav, and Redirect endpoints
* Good API parity with the [main Memento Aggregator service](http://timetravel.mementoweb.org/guide/api/)
* Concurrent - Splits every session into subtasks for parallel execution
* Parallel - Utilizes all the available CPUs
* Custom archive list (a local JSON file or a remote URL) - a sample JSON is included in the repository
* Probability-based archive prioritization and limit
* Three levels of customizable timeouts for greater control over remote requests
* Customizable logging and profiling in CDXJ format
* Customizable endpoint URLs - helpful in load-balancing
* Customizable User-Agent to be sent to each archive
* [CORS](http://www.w3.org/TR/cors/) support to make it easy to use from JavaScript clients
* Memento count exposed in the header that can be retrieved via `HEAD` request
* [Docker](https://www.docker.com/) friendly - An image available as [ibnesayeed/memgator](https://hub.docker.com/r/ibnesayeed/memgator/)
* Sensible defaults - Batteries included, but replaceable

## Usage

### CLI

The command line interface of MemGator retrieves a TimeMap or TimeGate response over `STDOUT` in any of the supported formats. Info/Profiling (in verbose mode) and Error output goes to `STDERR` unless log and profile files are configured. For further details, see the full usage.

```
$ memgator [options] {URI-R} # TimeMap CLI
$ memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # TimeGate CLI
```

### Server

When run as a Web Service, MemGator exposes four customizable endpoints as follows:

```
$ memgator [options] server
TimeMap : http://localhost:1208/timemap/link|json|cdxj/{URI-R}
TimeGate : http://localhost:1208/timegate/{URI-R} [Accept-Datetime]
TimeNav : http://localhost:1208/timenav/link|json|cdxj/{YYYY[MM[DD[hh[mm[ss]]]]]}/{URI-R}
Redirect : http://localhost:1208/redirect/{YYYY[MM[DD[hh[mm[ss]]]]]}/{URI-R}
```

The `TimeMap` and `TimeGate` responses are in accordance with the [Memento RFC](http://tools.ietf.org/html/rfc7089). The TimeMap endpoint also supports additional serialization formats. The `TimeNav` service is a URL-friendly way to expose, in the response body (in various formats), the same information that is available in the `Link` header of the `TimeGate` response, without the need for header-based datetime negotiation. The `Redirect` service resolves the datetime (full or partial) passed in the URL and redirects to the closest Memento.

## Download and Install

Depending on the machine and operating system, download the appropriate binary from the releases page. Change the mode of the file to executable with `chmod +x MemGator-BINARY`. Run it from the location of the downloaded binary, or rename it to `memgator` and move it into a directory that is in the `PATH` (such as `/usr/local/bin/`).

## Running as a Docker Container

```
$ docker run ibnesayeed/memgator -h
$ docker run ibnesayeed/memgator [options] {URI-R}
$ docker run ibnesayeed/memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]}
$ docker run ibnesayeed/memgator [options] server
```

### Full Usage

```
Usage:
memgator [options] {URI-R} # TimeMap CLI
memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # TimeGate CLI
memgator [options] server # Run as a Web Service
Options:
-A, --agent=MemGator:1.0-beta <{CONTACT}> User-agent string sent to archives
-a, --arcs=http://www.cs.odu.edu/~salam/archives.json Local/remote JSON file path/URL for list of archives
-c, --contact=@WebSciDL Email/URL/Twitter handle - Used in the user-agent
-f, --format=Link Output format - Link/JSON/CDXJ
-g, --timegate=http://{SERVICE}/timegate TimeGate base URL - default based on service URL
-H, --host=localhost Host name - only used in web service mode
-k, --topk=-1 Aggregate only top k archives based on probability
-l, --log= Log file location - Defaults to STDERR
-m, --timemap=http://{SERVICE}/timemap TimeMap base URL - default based on service URL
-P, --profile= Profile file location - Defaults to Logfile
-p, --port=1208 Port number - only used in web service mode
-r, --restimeout=20s Response timeout for each archive
-s, --service=http://{HOST}[:{PORT}] Service base URL - default based on host & port
-T, --hdrtimeout=15s Header timeout for each archive
-t, --contimeout=5s Connection timeout for each archive
-V, --verbose=false Show Info and Profiling messages on STDERR
-v, --version=false Show name and version
```

## Build

Assuming that Git and Go are installed, the `GOPATH` environment variable is set to the Go Workspace directory as described in [How to Write Go Code](https://golang.org/doc/code.html), and `PATH` includes `$GOPATH/bin`, the code can be cloned, built, and run using the following commands:

```
$ cd $GOPATH
$ go get github.com/oduwsdl/memgator
$ go install github.com/oduwsdl/memgator
$ memgator http://example.com/
```
86 changes: 86 additions & 0 deletions archives.json
@@ -0,0 +1,86 @@
[
{
"id": "ia",
"name": "Internet Archive",
"timemap": "http://web.archive.org/web/timemap/link/",
"timegate": "http://web.archive.org/web/",
"probability": 0.5
},
{
"id": "proni",
"name": "PRONI Web Archive",
"timemap": "http://webarchive.proni.gov.uk/timemap/",
"timegate": "http://webarchive.proni.gov.uk/timegate/",
"probability": 0.001
},
{
"id": "pastpages",
"name": "PastPages Web Archive",
"timemap": "http://www.pastpages.org/timemap/link/",
"timegate": "http://www.pastpages.org/timegate/",
"probability": 0.001
},
{
"id": "ba",
"name": "Bibliotheca Alexandrina Web Archive",
"timemap": "http://web.archive.bibalex.org/web/timemap/link/",
"timegate": "http://web.archive.bibalex.org/web/",
"probability": 0.0,
"ignore": true
},
{
"id": "blarchive",
"name": "UK Web Archive",
"timemap": "http://www.webarchive.org.uk/wayback/archive/timemap/link/",
"timegate": "http://www.webarchive.org.uk/wayback/archive/",
"probability": 0.2
},
{
"id": "loc",
"name": "Library of Congress",
"timemap": "http://webarchive.loc.gov/all/timemap/link/",
"timegate": "http://webarchive.loc.gov/all/",
"probability": 0.05
},
{
"id": "archiveit",
"name": "Archive-It",
"timemap": "http://wayback.archive-it.org/all/timemap/link/",
"timegate": "http://wayback.archive-it.org/all/",
"probability": 0.1
},
{
"id": "ukparliament",
"name": "UK Parliament Web Archive",
"timemap": "http://webarchive.parliament.uk/timemap/",
"timegate": "http://webarchive.parliament.uk/timegate/"
},
{
"id": "uknationalarchives",
"name": "UK National Archives Web Archive",
"timemap": "http://webarchive.nationalarchives.gov.uk/timemap/",
"timegate": "http://webarchive.nationalarchives.gov.uk/timegate/",
"probability": 0.04
},
{
"id": "archive.is",
"name": "archive.today",
"timemap": "http://archive.today/timemap/",
"timegate": "http://archive.today/timegate/",
"probability": 0.07
},
{
"id": "is",
"name": "Icelandic Web Archive",
"timemap": "http://wayback.vefsafn.is/wayback/timemap/link/",
"timegate": "http://wayback.vefsafn.is/wayback/",
"probability": 0.06
},
{
"id": "swa",
"name": "Stanford Web Archive",
"timemap": "https://swap.stanford.edu/timemap/link/",
"timegate": "https://swap.stanford.edu/",
"probability": 0.001
}
]
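
As an illustration of how an archive list of this shape could feed the probability-based prioritization and `--topk` limit described in the README, here is a minimal Go sketch. It is not taken from the MemGator source: the `Archive` struct, the `topK` function, the treatment of a negative `k` as "no limit", and the skipping of entries marked `"ignore": true` are all assumptions. An entry without a `probability` field decodes as `0`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Archive mirrors the fields used in archives.json (hypothetical type).
type Archive struct {
	ID          string  `json:"id"`
	Name        string  `json:"name"`
	Timemap     string  `json:"timemap"`
	Timegate    string  `json:"timegate"`
	Probability float64 `json:"probability"`
	Ignore      bool    `json:"ignore"`
}

// topK decodes the list, drops ignored archives, sorts the rest by
// descending probability, and keeps at most k of them.
// A negative k is assumed to mean "no limit".
func topK(data []byte, k int) ([]Archive, error) {
	var all []Archive
	if err := json.Unmarshal(data, &all); err != nil {
		return nil, err
	}
	kept := make([]Archive, 0, len(all))
	for _, a := range all {
		if !a.Ignore {
			kept = append(kept, a)
		}
	}
	// A stable sort preserves file order among equal probabilities.
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Probability > kept[j].Probability
	})
	if k >= 0 && k < len(kept) {
		kept = kept[:k]
	}
	return kept, nil
}

func main() {
	sample := []byte(`[
		{"id": "ia", "name": "Internet Archive", "probability": 0.5},
		{"id": "ba", "name": "Bibliotheca Alexandrina Web Archive", "probability": 0.0, "ignore": true},
		{"id": "blarchive", "name": "UK Web Archive", "probability": 0.2},
		{"id": "ukparliament", "name": "UK Parliament Web Archive"}
	]`)
	archives, err := topK(sample, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range archives {
		fmt.Printf("%s (%s): %.3f\n", a.ID, a.Name, a.Probability)
	}
}
```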