This repository contains a tool for scraping data from one CKAN repository into another. Typically people use CKAN's harvesting extension to do this.
This scraper requires no extra tooling within CKAN itself, but takes a simpler, clumsier approach to data replication using CKAN's API, storing its JSON output locally.
This software requires tools typically found on a Unix system, such as curl, jq, make and perl.
You will need a CKAN API token for the CKAN instance you wish to write to.
The user associated with the token will need access to create organisations
and packages (datasets). Store the token in a file named API_TOKEN in the
same directory as this README document or in a CKAN_API_TOKEN environment
variable.
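
As an illustration of the two storage options, here is a minimal sketch of how a script might look the token up. The function name resolve_token is hypothetical, and the precedence (environment variable first, then the API_TOKEN file) is an assumption, not something this README specifies.

```shell
# Hypothetical helper: print the CKAN API token.
# Assumption: the CKAN_API_TOKEN environment variable wins over the API_TOKEN file.
resolve_token() {
    if [ -n "${CKAN_API_TOKEN:-}" ]; then
        printf '%s\n' "$CKAN_API_TOKEN"
    elif [ -r API_TOKEN ]; then
        # Remove any whitespace (such as a trailing newline) from the file contents
        tr -d '[:space:]' < API_TOKEN
        printf '\n'
    else
        echo "No API token found: set CKAN_API_TOKEN or create an API_TOKEN file" >&2
        return 1
    fi
}
```

Either way, keep the token out of version control, since it grants write access to the target instance.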
In addition to the API token described above, set the following environment variables:
BASE_URL: The CKAN API endpoint to read from, such as http://example.org/api/3/action
WRITE_BASE_URL: The CKAN API endpoint to write to, such as http://example.org/api/3/action
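
The two variables might be set along these lines. The URLs are the placeholders from the descriptions above, and action_url is a hypothetical helper (not part of this tool) that shows how an action name such as CKAN's package_list is appended to an endpoint.

```shell
# Placeholder endpoints copied from the descriptions above; substitute real ones.
export BASE_URL="http://example.org/api/3/action"
export WRITE_BASE_URL="http://example.org/api/3/action"

# Hypothetical helper: join an endpoint and an action name,
# tolerating a single trailing slash on the endpoint.
action_url() {
    printf '%s/%s\n' "${1%/}" "$2"
}

# Example read, assuming the source instance is reachable:
# curl -fsS "$(action_url "$BASE_URL" package_list)" | jq -r '.result[]'
```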
Commands are typically run from a Makefile. To run a complete migration
process, run make all; to run individual steps, run make TARGET, where
TARGET takes one or more of the following values:
all: run the complete migration process
fetch: fetch all datasets and organisations
fetch_datasets: fetch all datasets
fetch_orgs: fetch all organisations
create_datasets: create all datasets from already fetched data
create_orgs: create all organisations from already fetched data
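
A step-by-step run might look like the sketch below. The wrapper function run_migration and the overridable MAKE variable are hypothetical conveniences; creating organisations before datasets is an assumption, on the grounds that CKAN datasets usually belong to an organisation that must already exist.

```shell
# Hypothetical wrapper: fetch everything first, then create organisations
# and datasets from the fetched data. Set MAKE to use a different make binary.
run_migration() {
    ${MAKE:-make} fetch &&
    ${MAKE:-make} create_orgs create_datasets
}
```

Running the fetch step separately leaves the fetched JSON available for inspection before anything is written to the target instance.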
There are certainly bugs: this was a quick hack job. Feel free to report and fix any you find.