Register Ingester PSC is a data ingester for the OpenOwnership Register project. It processes bulk data about People with Significant Control (PSC) published by Companies House in the UK, and ingests the records into Elasticsearch. Optionally, it can also publish new records to AWS Kinesis. It works with raw records only, and does not convert them into the Beneficial Ownership Data Standard (BODS) format.
Install and boot Register.
Configure your environment using the example file:
cp .env.example .env
- PSC_STREAM: AWS Kinesis stream to which to publish new records (optional)
- PSC_STREAM_API_KEY: PSC Stream API registration key (optional; only necessary if ingesting via a stream rather than snapshots)
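As an illustration, the two optional settings might look like this in .env. Both values are placeholders invented for this sketch, not real stream names or keys, and any other variables from .env.example are left out:

```shell
# Optional: Kinesis stream to publish new records to (placeholder name)
PSC_STREAM=my-psc-stream
# Optional: only needed when ingesting via the PSC Stream API (placeholder key)
PSC_STREAM_API_KEY=replace-with-your-key
```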
Create the Elasticsearch indexes:
docker compose run ingester-psc create-indexes
Run the tests:
docker compose run ingester-psc test
There are now three options:
- ingest via snapshots by using the helper script
- ingest via snapshots by running the commands step-by-step
- ingest via a stream by running the commands step-by-step (not fully functional)
To ingest the bulk data from a snapshot (published daily):
docker compose run ingester-psc ingest-bulk
Decide on an import ID relating to the data to download, e.g. 2023_10_06. This is then used in subsequent commands.
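The import ID passed to the snapshot commands in this README is underscore-separated (2023_10_06). Assuming the ID is simply the snapshot date with hyphens replaced by underscores (an assumption, not something the ingester is documented to require), a small hypothetical helper such as import_id_for_date could derive it:

```shell
# Hypothetical helper: turn an ISO date (YYYY-MM-DD) into an
# underscore-separated import ID (YYYY_MM_DD).
import_id_for_date() {
  printf '%s\n' "$1" | tr '-' '_'
}

IMPORT_ID="$(import_id_for_date 2023-10-06)"
echo "$IMPORT_ID"   # prints 2023_10_06
```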
Retrieve the list of snapshots for the import ID:
docker compose run ingester-psc discover-snapshots 2023_10_06
Ingest snapshots by iterating through the list of files uploaded to the designated prefix for the import ID and ingesting them into Elasticsearch:
docker compose run ingester-psc ingest-snapshots 2023_10_06
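The two step-by-step snapshot commands share the same import ID, so they can be chained. The following is a minimal sketch, not part of the project: the function name ingest_snapshots_for and the DRY_RUN switch are hypothetical, and only the two docker compose commands come from this README.

```shell
# Discover, then ingest, the snapshots for one import ID.
# The ingest step runs only if discovery succeeds.
# When DRY_RUN is set, the commands are printed instead of run.
ingest_snapshots_for() {
  ${DRY_RUN:+echo} docker compose run ingester-psc discover-snapshots "$1" &&
    ${DRY_RUN:+echo} docker compose run ingester-psc ingest-snapshots "$1"
}

# Print the commands for one import ID without running them.
DRY_RUN=1 ingest_snapshots_for 2023_10_06
```

Calling ingest_snapshots_for 2023_10_06 without DRY_RUN runs both steps for real.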
Connect to the PSC Stream API, consume any new records, and ingest them into Elasticsearch (PSC_STREAM_API_KEY must be set):
docker compose run ingester-psc ingest-stream
Or, to connect to the PSC Stream API using stream position STREAM_POSITION (if valid and not too old):
docker compose run ingester-psc ingest-stream <STREAM_POSITION>
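Both forms of ingest-stream can be covered by one wrapper that also checks for the API key first. This is a rough sketch under stated assumptions: it assumes the key is stored as a PSC_STREAM_API_KEY=... line in the .env file, and the names ingest_stream, ENV_FILE and DRY_RUN are hypothetical.

```shell
# Hypothetical wrapper around ingest-stream.
# ENV_FILE (default: .env) and DRY_RUN are names invented for this sketch.
ingest_stream() {
  env_file="${ENV_FILE:-.env}"
  # Require a non-empty PSC_STREAM_API_KEY=... line in the env file.
  if ! grep -q '^PSC_STREAM_API_KEY=.' "$env_file" 2>/dev/null; then
    echo "PSC_STREAM_API_KEY must be set in $env_file" >&2
    return 1
  fi
  # "$@" expands to nothing when no stream position is given,
  # so the same line covers both forms of the command.
  ${DRY_RUN:+echo} docker compose run ingester-psc ingest-stream "$@"
}
```

With this in place, ingest_stream starts from the default position, and ingest_stream followed by a stream position passes that position through unchanged.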