This repository has been archived by the owner on Oct 26, 2023. It is now read-only.

nhsuk/etl-toolkit


ETL toolkit

General components for managing the retrieval and processing of data


Components

Queues

populateIds may be used to add IDs to the etlStore from a paged source.

populateRecordsFromIds may be used to populate records from the IDs in the etlStore.
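The two queue steps can be pictured with a minimal sketch. It does not use the toolkit's actual API: the `store` object, `fetchIdPage`, `fetchRecord`, `collectIds`, and `collectRecords` are hypothetical names invented for illustration, and the paged source is faked in memory.

```javascript
// Hypothetical in-memory stand-in for the etlStore; not the library's real API.
const store = { ids: new Set(), records: new Map(), failedIds: new Set() };

// Hypothetical paged source: three pages of IDs.
const pages = [['a1', 'a2'], ['a3', 'a4'], ['a5']];
async function fetchIdPage(pageNumber) {
  return pages[pageNumber - 1];
}

// Hypothetical record lookup; 'a4' fails so that error tracking is visible.
async function fetchRecord(id) {
  if (id === 'a4') throw new Error(`no record for ${id}`);
  return { id, name: `Record ${id}` };
}

// Phase 1: walk every page and collect the IDs.
async function collectIds(totalPages) {
  for (let page = 1; page <= totalPages; page += 1) {
    const ids = await fetchIdPage(page);
    ids.forEach((id) => store.ids.add(id));
  }
}

// Phase 2: load a record for each collected ID, noting failures.
async function collectRecords() {
  for (const id of store.ids) {
    try {
      store.records.set(id, await fetchRecord(id));
    } catch (err) {
      store.failedIds.add(id);
    }
  }
}

// Run both phases in order and hand back the populated store.
async function runEtl() {
  await collectIds(pages.length);
  await collectRecords();
  return store;
}
```

Calling `runEtl()` resolves with the populated store. Keeping ID collection separate from record loading means a failed record lookup only marks that one ID as errored rather than stopping the whole run.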

ETL Store

The etlStore manages the state of the ETL, including IDs, loaded records, and a list of errored IDs. The store can persist state to the local file system during queue processing, enabling an ETL to continue after an interruption.
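As a rough illustration of how persisted state allows a run to resume, the sketch below saves a store-like object to a JSON file and works out which IDs are still outstanding. The file location, the state shape, and the function names (`createStore`, `saveState`, `loadState`, `pendingIds`) are assumptions made for this example, not the toolkit's real interface.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical cache location; the toolkit's real file layout may differ.
const cacheFile = path.join(os.tmpdir(), 'etl-store-sketch.json');

// Empty state: known IDs, loaded records keyed by ID, and errored IDs.
function createStore() {
  return { ids: [], records: {}, failedIds: [] };
}

// Write the whole state to disk so a later run can pick it up.
function saveState(state, file = cacheFile) {
  fs.writeFileSync(file, JSON.stringify(state));
}

// Restore a previous state if one exists, otherwise start empty.
function loadState(file = cacheFile) {
  if (!fs.existsSync(file)) return createStore();
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// IDs that still need processing after a restart:
// neither loaded successfully nor already marked as errored.
function pendingIds(state) {
  return state.ids.filter(
    (id) => !(id in state.records) && !state.failedIds.includes(id)
  );
}
```

After an interruption, a process could call `loadState()` and pass the result to `pendingIds` to continue with only the IDs that were neither loaded nor marked as errored.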

Environment variables

Environment variables are expected to be managed by the environment in which the application runs. This is the best practice described by the twelve-factor app methodology.

Environment variables are used to set application level settings for each environment.

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| LOG_LEVEL | log level | Depends on NODE_ENV | |
| NODE_ENV | node environment | development | |
| OUTPUT_FILE | Filename saved to Azure | etl-data | |
| HITS_PER_HOUR | Maximum number of times to call a queue operation per hour | 20000 | |
| ETL_NAME | Name used in Bunyan logger | etl-toolkit | |
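A minimal sketch of reading these settings with their documented defaults follows. The `readConfig` and `minIntervalMs` helpers are hypothetical, and spreading HITS_PER_HOUR evenly across the hour is an assumption about how such a cap might be applied, not a description of the toolkit's internals.

```javascript
// Read settings from an env-like object, applying the documented defaults.
function readConfig(env = process.env) {
  return {
    // The default is documented only as depending on NODE_ENV, with no
    // mapping given, so this sketch leaves it undefined when unset.
    logLevel: env.LOG_LEVEL,
    nodeEnv: env.NODE_ENV || 'development',
    outputFile: env.OUTPUT_FILE || 'etl-data',
    hitsPerHour: Number(env.HITS_PER_HOUR) || 20000,
    etlName: env.ETL_NAME || 'etl-toolkit',
  };
}

// Assumption: an hourly cap can be spread evenly as a minimum gap between calls.
function minIntervalMs(hitsPerHour) {
  return (60 * 60 * 1000) / hitsPerHour;
}
```

With the default of 20000 hits per hour, `minIntervalMs` gives a gap of 180 ms between queue operations.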