DEPRECATED - no longer actively maintained

Pharmacy Data ETL

ETL to retrieve pharmacy information from the NHS Organisation API, based on the listings in NHS Choices Syndication, and store it as JSON.

Run process

An API key is required for the process to access the Syndication feed; registration details are available on NHS Choices. The key must be provided in the environment variable SYNDICATION_API_KEY.

The output is uploaded to Azure Blob Storage, so a suitable connection string must be set in the AZURE_STORAGE_CONNECTION_STRING variable. For further details see the Azure Blob Storage documentation.

The ETL retrieves the ODS codes for all pharmacies from the Syndication API, then queries the Organisation API for the full information on each pharmacy. The initial list of ODS codes is loaded from Azure storage: the most recently created file whose name begins with pharmacy-seed-ids is used as the source. If no such file is found, the ETL will not run. Once the IDs are loaded, the most recent pharmacy data for the current ETL version is retrieved from Azure Blob Storage.
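The seed-file lookup can be sketched as a filter on the name prefix followed by a pick of the newest entry. This is a minimal illustration, not the project's code: the `{ name, createdOn }` blob shape and the `findLatestSeedFile` helper are hypothetical, and the real listing would come from Azure Blob Storage.

```javascript
// Hypothetical blob shape: { name: string, createdOn: Date }.
// Returns the most recently created blob whose name starts with the prefix,
// or null when there is none (in which case the ETL should not run).
function findLatestSeedFile(blobs, prefix = 'pharmacy-seed-ids') {
  const candidates = blobs.filter((blob) => blob.name.startsWith(prefix));
  if (candidates.length === 0) {
    return null;
  }
  // Keep whichever candidate has the newest creation time.
  return candidates.reduce((latest, blob) =>
    (blob.createdOn > latest.createdOn ? blob : latest));
}

// Illustrative listing only.
const blobs = [
  { name: 'pharmacy-seed-ids-20180101.json', createdOn: new Date('2018-01-01') },
  { name: 'pharmacy-seed-ids-20180301.json', createdOn: new Date('2018-03-01') },
  { name: 'pharmacy-data.json', createdOn: new Date('2018-04-01') },
];
console.log(findLatestSeedFile(blobs).name); // pharmacy-seed-ids-20180301.json
```

Note that pharmacy-data.json is ignored even though it is newer, because only the prefix match counts.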

The ETL version is included in the data file name along with a datestamp, so that bumping the version forces a full rescan when the data structure changes. If no data file exists for the current version, the entire dataset is rebuilt.
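A sketch of the version-aware lookup, assuming data files follow the pharmacy-data-YYYYMMDD-VERSION.json pattern listed later in this README. The `findLatestDataFile` helper is hypothetical; it only shows why a version bump leads to a rebuild: no file matches the new version, so `null` comes back.

```javascript
// Assumption: names look like pharmacy-data-YYYYMMDD-VERSION.json,
// where VERSION is major.minor (e.g. "1.2").
function findLatestDataFile(names, version) {
  const pattern = /^pharmacy-data-(\d{8})-(.+)\.json$/;
  const matches = names
    .map((name) => ({ name, match: pattern.exec(name) }))
    .filter(({ match }) => match !== null && match[2] === version);
  if (matches.length === 0) {
    // Nothing for this ETL version: the whole dataset must be rebuilt.
    return null;
  }
  // Fixed-width YYYYMMDD datestamps sort correctly as strings.
  matches.sort((a, b) => a.match[1].localeCompare(b.match[1]));
  return matches[matches.length - 1].name;
}

const dataFiles = [
  'pharmacy-data-20180101-1.2.json',
  'pharmacy-data-20180301-1.2.json',
  'pharmacy-data-20180401-1.1.json',
  'pharmacy-data.json',
];
console.log(findLatestDataFile(dataFiles, '1.2')); // pharmacy-data-20180301-1.2.json
```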

The modifiedsince endpoint of Syndication is used to identify newly added pharmacies, and any new records are added to the seed ID file for future scrapes.
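Adding new IDs to the seed list amounts to a de-duplicating union. A minimal sketch, assuming both inputs are plain arrays of ODS code strings; `mergeSeedIds` and the sample codes are illustrative only.

```javascript
// Merge ODS codes reported by the modifiedsince endpoint into the seed list.
// A Set drops duplicates while keeping first-seen order, so existing IDs
// stay in place and genuinely new ones are appended.
function mergeSeedIds(seedIds, newIds) {
  return [...new Set([...seedIds, ...newIds])];
}

console.log(mergeSeedIds(['FA001', 'FB002'], ['FB002', 'FC003']));
// [ 'FA001', 'FB002', 'FC003' ]
```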

Every record in the seed ID file is refreshed from Syndication. If an ID has been deleted, it is recorded in the summary file with a 404 error and the record is omitted from the output JSON.

Once the initial scan is complete, failed pharmacies will be revisited. ODS codes for records still failing after the second attempt are listed in a summary.json file.
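The two-pass behaviour described above can be sketched as follows. This is an assumption-laden illustration: `fetchPharmacy` is a hypothetical async function that resolves to a record or rejects with an error carrying a `statusCode`, and the `{ records, summary }` return shape is not the project's actual format.

```javascript
// Try every ODS code once, retry the failures once, and report codes that
// still fail (with their status, e.g. 404 for a deleted ID) for the summary.
async function scrapeWithRetry(odsCodes, fetchPharmacy) {
  const records = [];

  // One pass over the given codes; returns the failures of that pass.
  async function attempt(codes) {
    const failed = [];
    for (const code of codes) {
      try {
        records.push(await fetchPharmacy(code));
      } catch (err) {
        failed.push({ odsCode: code, statusCode: err.statusCode });
      }
    }
    return failed;
  }

  const firstPassFailures = await attempt(odsCodes);
  const stillFailing = await attempt(firstPassFailures.map((f) => f.odsCode));
  return { records, summary: { failed: stillFailing } };
}
```

A deleted ID fails both passes, so it appears in the summary with its 404 and never in `records`, matching the behaviour described for the output JSON.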

If NODE_ENV=production, running scripts/start will bring up a Docker container and start the scrape at a scheduled time (GMT). The default is 11pm. The time can be overridden by setting the environment variable ETL_SCHEDULE; for example, export ETL_SCHEDULE='25 15 * * *' will start processing at 3:25pm. Note: the container clock is GMT and does not take account of daylight saving time, so you may need to subtract an hour from the desired time while BST is in effect.
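Because the container runs on GMT, a UK wall-clock time has to be shifted back an hour during BST before it goes into ETL_SCHEDULE. A small sketch of that arithmetic; `dailyCronForUkTime` is a hypothetical helper, and whether BST is in effect is passed in rather than detected.

```javascript
// Build a daily cron expression ("minute hour * * *") for a UK wall-clock
// time. During BST (GMT+1) the hour is moved back by one to match the GMT
// container clock. The day change for 00:xx BST (23:xx GMT the previous
// day) is ignored, which is harmless for a job that runs every day.
function dailyCronForUkTime(hour, minute, isBst) {
  const gmtHour = isBst ? (hour + 23) % 24 : hour;
  return `${minute} ${gmtHour} * * *`;
}

console.log(dailyCronForUkTime(15, 25, false)); // 25 15 * * *
console.log(dailyCronForUkTime(15, 25, true)); // 25 14 * * *
```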

During local development it is useful to run the scrape at any time. This is possible by running node app.js (with the appropriate env vars set).

Further details are available in the node-schedule documentation.

The scheduler can be disabled completely by setting the DISABLE_SCHEDULER variable to true. Instead of a recurring schedule, the job is then set to run once, far in the future, on 1 January 2100.
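One way to express this is to choose the spec handed to node-schedule's `scheduleJob`, which accepts either a cron string or a `Date` for a one-off run. A minimal sketch; `scheduleSpec` is hypothetical, and midnight UTC for the 2100 date is an assumption.

```javascript
// Return the spec for the scheduler: a far-future one-off Date when the
// scheduler is disabled, otherwise the cron expression (default 23:00 daily).
function scheduleSpec(env = process.env) {
  if (env.DISABLE_SCHEDULER === 'true') {
    // Assumption: midnight UTC on 1 January 2100, so the job never fires in practice.
    return new Date(Date.UTC(2100, 0, 1));
  }
  return env.ETL_SCHEDULE || '0 23 * * *';
}
```

Keeping a job registered with an unreachable date, rather than skipping scheduling altogether, leaves the start-up path the same whether or not the scheduler is disabled.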

A successful scrape will result in the file pharmacy-data.json being written to the output folder and to the Azure storage location specified in the environment variables.

The files uploaded to Azure Blob Storage are:

summary-YYYYMMDD-VERSION.json
pharmacy-seed-ids-YYYYMMDD.json
pharmacy-data-YYYYMMDD-VERSION.json
pharmacy-data.json

YYYYMMDD is the current year, month and day. VERSION is the current major and minor version of the ETL, as defined in package.json.
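The naming rules above can be sketched as a small function. `outputFileNames` is hypothetical, and taking the datestamp from the UTC date is an assumption based on the container running on GMT.

```javascript
// Build the uploaded blob names for a run date and a package.json version.
// VERSION keeps only major.minor, so "1.2.3" becomes "1.2".
function outputFileNames(date, packageVersion) {
  const [major, minor] = packageVersion.split('.');
  const version = `${major}.${minor}`;
  // YYYYMMDD from the UTC date (assumption: container clock is GMT/UTC).
  const stamp = date.toISOString().slice(0, 10).replace(/-/g, '');
  return [
    `summary-${stamp}-${version}.json`,
    `pharmacy-seed-ids-${stamp}.json`,
    `pharmacy-data-${stamp}-${version}.json`,
    'pharmacy-data.json',
  ];
}

console.log(outputFileNames(new Date(Date.UTC(2018, 2, 1)), '1.2.3'));
```

Dropping the patch number means a bug-fix release keeps reading the existing versioned data file, while a minor or major bump starts a fresh one.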

The ETL may also be run locally with yarn start.

The output JSON is an array of objects in the format shown in the Sample Pharmacy Data (sample-pharmacy-data.json).

Environment variables

Environment variables are expected to be managed by the environment in which the application is run. This is best practice as described by the Twelve-Factor App methodology.

| Variable | Description | Default | Required |
|:---|:---|:---|:---|
| `AZURE_STORAGE_CONNECTION_STRING` | Azure storage connection string | | yes |
| `CONTAINER_NAME` | Azure storage container name | etl-output | |
| `DISABLE_SCHEDULER` | Set to 'true' to disable the scheduler | false | |
| `ETL_SCHEDULE` | Time of day to run the scrape, in cron syntax | `0 23 * * *` (11:00 pm) | |
| `LOG_LEVEL` | Log level | Depends on `NODE_ENV` | |
| `NODE_ENV` | Node environment | development | |
| `SYNDICATION_API_KEY` | API key to access Syndication | | yes |
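A sketch of reading this table into a config object, failing fast when a required variable is missing. `loadConfig` and its property names are hypothetical. The `NODE_ENV` to `LOG_LEVEL` mapping is not documented here, so the sketch leaves `logLevel` undefined when it is unset.

```javascript
// Load configuration from the environment using the defaults in the table.
function loadConfig(env = process.env) {
  const required = ['AZURE_STORAGE_CONNECTION_STRING', 'SYNDICATION_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return {
    azureConnectionString: env.AZURE_STORAGE_CONNECTION_STRING,
    containerName: env.CONTAINER_NAME || 'etl-output',
    disableScheduler: env.DISABLE_SCHEDULER === 'true',
    etlSchedule: env.ETL_SCHEDULE || '0 23 * * *',
    // Default depends on NODE_ENV; that mapping is not reproduced here.
    logLevel: env.LOG_LEVEL,
    nodeEnv: env.NODE_ENV || 'development',
    syndicationApiKey: env.SYNDICATION_API_KEY,
  };
}
```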

Architecture Decision Records

This repo uses Architecture Decision Records to record architectural decisions for this project. They are stored in doc/adr.

nhsuk/pharmacy-data-etl