The ETL system we use is based on the DGP-APP platform.
- The operators only run on Linux or WSL.
- configuration.json: the main DGP app configuration file
- srm_tools/: a utility Python package for common code and tools that are not specific to a single operator
- events/: DGP event handlers (TBD)
- operators/: the code for the specific pipeline operators
Authentication:
- EXTERNAL_ADDRESS: external address of the website (used to set the auth callback URL correctly)
- GOOGLE_KEY: credentials key for OAuth2 authentication
- GOOGLE_SECRET: credentials secret for OAuth2 authentication
- DGP_APP_DEFAULT_ROLE: set to 1 to disallow any anonymous access
- PUBLIC_KEY & PRIVATE_KEY: PEM-encoded RSA key pair, used to encode (sign) the JWTs for the client
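PEM keys span several lines, which is awkward in environment variables and .env files. A common workaround (an assumption here; this README does not say how the keys are stored) is to keep the key on one line with literal backslash-n sequences and restore the newlines when reading it. A minimal stdlib sketch; the helper name `read_pem_env` is hypothetical:

```python
import os


def read_pem_env(name, env=None):
    """Read a PEM-encoded key from an environment variable.

    Hypothetical helper: assumes the key may be stored on a single line,
    with literal backslash-n sequences in place of real newlines.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise KeyError(f"{name} is not set")
    # Turn literal backslash-n sequences back into real newlines.
    pem = value.replace("\\n", "\n").strip() + "\n"
    # Sanity-check the PEM armor lines only; the key itself is not parsed.
    lines = pem.splitlines()
    if not (lines[0].startswith("-----BEGIN ") and lines[-1].startswith("-----END ")):
        raise ValueError(f"{name} does not look like a PEM block")
    return pem
```

Values that already contain real newlines pass through unchanged, so the same helper works for both storage styles.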
Source file storage:
- BUCKET_NAME, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, S3_ENDPOINT_URL: the standard S3 settings (bucket name, access credentials, region and endpoint URL)
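Assuming the storage code uses boto3 (not confirmed by this README), these variables map onto the keyword arguments of `boto3.client("s3", ...)`; `S3_ENDPOINT_URL` is what lets the same code target an S3-compatible service instead of AWS. The helper name `s3_settings_from_env` is hypothetical, and it only builds the arguments, so it runs without boto3 installed:

```python
import os


def s3_settings_from_env(env=None):
    """Return (bucket_name, client_kwargs) built from the storage variables."""
    env = os.environ if env is None else env
    bucket = env["BUCKET_NAME"]
    kwargs = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": env.get("AWS_REGION"),
        # An empty or unset value means "use the default AWS endpoint".
        "endpoint_url": env.get("S3_ENDPOINT_URL") or None,
    }
    return bucket, kwargs


# Usage with boto3 (assumed to be installed):
#   import boto3
#   bucket, kwargs = s3_settings_from_env()
#   s3 = boto3.client("s3", **kwargs)
#   s3.upload_file("local.csv", bucket, "path/in/bucket.csv")
```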
Databases:
- DATABASE_URL: connection string for the auth database
- DATASETS_DATABASE_URL: connection string for the datasets database
- ETLS_DATABASE_URL: connection string for the etls database
- AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: connection string for the airflow database
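These connection strings are URLs (typically of the SQLAlchemy form scheme://user:password@host:port/dbname; the exact driver prefix used here is not stated in this README). When checking which database each variable points to, it helps to print them without the password. A stdlib sketch; the helper names `describe_db_url` and `describe_all` are hypothetical:

```python
import os
from urllib.parse import urlsplit

DB_VARS = [
    "DATABASE_URL",                         # auth database
    "DATASETS_DATABASE_URL",                # datasets database
    "ETLS_DATABASE_URL",                    # etls database
    "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN",  # airflow database
]


def describe_db_url(url):
    """Return 'scheme://user@host:port/dbname' with the password dropped."""
    parts = urlsplit(url)
    user = f"{parts.username}@" if parts.username else ""
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{user}{parts.hostname or ''}{port}{parts.path}"


def describe_all(env=None):
    """Map each database variable to a password-free description."""
    env = os.environ if env is None else env
    return {
        name: describe_db_url(env[name]) if name in env else "<not set>"
        for name in DB_VARS
    }
```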
Scraper-specific:
- See .env.example for the full list of scraper-specific environment variables.
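To see which of the variables listed in .env.example are not set in the current environment, a small stdlib script along these lines can help. It assumes .env.example uses simple KEY=VALUE lines with # comments; the helper names are hypothetical:

```python
import os


def read_env_keys(path):
    """Return the variable names defined in a KEY=VALUE style env file."""
    keys = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Tolerate an optional leading "export ".
            if line.startswith("export "):
                line = line[len("export "):]
            key, sep, _ = line.partition("=")
            if sep:
                keys.append(key.strip())
    return keys


def missing_env_vars(path=".env.example", env=None):
    """List keys from the example file that are unset or empty in env."""
    env = os.environ if env is None else env
    return [key for key in read_env_keys(path) if not env.get(key)]
```

For example, `print(missing_env_vars())` run from the repository root lists every variable from .env.example that still needs a value.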
- Create a virtual environment named .venv in the current directory:
  python -m venv .venv
- Activate the virtual environment (on Linux/WSL the script is in .venv/bin/; .venv/Scripts/ is the Windows layout):
  . .venv/bin/activate
- Install the dependencies listed in requirements.txt into the virtual environment:
  pip install -r requirements.txt
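The activation script and interpreter live in a different directory depending on the platform: bin/ on Linux and WSL, Scripts/ on Windows. The stdlib `venv` module can create the same environment as `python -m venv .venv`; the helper names below are hypothetical, a sketch of the same steps in Python:

```python
import os
import subprocess
import sys
import venv


def venv_python(env_dir=".venv", platform=sys.platform):
    """Path of the interpreter inside a virtual environment."""
    if platform.startswith("win"):
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def create_venv(env_dir=".venv"):
    """Equivalent of `python -m venv .venv` (pip is included, as with the CLI)."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
    return venv_python(env_dir)


def install_requirements(python, requirements="requirements.txt"):
    """Equivalent of `pip install -r requirements.txt` inside the venv."""
    subprocess.run([python, "-m", "pip", "install", "-r", requirements], check=True)
```

Calling the venv's own interpreter directly, as `install_requirements(create_venv())` does, has the same effect as activating it first.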
Re-activate the virtual environment (e.g. in a new shell):
. .venv/bin/activate
Run a single operator directly, for example test_email_notifier:
.venv/bin/python -m operators.test_email_notifier.__init__
General form: .venv/bin/python -m operators.{dir_name}.__init__
- Note: dgp-app appears to call the operator function rather than run the file.
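This difference is why running the module works locally: `python -m operators.{dir_name}.__init__` executes the file as a script, while dgp-app calls the function. One common way to support both is an `if __name__ == "__main__"` guard. The sketch below is hypothetical: the function name `operator`, its `(name, params, pipeline)` arguments and the `run` helper are assumptions, not confirmed dgp-app API.

```python
import logging

logger = logging.getLogger(__name__)


def run():
    """Placeholder for the operator's actual work (hypothetical)."""
    logger.info("operator work goes here")
    return "done"


def operator(name, params, pipeline):
    """Entry point assumed to be called by dgp-app (signature is an assumption)."""
    logger.info("Starting operator %s", name)
    return run()


if __name__ == "__main__":
    # Reached only when the file is executed directly, e.g. via
    # python -m operators.{dir_name}.__init__ ; dgp-app calls operator() instead.
    logging.basicConfig(level=logging.INFO)
    operator(None, None, None)
```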