- It provides a configuration driven deployment for Cognite Extraction Pipelines (named `extpipes` in short)
- Support to run it (see the sketch after this list)
  - from `poetry run`
  - from `python -m`
  - from `docker run`
  - and as GitHub Action
- templates used for implementation are
  - `cognitedata/transformation-cli`
  - `cognitedata/python-extractor-utils`
    - using `CogniteConfig` and `LoggingConfig`
    - and extended with custom config sections
- the configuration structure and example expects a CDF Project configured with `cognitedata/inso-cdf-project-cli`
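For orientation, the same deploy in each supported style (a sketch; the module name and image tag are assumptions, working examples follow in the sections below):

```bash
poetry run extpipes-cli deploy configs/example-config-extpipes.yml
python -m extpipes deploy configs/example-config-extpipes.yml
docker run --env-file=.env --volume ${PWD}/configs:/configs extpipes-cli:prod deploy /configs/example-config-extpipes.yml
```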
- `.dockerignore` (pycache)
- logs folder handling (docker volume mount)
- logger.info() or print() or click.echo(click.style(..))
- logger debug support
- compile as EXE (when Python is not available on customer server)
  - code-signed exe required for Windows
Follow the initial setup first:

- Fill out relevant configurations from `configs`
  - Fill out/change `extpipes` from `example-config-extpipesv2.yml` (see the minimal sketch below)
- Change `.env_example` to `.env`
- Fill out `.env`
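As a starting point before editing, a minimal `extpipes` section could look like this (external-ids are placeholders; the full schema is documented further below):

```yaml
extpipes:
  features:
    automatic-delete: true
  pipelines:
    - external-id: src:001:sap:sap_funcloc:continuous
      data-set-external-id: src:001:sap
      schedule: Continuous
```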
The `extpipes-cli deploy` command applies the configuration file settings to your CDF project and creates the necessary CDF Extraction-Pipelines.

By default it automatically deletes CDF Extraction-Pipelines which are not covered by the given configuration. You can deactivate this with the `--automatic-delete no` parameter or with `automatic-delete: false` in the configuration file.

The command is also configured to run from a GitHub Action workflow.
```text
➟ extpipes-cli --help
Usage: extpipes-cli [OPTIONS] COMMAND [ARGS]...

Options:
  --version                Show the version and exit.
  --cdf-project-name TEXT  CDF Project to interact with the CDF API. The
                           'CDF_PROJECT' environment variable can be used
                           instead. Required for OAuth2.
  --cluster TEXT           The CDF cluster where the CDF Project is hosted
                           (e.g. api, europe-west1-1). Provide this or make
                           sure to set the 'CDF_CLUSTER' environment variable.
                           Default: api
  --host TEXT              The CDF host where the CDF Project is hosted (e.g.
                           https://api.cognitedata.com). Provide this or make
                           sure to set the 'CDF_HOST' environment variable.
                           Default: https://api.cognitedata.com/
  --client-id TEXT         IdP client ID to interact with the CDF API. Provide
                           this or make sure to set the 'CDF_CLIENT_ID'
                           environment variable if you want to authenticate
                           with OAuth2.
  --client-secret TEXT     IdP client secret to interact with the CDF API.
                           Provide this or make sure to set the
                           'CDF_CLIENT_SECRET' environment variable if you
                           want to authenticate with OAuth2.
  --token-url TEXT         IdP token URL to interact with the CDF API. Provide
                           this or make sure to set the 'CDF_TOKEN_URL'
                           environment variable if you want to authenticate
                           with OAuth2.
  --scopes TEXT            IdP scopes to interact with the CDF API, relevant
                           for the OAuth2 authentication method. The
                           'CDF_SCOPES' environment variable can be used
                           instead.
  --audience TEXT          IdP audience to interact with the CDF API, relevant
                           for the OAuth2 authentication method. The
                           'CDF_AUDIENCE' environment variable can be used
                           instead.
  --dotenv-path TEXT       Provide a relative or absolute path to an .env file
                           (for command line usage only)
  --debug                  Print debug information
  --dry-run                Log only planned CDF API actions while doing
                           nothing. Defaults to False.
  -h, --help               Show this message and exit.

Commands:
  deploy  Deploy a list of Extraction Pipelines from a configuration file
```

```text
➟ extpipes-cli deploy --help
Usage: extpipes-cli deploy [OPTIONS] [CONFIG_FILE]

  Deploy a list of Extraction Pipelines from a configuration file

Options:
  --automatic-delete  Delete extpipes which are not specified in config-file
  -h, --help          Show this message and exit.
```
You must pass a YAML configuration file as an argument when running the program. (January '23: only one command is supported right now, but the CLI solution can be extended in the future.)

All commands share a `cognite` and a `logging` section in the YAML manifest, which is common to our Cognite Database-Extractor configuration.

The configuration file supports variable expansion (`${EXTPIPES_**}`); values can be provided either

- as environment variables,
- through an `.env` file (note: this doesn't overwrite existing environment variables), or
- as command-line parameters.
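For example, a value referenced in the manifest can be supplied from the shell before invoking the CLI (variable name taken from the example configuration below):

```bash
# either export it in the shell ...
export CDF_PROJECT=my-cdf-project
# ... or add the same line (without 'export') to the .env file
extpipes-cli deploy configs/example-config-extpipes.yml
```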
Below is an example configuration:
```yaml
# follows the same parameter structure as the DB extractor configuration
cognite:
  host: ${CDF_HOST}
  project: ${CDF_PROJECT}
  #
  # AAD IdP login credentials:
  #
  idp-authentication:
    client-id: ${CDF_CLIENT_ID}
    secret: ${CDF_CLIENT_SECRET}
    scopes:
      - ${CDF_SCOPES}
    token_url: ${CDF_TOKEN_URL}

# https://docs.python.org/3/library/logging.config.html#logging-config-dictschema
logging:
  version: 1
  formatters:
    formatter:
      # class: "tools.formatter.StackdriverJsonFormatter"
      format: "[%(asctime)s] [%(levelname)s] [%(name)s]: %(message)s"
  handlers:
    file:
      class: "logging.FileHandler"
      filename: ./logs/deploy-trading.log
      formatter: "formatter"
      mode: "w"
      level: "DEBUG"
    console:
      class: "logging.StreamHandler"
      level: "DEBUG"
      formatter: "formatter"
      stream: "ext://sys.stderr"
  root:
    level: "DEBUG"
    handlers: [ "console", "file" ]
```
Details about the environment variables:

- `HOST`
  - The URL to your CDF cluster.
  - Example: `https://westeurope-1.cognitedata.com`
- `PROJECT`
  - The CDF project.
- `CLIENT_ID`
  - The client ID of the app registration you have created for the CLI.
- `CLIENT_SECRET`
  - The client secret you have created for the app registration.
- `TOKEN_URL` = `https://login.microsoftonline.com/<tenant id>/oauth2/v2.0/token`
  - If you're using Azure AD, replace `<tenant id>` with your Azure tenant ID.
- `SCOPES`
  - Usually: `https://<cluster-name>.cognitedata.com/.default`
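Putting these together, a filled-out `.env` could look like this (a sketch with placeholder values; the `CDF_` prefixes match the environment variables named in the CLI help above):

```bash
CDF_HOST=https://westeurope-1.cognitedata.com
CDF_PROJECT=my-cdf-project
CDF_CLIENT_ID=00000000-0000-0000-0000-000000000000
CDF_CLIENT_SECRET=<client secret>
CDF_TOKEN_URL=https://login.microsoftonline.com/<tenant id>/oauth2/v2.0/token
CDF_SCOPES=https://westeurope-1.cognitedata.com/.default
```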
In addition to the sections described above, the configuration file for the `deploy` command requires more sections (some of them optional):

Configuration example:
```yaml
extpipes:
  features:
    # NOT USED: extpipe-pattern only documentation atm
    extpipe-pattern: '{source}:{short-name}:{table-name}:{suffix}'
    # The default and recommended value is: true
    # to keep the deployment in sync with configuration
    # which means non configured extpipes get automatically deleted
    automatic-delete: true
    # can contain multiple contacts, can be overwritten on pipeline level
    default-contacts:
      - name: Yours Truly
        email: yours.truly@cognite.com
        role: admin
        send-notification: false
  pipelines:
      # required
      # max 255 char, external-id provided by client
    - external-id: src:001:sap:sap_funcloc:continuous
      # optional: str, default to external-id
      name: src:001:sap:sap_funcloc:continuous
      # optional: str
      description: describe or defaults to auto-generated description, that it is "deployed through extpipes-cli@v3.0.0"
      # optional: str
      data-set-external-id: src:001:sap
      # optional: "On trigger", "Continuous" or cron expression
      schedule: Continuous
      # optional: [{},{}]
      # defaults to features.default-contacts (if exist)
      contacts:
        - name: Fizz Buzz
          email: fizzbuzz@cognite.com
          role: admin
          send-notification: true
      # optional: str
      source: az-func
      # optional: {}
      metadata:
        version: extpipes-cli@v3.1.0
      # optional: str max 10000 char
      # Documentation text field, supports Markdown for text formatting.
      documentation: Documentation which can include Mermaid diagrams?
      # optional: str
      # Usually user email is expected here, defaults to extpipes + version?
      created-by: extpipes-cli@v3.1.0
      # optional: [{},{}]
      raw-tables:
        - db-name: src:001:sap
          table-name: sap_funcloc
      # optional: {}
      extpipe-config:
        # str
        config: |
          nested yaml/json/ini which is simply a string for this config
        # optional: str
        description: describe the config, or autogenerate?
```
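A dry-run is a cheap way to validate such a configuration before applying it (mirroring the flag usage in the container example further below):

```bash
extpipes-cli --dry-run yes deploy configs/example-config-extpipes.yml
```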
Run it locally with poetry:

```bash
poetry build
poetry install
poetry update
poetry run extpipes-cli deploy --debug configs/example-config-extpipes.yml
```

Or from within a poetry shell:

```bash
poetry shell
# extpipes-cli is defined in pyproject.toml
extpipes-cli deploy ./configs/example-config-extpipes.yml
```
The production container relies on:

- a `.dockerignore` file
- volumes for `configs` (to read) and `logs` folder (to write)

```bash
docker build -t extpipes-cli:prod --target=production .

# ${PWD} because only absolute paths can be mounted
# poetry project is deployed to /opt/extpipes-cli/
docker run --env-file=.env --volume ${PWD}/configs:/configs --volume ${PWD}/logs:/opt/extpipes-cli/logs extpipes-cli:prod deploy /configs/config-deploy-example.yml
```
Debugging the Docker container with all dev-dependencies and poetry installed:

- volumes for `configs` (to read) and `logs` folder (to write)
- volumes for `src` (to read/write)

```bash
# using the 'development' target of the Dockerfile multi-stages
➟ docker build -t extpipes-cli:dev --target=development .

# start bash in container
➟ docker run --env-file=.env --volume ${PWD}/configs:/configs --volume ${PWD}/logs:/logs --volume ${PWD}/src:/opt/extpipes-cli/src -it --entrypoint /bin/bash extpipes-cli:dev

# run project from inside container
> poetry shell
> extpipes-cli --help
> extpipes-cli --dry-run yes deploy /configs/config-deploy-example.yml

# logs are available on your host in mounted './logs/' folder
# 'src/' changes are mounted to your host ./src folder
```
Example GitHub Actions workflow:

```yaml
jobs:
  deploy:
    name: Deploy Extraction Pipelines
    environment: dev
    runs-on: ubuntu-latest
    # environment variables
    env:
      PROJECT: yourcdfproject
      CLUSTER: bluefield
      IDP_TENANT: abcde-12345
      HOST: https://bluefield.cognitedata.com/
    steps:
      - name: Deploy extpipes
        # best practice is to use a tagged release (and not '@main')
        # find a released tag here: https://github.com/cognitedata/inso-extpipes-cli/releases
        uses: cognitedata/inso-extpipes-cli@v2.2.1
        env:
          CLIENT_ID: ${{ secrets.CLIENT_ID }}
          CLIENT_SECRET: ${{ secrets.CLIENT_SECRET }}
          HOST: ${{ env.HOST }}
          PROJECT: ${{ env.PROJECT }}
          TOKEN_URL: https://login.microsoftonline.com/${{ env.IDP_TENANT }}/oauth2/v2.0/token
          SCOPES: ${{ env.HOST }}.default
        # additional parameters for running the action
        with:
          config_file: ./configs/example-config-extpipes.yml
```
- `poetry install`
- To run all checks locally - which is typically needed if the GitHub check is failing - e.g. you haven't set up `pre-commit` to run automatically:

```bash
poetry install && poetry shell
pre-commit install  # only needed if not installed
pre-commit run --all-files
```
- Remark: with a new version change, manual changes are required in
  - the version in `pyproject.toml`
  - the version in `src/extpipes/__init__` (used by the `--version` parameter)
  - the `action.yml` file
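A quick way to check all three spots at once (a hypothetical helper; the `__init__.py` path is an assumption, and the version string must be adjusted to the current release):

```bash
# list every file that still carries the old version string
grep -n "2.2.1" pyproject.toml src/extpipes/__init__.py action.yml
```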