Midburn DataFlows

Data processing and aggregation pipelines for debugging and management of Midburn infrastructure.

Install midburn-dataflows

See environment.yaml for the required system dependencies.

To get started quickly on any modern OS, you can use Miniconda.

The following snippet will install Miniconda on recent Linux distributions:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Create and activate the conda environment

conda env create -f environment.yaml
conda activate midburn-dataflows
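
Running a flow from the wrong environment is an easy mistake to make. The following is a minimal sketch of a guard against it, assuming a POSIX-compatible shell. `require_conda_env` is a hypothetical helper name, not part of this repository; it relies on the `CONDA_DEFAULT_ENV` variable that conda sets to the name of the active environment.

```shell
# Hypothetical helper (not part of this repository): succeed only when the
# expected conda environment is the active one.
require_conda_env() {
  expected_env="$1"
  # conda exports CONDA_DEFAULT_ENV with the name of the active environment.
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected_env" ]; then
    echo "expected conda environment '${expected_env}', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}
```

It could be used as `require_conda_env midburn-dataflows && python3 -m pip install -e .`, so that the install step only runs inside the intended environment.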

To update the dependencies of the active conda environment to their latest versions (e.g. after a git pull):

conda env update -f environment.yaml

Install the midburn-dataflows Python package

python3 -m pip install -e .

Running flows

Depending on which flows you want to run, you may need to authenticate or configure some prerequisites:

  • kubectl connected to a midburn-k8s environment with cluster admin permissions
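
The kubectl prerequisite above can be checked before running any flow. This is a sketch only: `check_kubectl` is a hypothetical helper name, and `kubectl auth can-i '*' '*' --all-namespaces` is used as an approximation of "cluster admin permissions", which may be stricter or looser than what individual flows actually need.

```shell
# Hypothetical pre-flight check (not part of this repository).
check_kubectl() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  # Show which cluster context the flows would talk to.
  kubectl_context="$(kubectl config current-context)" || return 1
  echo "kubectl context: ${kubectl_context}"
  # Ask the API server whether the current user may perform any verb on any
  # resource in all namespaces; kubectl prints "yes" or "no".
  if [ "$(kubectl auth can-i '*' '*' --all-namespaces 2>/dev/null)" != "yes" ]; then
    echo "current user does not appear to have cluster admin permissions" >&2
    return 1
  fi
}
```

Confirm that the printed context really is the intended midburn-k8s environment before continuing, since the check cannot know which cluster you meant to connect to.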

Run all the flows:

for FLOW in flows/*; do python3 "${FLOW}"; done
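
The one-line loop above keeps going when a flow fails and does not say which one failed. The sketch below does the same work but collects the failures and returns a non-zero status if there were any. `run_all_flows` is a hypothetical helper name, and the sketch assumes every regular file directly under the flows directory is a runnable flow script.

```shell
# Hypothetical helper (not part of this repository): run every flow, continue
# past failures, and report which flows failed.
run_all_flows() {
  flows_dir="${1:-flows}"
  failed_flows=""
  for flow in "$flows_dir"/*; do
    # Skip subdirectories and the literal pattern left by an unmatched glob.
    [ -f "$flow" ] || continue
    echo "running ${flow}"
    if ! python3 "$flow"; then
      failed_flows="${failed_flows} ${flow}"
    fi
  done
  if [ -n "$failed_flows" ]; then
    echo "failed flows:${failed_flows}" >&2
    return 1
  fi
}
```

Calling `run_all_flows` with no argument uses the `flows` directory relative to the current working directory.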

The flows store their output as tabular data packages under the data/ directory, which is not committed to the repository.
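
To see what a run produced, the data packages can be listed from the shell. In the Data Package format, each package is described by a `datapackage.json` descriptor. The sketch assumes the flows write one such descriptor per package somewhere under `data/`; the exact directory layout is not documented here, and `list_data_packages` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this repository): print the directory of
# every data package found under the given data directory.
list_data_packages() {
  data_dir="${1:-data}"
  # A data package is identified by its datapackage.json descriptor.
  find "$data_dir" -type f -name datapackage.json | sort |
    while IFS= read -r descriptor; do
      dirname "$descriptor"
    done
}
```

Each printed directory should hold the descriptor alongside the tabular resources it refers to.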
