A simple data pipeline with three tasks that are executed both sequentially and in parallel. The tasks are AWS Glue Python Shell jobs, orchestrated with AWS Step Functions.
The pipeline is defined in datajob_stack.py.
git clone git@github.com:vincentclaes/datajob.git
cd datajob
pip install poetry --upgrade
poetry shell
poetry install
cd examples/data_pipeline_simple
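The commands in this walkthrough call several command-line tools (git, pip, poetry, aws, cdk, python). A minimal sketch of a preflight check is shown below. The helper name `missing_tools` is hypothetical and not part of datajob; it only reports which of the given commands are not on your PATH.

```shell
# Hypothetical helper (not part of datajob): print the name of every
# command passed as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example usage: list any CLI from this walkthrough that is not installed.
# missing_tools git pip poetry aws cdk python
```

If the function prints nothing, every listed tool was found.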
export AWS_PROFILE=default
export AWS_DEFAULT_REGION=eu-west-1
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text --profile $AWS_PROFILE)
cdk bootstrap aws://$AWS_ACCOUNT/$AWS_DEFAULT_REGION
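If the `aws sts get-caller-identity` call fails (for example because the profile has no valid credentials), `AWS_ACCOUNT` ends up empty and the `aws://...` environment passed to `cdk bootstrap` is malformed. The sketch below shows one way to guard against that; it assumes only that an AWS account ID is exactly 12 digits, and the function name `is_account_id` is hypothetical.

```shell
# Hypothetical guard (not part of datajob or the AWS CLI):
# succeed only if the argument is exactly 12 decimal digits.
is_account_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "${#1}" -eq 12 ]
}

# Example usage, before running cdk bootstrap:
# is_account_id "$AWS_ACCOUNT" || echo "AWS_ACCOUNT is not set correctly" >&2
```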
cdk deploy --app "python datajob_stack.py" --require-approval never
✅ data-pipeline-simple
Stack ARN:
datajob execute --state-machine data-pipeline-simple-workflow
[16:28:56] executing: data-pipeline-simple-workflow
[16:28:58] status: RUNNING
view the execution on the AWS console:
Click the link to follow the progress of the execution.
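As an alternative to the console, the status of the latest execution can also be read with the AWS CLI. The following is a sketch under a few assumptions: the state machine is named as in the `datajob execute` command above, your credentials and region are already exported, and the function name `latest_execution_status` is hypothetical. It uses `aws stepfunctions list-state-machines` and `aws stepfunctions list-executions`, which return executions with the most recent first.

```shell
# Hypothetical helper (not part of datajob): print the status of the most
# recent execution of the state machine with the given name.
latest_execution_status() {
    name="$1"
    # Look up the ARN of the state machine by its name.
    arn=$(aws stepfunctions list-state-machines \
        --query "stateMachines[?name=='${name}'].stateMachineArn" \
        --output text) || return 1
    [ -n "$arn" ] || { echo "state machine not found: ${name}" >&2; return 1; }
    # Fetch only the first page and select the newest execution's status.
    # Prints "None" if the state machine has no executions yet.
    aws stepfunctions list-executions \
        --state-machine-arn "$arn" \
        --no-paginate \
        --query 'executions[0].status' \
        --output text
}

# Example usage:
# latest_execution_status data-pipeline-simple-workflow
```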
cdk destroy --app "python datajob_stack.py"