This repository provides a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally.
The CLI builds a Docker container image locally that's similar to an MWAA production image. This allows you to run a local Apache Airflow environment to develop and test DAGs, custom plugins, and dependencies before deploying to MWAA.
```
dags/
  example_dag_with_custom_ssh_plugin.py
  example_dag_with_taskflow_api.py
  requirements.txt
  tutorial.py
docker/
  config/
    airflow.cfg
    constraints.txt
    mwaa-base-providers-requirements.txt
    requirements.txt
    webserver_config.py
    .env.localrunner
  script/
    bootstrap.sh
    entrypoint.sh
    systemlibs.sh
    generate_key.sh
  docker-compose-local.yml
  docker-compose-resetdb.yml
  docker-compose-sequential.yml
  Dockerfile
plugins/
  ssh_plugin.py
.gitignore
CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE
mwaa-local-env
README.md
VERSION
```
- macOS: Install Docker Desktop.
- Linux/Ubuntu: Install Docker Compose and Docker Engine.
- Windows: Use Windows Subsystem for Linux (WSL) to run the bash-based `mwaa-local-env` command. Follow Windows Subsystem for Linux Installation (WSL) and Using Docker in WSL 2 to get started.
- Make sure to clone outside of the HDW repo:

```
git clone https://github.com/aws/aws-mwaa-local-runner.git
```

- After running `ls` on the command line, you should see output similar to:

```
harrys-data-warehouse
aws-mwaa-local-runner
```

- Change into the repository directory:

```
cd aws-mwaa-local-runner
```

- Select the dev profile after running the following command:

```
aws sso login
```
Build the Docker container image using the following command:

```
./mwaa-local-env build-image
```

Note: it takes several minutes to build the Docker image locally.
Run Apache Airflow using one of the following database backends.

This runs a local Apache Airflow environment that is a close representation of MWAA by configuration:

```
./mwaa-local-env start
```

To stop the local environment, press Ctrl+C in the terminal and wait until the local runner and postgres containers have stopped. Then run the following command to remove any orphaned containers or networks:

```
./mwaa-local-env stop
```
By default, the `bootstrap.sh` script creates a username and password for your local Airflow environment.

- Username: `admin`
- Password: `test`

Open the Apache Airflow UI: http://localhost:8085/.
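If you want to check from a script whether the webserver is reachable before opening the browser, a minimal sketch might look like the following (it assumes the port 8085 mapping above and Airflow's standard `/health` webserver endpoint):

```python
import urllib.error
import urllib.request


def airflow_is_up(url: str = "http://localhost:8085/health", timeout: float = 2.0) -> bool:
    """Return True if the local Airflow webserver answers its health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the environment is not up (yet).
        return False


if __name__ == "__main__":
    print("Airflow UI reachable:", airflow_is_up())
```

This is only a convenience check; the authoritative signal is still the container logs printed by `./mwaa-local-env start`.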
Step four: Add DAGs and supporting files. You can skip this step, since we point to the HDW repo using Docker Compose.
The following section describes where to add your DAG code and supporting files. We recommend creating a directory structure similar to your MWAA environment.
- Add DAG code to the `dags/` folder. To run the sample code in this repository, see the `tutorial.py` file.
- Add Python dependencies to `dags/requirements.txt`. To test a `requirements.txt` without running Apache Airflow, use the following script:

```
./mwaa-local-env test-requirements
```
Let's say you add `aws-batch==0.6` to your `dags/requirements.txt` file. You should see output similar to:

```
Installing requirements.txt
Collecting aws-batch (from -r /usr/local/airflow/dags/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/5d/11/3aedc6e150d2df6f3d422d7107ac9eba5b50261cf57ab813bb00d8299a34/aws_batch-0.6.tar.gz
Collecting awscli (from aws-batch->-r /usr/local/airflow/dags/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/07/4a/d054884c2ef4eb3c237e1f4007d3ece5c46e286e4258288f0116724af009/awscli-1.19.21-py2.py3-none-any.whl (3.6MB)
    100% |████████████████████████████████| 3.6MB 365kB/s
...
Installing collected packages: botocore, docutils, pyasn1, rsa, awscli, aws-batch
  Running setup.py install for aws-batch ... done
Successfully installed aws-batch-0.6 awscli-1.19.21 botocore-1.20.21 docutils-0.15.2 pyasn1-0.4.8 rsa-4.7.2
```
- There is a directory at the root of this repository called `plugins`. It contains a sample plugin, `ssh_plugin.py`.
- In this directory, create a file for your new custom plugin. For example: `ssh_plugin.py`.
- (Optional) Add any Python dependencies to `dags/requirements.txt`.

Note: this step assumes you have a DAG that corresponds to the custom plugin. For examples, see MWAA Code Examples.
- Learn how to upload the requirements.txt file to your Amazon S3 bucket in Installing Python dependencies.
- Learn how to upload the DAG code to the dags folder in your Amazon S3 bucket in Adding or updating DAGs.
- Learn more about how to upload the plugins.zip file to your Amazon S3 bucket in Installing custom plugins.
The following section contains common questions and answers you may encounter when using your Docker container image.
- You can set up the local Airflow's boto with the intended execution role to test your DAGs with AWS operators before uploading to your Amazon S3 bucket. To set up an AWS connection for Airflow locally, see Airflow | AWS Connection. To learn more, see Amazon MWAA Execution Role.
- You can set AWS credentials via environment variables in the `docker/config/.env.localrunner` env file. To learn more about AWS environment variables, see Environment variables to configure the AWS CLI and Using temporary security credentials with the AWS CLI. Simply set the relevant environment variables in `.env.localrunner` and run `./mwaa-local-env start`.
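As a sketch, the credentials portion of `.env.localrunner` might look like the fragment below. The variable names are the standard AWS CLI environment variables; the values are placeholders, not real credentials:

```
AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
AWS_SESSION_TOKEN=<your-session-token>
AWS_DEFAULT_REGION=<your-region>
```

`AWS_SESSION_TOKEN` is only needed when you are using temporary credentials (for example, from `aws sso login`).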
- A `requirements.txt` file is included in the `/dags` folder of your local Docker container image. We recommend adding libraries to this file and running locally.
- If a library is not available in the Python Package Index (PyPI), add the `--index-url` flag to the package in your `dags/requirements.txt` file. To learn more, see Managing Python dependencies in requirements.txt.
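For illustration, a `dags/requirements.txt` using `--index-url` might look like the fragment below. The index URL and package name are placeholders for your own private index and package:

```
--index-url https://my-internal-mirror.example.com/simple/
my-private-package==1.0.0
aws-batch==0.6
```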
The following section contains errors you may encounter when using the Docker container image in this repository.
- If you encounter the error `process fails with "dag_stats_table already exists"`, you'll need to reset your database using the commands below.
- If you see the error `Fernet Key InvalidToken`, follow the steps below to reset: delete the `aws-mwaa-local-runner/db-data` folder manually, then run:

```
./mwaa-local-env reset-db
./mwaa-local-env build-image
./mwaa-local-env start
./mwaa-local-env stop  # use this command when you are done working with local Airflow
```
- If you see the error `An error occurred (ExpiredToken) when calling the ListBuckets operation: The provided token has expired.` for any task that uses AWS services, it means the token has expired. Run the following command:

```
aws sso login
```

Select the dev profile, then run:

```
./mwaa-local-env build-image
./mwaa-local-env start
./mwaa-local-env stop  # use this command when you are done working with local Airflow
```
A Fernet key is generated during the image build (`./mwaa-local-env build-image`) and persists across all containers started from that image. This key is used to encrypt connection passwords in the Airflow DB. If changes are made to the image and it is rebuilt, you may get a new key that does not match the key used when the Airflow DB was initialized; in this case you will need to reset the DB (`./mwaa-local-env reset-db`).
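The failure mode can be reproduced directly with the `cryptography` package (which Airflow itself depends on). This is just an illustration of the key mismatch, not part of the local runner:

```python
from cryptography.fernet import Fernet, InvalidToken

# Key generated when the image was first built; used to encrypt values in the DB.
build_key = Fernet.generate_key()
token = Fernet(build_key).encrypt(b"connection-password")

# A rebuilt image may ship a different key, which cannot decrypt the old token.
rebuild_key = Fernet.generate_key()
try:
    Fernet(rebuild_key).decrypt(token)
except InvalidToken:
    print("InvalidToken: the DB holds data encrypted with the old key; reset-db is needed")

# The original key still decrypts the token.
assert Fernet(build_key).decrypt(token) == b"connection-password"
```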
See CONTRIBUTING for more information.
Something went wrong in setting up your credentials. Please run the following to reset your database:

```
rm -r ./db-data
./mwaa-local-env reset-db
```

You can now proceed normally with the steps above.
This library is licensed under the MIT-0 License. See the LICENSE file.