Swap Old Cloud Composer Infra and Version Refs (#3277)
SorenSpicknall authored Feb 12, 2024
1 parent 42cdacc commit 760f269
Showing 15 changed files with 368 additions and 344 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/deploy-airflow.yml
@@ -30,11 +30,11 @@ jobs:
files: 'airflow/requirements.txt'
- id: install-python-dependencies
if: steps.changed-requirements.outputs.any_changed == 'true'
- run: gcloud composer environments update calitp-airflow2-prod --update-pypi-packages-from-file airflow/requirements.txt --location us-west2 --project cal-itp-data-infra
+ run: gcloud composer environments update calitp-airflow2-prod-composer2-patch --update-pypi-packages-from-file airflow/requirements.txt --location us-west2 --project cal-itp-data-infra

- name: Push Airflow code to GCS
run: |
gsutil -m rsync -d -c -r airflow/dags gs://$AIRFLOW_BUCKET/dags
gsutil -m rsync -d -c -r airflow/plugins gs://$AIRFLOW_BUCKET/plugins
env:
AIRFLOW_BUCKET: "us-west2-calitp-airflow2-pr-171e4e47-bucket"
AIRFLOW_BUCKET: "us-west2-calitp-airflow2-pr-88ca8ec6-bucket"
2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ happy.
Generally we try to configure things via environment variables. In the Kubernetes
world, these get configured via Kustomize overlays ([example](./kubernetes/apps/overlays/gtfs-rt-archiver-v3-prod/archiver-channel-vars.yaml)).
For Airflow jobs, we currently use hosted Google Cloud Composer which has a
- [user interface](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod/variables)
+ [user interface](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-patch/variables)
for editing environment variables. These environment variables also have to be
injected into pod operators as needed via Gusty YAML or similar. If you are
running Airflow locally, the [docker-compose file](./airflow/docker-compose.yaml)
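
That injection into pod operators happens in task definition YAML; a hypothetical Gusty-style sketch (the operator path, image, and task file here are illustrative, not copied from this repo) looks roughly like:

```yaml
# Hypothetical Gusty task file: keys below are passed as kwargs to the operator
operator: airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator
name: example-pod-task
image: ghcr.io/cal-itp/example-job:latest
env_vars:
  AIRFLOW_ENV: "development"
  CALITP_USER: "pipeline"
```
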
6 changes: 3 additions & 3 deletions airflow/Dockerfile.composer
@@ -4,7 +4,7 @@ FROM apache/airflow:2.4.3-python3.8
USER root

RUN apt-get update \
&& apt-get install -y git libmysqlclient-dev gcc libpq-dev \
&& apt-get install -y git default-libmysqlclient-dev gcc libpq-dev \
&& rm -rf /var/lib/apt/lists/*

RUN curl https://sdk.cloud.google.com > install.sh \
@@ -15,8 +15,8 @@ RUN gcloud components install gke-gcloud-auth-plugin

USER airflow

- COPY requirements-composer-1.20.6-airflow-2.4.3.txt /tmp/requirements-composer-1.20.6-airflow-2.4.3.txt
- RUN pip install --no-cache-dir --user -r /tmp/requirements-composer-1.20.6-airflow-2.4.3.txt
+ COPY requirements-composer-2.6.0-airflow-2.5.3.txt /tmp/requirements-composer-2.6.0-airflow-2.5.3.txt
+ RUN pip install --no-cache-dir --user -r /tmp/requirements-composer-2.6.0-airflow-2.5.3.txt

COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --user -r /tmp/requirements.txt
4 changes: 2 additions & 2 deletions airflow/README.md
@@ -81,10 +81,10 @@ docker-compose run airflow tasks test unzip_and_validate_gtfs_schedule_hourly va

## Deploying Changes to Production

- We have a [GitHub Action](../.github/workflows/deploy-airflow.yml) that runs when PRs touching this directory merge to the `main` branch. The GitHub Action updates the requirements sourced from [requirements.txt](./requirements.txt) and syncs the [DAGs](./dags) and [plugins](./plugins) directories to the bucket that Composer watches for code/data to parse. As of 2023-07-18, this bucket is `us-west2-calitp-airflow2-pr-171e4e47-bucket`.
+ We have a [GitHub Action](../.github/workflows/deploy-airflow.yml) that runs when PRs touching this directory merge to the `main` branch. The GitHub Action updates the requirements sourced from [requirements.txt](./requirements.txt) and syncs the [DAGs](./dags) and [plugins](./plugins) directories to the bucket that Composer watches for code/data to parse. As of 2024-02-12, this bucket is `us-west2-calitp-airflow2-pr-88ca8ec6-bucket`.
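
After a deploy merges, one way to spot-check that the sync landed in the new bucket (assuming read access to it) is to list the synced DAGs directly:

```sh
# List what Composer currently sees under the dags/ prefix of the new bucket
gsutil ls gs://us-west2-calitp-airflow2-pr-88ca8ec6-bucket/dags | head
```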

### Upgrading Airflow Itself

- Our production Composer instance is called [calitp-airflow2-prod](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod/monitoring); its configuration (including worker count, Airflow config overrides, and environment variables) is manually managed through the web console. When scoping upcoming upgrades to the specific Composer-managed Airflow version we use in production, it can be helpful to grab the corresponding list of requirements from the [Cloud Composer version list](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions), copy it into `requirements-composer-[COMPOSER_VERSION_NUMBER]-airflow-[AIRFLOW_VERSION_NUMBER].txt`, change [Dockerfile.composer](./Dockerfile.composer) to reference that file (deleting the previous equivalent) and modify the `FROM` statement at the top to grab the correct Airflow and Python versions for that Composer version, and build the image locally.
+ Our production Composer instance is called [calitp-airflow2-prod-composer2-patch](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-patch/monitoring); its configuration (including worker count, Airflow config overrides, and environment variables) is manually managed through the web console. When scoping upcoming upgrades to the specific Composer-managed Airflow version we use in production, it can be helpful to grab the corresponding list of requirements from the [Cloud Composer version list](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions), copy it into `requirements-composer-[COMPOSER_VERSION_NUMBER]-airflow-[AIRFLOW_VERSION_NUMBER].txt`, change [Dockerfile.composer](./Dockerfile.composer) to reference that file (deleting the previous equivalent) and modify the `FROM` statement at the top to grab the correct Airflow and Python versions for that Composer version, and build the image locally.

It is desirable to keep our local testing image closely aligned with the production image, so the `FROM` statement in our automatically deployed [Dockerfile](./Dockerfile) should always be updated after a production Airflow upgrade to reflect the same Airflow version and Python version that are being run in the Composer-managed production environment.
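
For the "build the image locally" step above, a minimal sketch (the image tag is arbitrary), run from the `airflow/` directory, would be:

```sh
# Build the Composer-matching test image against the pinned requirements file
docker build -f Dockerfile.composer -t calitp-airflow-composer-upgrade-test .
```
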
2 changes: 1 addition & 1 deletion airflow/docker-compose.yaml
@@ -105,7 +105,7 @@ x-airflow-common:
GRAAS_SERVER_URL: $GRAAS_SERVER_URL

# Composer variables for kubernetes
POD_CLUSTER_NAME: "us-west2-calitp-airflow2-pr-171e4e47-gke"
POD_CLUSTER_NAME: "us-west2-calitp-airflow2-pr-88ca8ec6-gke"
POD_LOCATION: "us-west2-a"
AIRFLOW_ENV: "development"
CALITP_USER: "pipeline"
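
If you need to point kubectl at that pod cluster from a local machine (assuming your account has GKE access), credentials can be fetched with:

```sh
# Fetch kubeconfig credentials for the cluster that Composer launches pods into
gcloud container clusters get-credentials us-west2-calitp-airflow2-pr-88ca8ec6-gke \
  --zone us-west2-a \
  --project cal-itp-data-infra
```
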
312 changes: 0 additions & 312 deletions airflow/requirements-composer-1.20.6-airflow-2.4.3.txt

This file was deleted.

