
Commit

Merge pull request #492 from ucldc/cleanup
chore: Remove code related to AWS Lambda deployment strategy
amywieliczka authored Aug 17, 2023
2 parents 2b0b81a + 02516e6 commit 3d4f5a1
Showing 16 changed files with 22 additions and 652 deletions.
32 changes: 12 additions & 20 deletions README.md
@@ -89,7 +89,6 @@ We use PR reviews to approve or reject, comment on, and request further iteratio
 ## Code Style Guide
 
 - PEP 8 (enforced using flake8)
-- DRY
 - Readability & Transparency: Code as language
 - Favor explicitness over defensiveness
 - Import statements grouped according to [isort](https://pycqa.github.io/isort/index.html) defaults:
@@ -100,34 +99,27 @@ We use PR reviews to approve or reject, comment on, and request further iteratio
   - LOCALFOLDER
 
 
-## Deploying Using AWS SAM
+## Airflow Development
 
-We are using AWS SAM to build the rikolti lambda applications and deploy to AWS. Following are proposed steps for building and deploying using SAM:
+### Set up `aws-mwaa-local-runner`
 
-Make sure you [have SAM CLI installed](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html).
+AWS provides the [aws-mwaa-local-runner](https://github.com/aws/aws-mwaa-local-runner) repo, a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally in a Docker container. We have forked this repository and made some small changes so that we can use local-runner while keeping our DAGs stored in this repository. (See this Slack thread for more info: https://apache-airflow.slack.com/archives/CCRR5EBA7/p1690405849653759)
 
-From the rikolti directory, which contains `template.yaml`, build the serverless applications:
+To set up this dev environment, first clone the repo locally:
 
 ```
-sam build --use-container
+git clone git@github.com:ucldc/aws-mwaa-local-runner.git
 ```
 
-Using the `--use-container` option has SAM compile dependencies for each lambda function in a lambda-like docker container. This is necessary for compiling libraries such as lxml, which need to be natively compiled. The runtime and system architecture are defined for each lambda in the `template.yaml` file.
-
-> **NOTE**
-> There is a [troposphere](https://troposphere.readthedocs.io/en/latest/quick_start.html) script stub named `create_sam_template.py` checked into the repo. This script generates `template.yaml`, but we decided not to use it for now since the template is simple and we don't need to introduce another layer of tooling at this point. We might use it in the future to generate templates for different environments and such.
-
-Once built, deploy to AWS:
-
-Make sure the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` env vars are set. Then, the first time you deploy:
+Then, modify `aws-mwaa-local-runner/docker/.env`, setting the following env vars to wherever the directories live on your machine, for example:
 
 ```
-sam deploy --guided
+DAGS_HOME="/Users/username/dev/rikolti/airflow/dags"
+PLUGINS_HOME="/Users/username/dev/rikolti/airflow/plugins"
+REQS_HOME="/Users/username/dev/rikolti/airflow"
+STARTUP_HOME="/Users/username/dev/rikolti/airflow"
 ```
 
-Follow the prompts. Say yes to saving the arguments to a `samconfig.toml` configuration file. Then, on subsequent deploys, you can just type:
-
-```
-sam deploy
-```
+These env vars are used in the `aws-mwaa-local-runner/docker/docker-compose-local.yml` script (and the other docker-compose scripts) to mount the relevant directories containing the Airflow DAGs, requirements, and plugins files into the docker container.
 
+Then, follow the instructions in the [README](https://github.com/ucldc/aws-mwaa-local-runner/#readme) to build the docker image, run the container, and do local development.
26 changes: 0 additions & 26 deletions airflow/README.md

This file was deleted.

157 changes: 0 additions & 157 deletions create_sam_template.py

This file was deleted.

40 changes: 0 additions & 40 deletions metadata_fetcher/deploy-version.sh

This file was deleted.

20 changes: 6 additions & 14 deletions metadata_fetcher/fetchers/nuxeo_fetcher.py
@@ -4,7 +4,6 @@
 import subprocess
 from urllib.parse import quote as urllib_quote
 
-import boto3
 import requests
 
 from .. import settings
@@ -255,16 +254,9 @@ def recurse(self, path=None, query_type=None, prefix=None):
                 'prefix': prefix if prefix else self.nuxeo['prefix']
             }
         }
-        if settings.LOCAL_RUN:
-            subprocess.run([
-                'python',
-                'lambda_function.py',
-                json.dumps(lambda_query).encode('utf-8')
-            ])
-        else:
-            lambda_client = boto3.client('lambda', region_name="us-west-2",)
-            lambda_client.invoke(
-                FunctionName="fetch_metadata",
-                InvocationType="Event",  # invoke asynchronously
-                Payload=json.dumps(lambda_query).encode('utf-8')
-            )
+        # TODO: AW: this shouldn't be a subprocess
+        subprocess.run([
+            'python',
+            'lambda_function.py',
+            json.dumps(lambda_query).encode('utf-8')
+        ])
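The change above collapses the `LOCAL_RUN` branch so the fetcher always shells out to `lambda_function.py`, serializing the query as JSON and passing it as a CLI argument (the TODO notes this should eventually be an in-process call). A minimal, self-contained sketch of that serialize-and-shell-out pattern; the payload keys and the inline child script are illustrative stand-ins, not rikolti code:

```python
import json
import subprocess
import sys

# Build a JSON payload the way the fetcher does. The keys here are
# illustrative only, not the real rikolti query shape.
lambda_query = {"payload": {"harvest_type": "nuxeo", "prefix": ["folder"]}}

# A tiny inline child script stands in for lambda_function.py: it parses
# the JSON passed as a CLI argument, just like the subprocess call above.
child = (
    "import json, sys; "
    "payload = json.loads(sys.argv[1]); "
    "print(payload['payload']['harvest_type'])"
)

result = subprocess.run(
    [sys.executable, "-c", child, json.dumps(lambda_query)],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # the child echoes the parsed harvest_type
```

Passing the payload as a single JSON argument keeps the child's interface identical whether it is invoked as a subprocess or, later, as a direct function call.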
15 changes: 2 additions & 13 deletions metadata_fetcher/lambda_function.py
@@ -3,9 +3,6 @@
 import logging
 import sys
 
-import boto3
-
-from . import settings
 from .fetchers.Fetcher import Fetcher, InvalidHarvestEndpoint
 
 logger = logging.getLogger(__name__)
@@ -24,7 +21,7 @@ def import_fetcher(harvest_type):
 
 # AWS Lambda entry point
 def fetch_collection(payload, context):
-    if settings.LOCAL_RUN and isinstance(payload, str):
+    if isinstance(payload, str):
         payload = json.loads(payload)
 
     logger.debug(f"fetch_collection payload: {payload}")
@@ -55,15 +52,7 @@ def fetch_collection(payload, context):
         fetch_report = [fetch_report]
 
     if not json.loads(next_page).get('finished'):
-        if settings.LOCAL_RUN:
-            fetch_report.extend(fetch_collection(next_page, {}))
-        else:
-            lambda_client = boto3.client('lambda', region_name="us-west-2",)
-            lambda_client.invoke(
-                FunctionName="fetch_metadata",
-                InvocationType="Event",  # invoke asynchronously
-                Payload=next_page.encode('utf-8')
-            )
+        fetch_report.extend(fetch_collection(next_page, {}))
 
     return fetch_report
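With the asynchronous Lambda fan-out removed, `fetch_collection` simply recurses on `next_page` until the fetcher reports it is finished, accumulating one report per page. A self-contained sketch of that pagination pattern, where `fetch_page` and the payload keys are hypothetical stand-ins for the real Fetcher logic:

```python
import json

# Stub fetcher: returns one page's report plus the payload for the next
# page, marking the run finished after three pages. Hypothetical stand-in
# for the real Fetcher.
def fetch_page(payload):
    page = payload["write_page"]
    next_page = {"write_page": page + 1, "finished": page + 1 >= 3}
    fetch_report = {"page": page, "document_count": 10}
    return fetch_report, next_page

def fetch_collection(payload, context):
    if isinstance(payload, str):  # same str-vs-dict handling as above
        payload = json.loads(payload)
    report, next_page = fetch_page(payload)
    fetch_report = [report]
    # Recurse on the next page's payload until the fetcher is finished.
    if not next_page.get("finished"):
        fetch_report.extend(fetch_collection(next_page, {}))
    return fetch_report

reports = fetch_collection(json.dumps({"write_page": 0}), {})
print(len(reports))  # one report per fetched page -> 3
```

The recursion depth equals the page count, which is fine for typical collections; an iterative loop would be the natural refactor if collections ever run to thousands of pages.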

