Add example instructions on using Lambda #27

**`.gitignore`**

```
/bundle
/bundle.zip
```

**`Makefile`**

```make
bundle: config.json handler.py ../setup.py
	rm -rf $@
	mkdir $@
	cp handler.py config.json $@/
	pip install .. -t $@/

bundle.zip: bundle
	rm -f $@
	cd bundle && zip -r ../bundle.zip .

.PHONY: clean
clean:
	rm -rf bundle bundle.zip
```
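
The `bundle` recipe stages files into a directory, then zips from inside that directory so archive paths are relative to the bundle root. A toy sketch of the same pattern (placeholder file contents, no `pip install` step):

```python
import os
import tempfile
import zipfile

# Toy re-enactment of the Makefile's bundle/zip recipe with placeholder
# files; everything here is scratch data, not the real handler.
staging = tempfile.mkdtemp()
bundle = os.path.join(staging, 'bundle')
os.mkdir(bundle)
for name in ('handler.py', 'config.json'):
    with open(os.path.join(bundle, name), 'w') as f:
        f.write('# placeholder\n')

zip_path = os.path.join(staging, 'bundle.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    for fname in os.listdir(bundle):
        # Archive names are relative to the bundle directory, matching
        # `cd bundle && zip -r ../bundle.zip .`
        zf.write(os.path.join(bundle, fname), arcname=fname)

print(sorted(zipfile.ZipFile(zip_path).namelist()))  # → ['config.json', 'handler.py']
```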

# Integrating dumb-pypi with AWS Lambda

[AWS Lambda][lambda] is a way to run code ("functions") in response to triggers
(like a change in an S3 bucket) without running any servers yourself.

dumb-pypi works very well with Lambda; you only need to regenerate the index
when your list of packages changes (relatively rare), and you can serve the
generated index without involving dumb-pypi at all.

The steps below walk you through an example AWS Lambda setup where a change in
a "source" bucket (containing all your packages) automatically triggers
dumb-pypi to regenerate the index and store it in the "output" bucket.

Depending on whether you need to support old pip versions, you may even be able
to serve your index directly from S3, avoiding running any servers entirely.

## Initial deployment

These instructions use the sample code in this directory as the base for the
Lambda handler. The specifics of your bucket will likely vary; it's expected
that you may need to adjust configuration options or the code itself to match
your deployment.

1. Create two S3 buckets, e.g. `dumb-pypi-source` and `dumb-pypi-output`.

   The source bucket is where you'll drop Python packages (tarballs, wheels,
   etc.) in a flat listing (all objects at the root of the bucket).

   The output bucket will contain the generated index (HTML files) which pip
   uses.

2. Create an IAM role which allows reading from the source bucket and
   reading/writing to the output bucket. Select "Lambda" as the AWS resource
   the role applies to during creation.

   Here's an example policy (adjust as needed):
   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "AllowReadToSourceBucket",
               "Effect": "Allow",
               "Action": [
                   "s3:List*",
                   "s3:Get*"
               ],
               "Resource": [
                   "arn:aws:s3:::dumb-pypi-source/*",
                   "arn:aws:s3:::dumb-pypi-source"
               ]
           },
           {
               "Sid": "AllowReadWriteToOutputBucket",
               "Effect": "Allow",
               "Action": [
                   "s3:List*",
                   "s3:Get*",
                   "s3:PutObject",
                   "s3:DeleteObject"
               ],
               "Resource": [
                   "arn:aws:s3:::dumb-pypi-output/*",
                   "arn:aws:s3:::dumb-pypi-output"
               ]
           }
       ]
   }
   ```

3. Adjust `config.json` in this directory as necessary (e.g. update the
   source/output bucket names and the arguments). You can easily change this
   later.

4. Build the first deployment bundle to upload to Lambda. From this directory,
   just run `make bundle.zip`.

5. Create the function. For example, here's how you might do it with the AWS
   CLI:

   ```bash
   aws lambda create-function \
       --region us-west-1 \
       --function-name dumb-pypi \
       --runtime python3.6 \
       --role arn:aws:iam::XXXXXXXXXXXX:role/dumb-pypi \
       --handler handler.main \
       --zip-file fileb://bundle.zip
   ```

   (Replace the role, region, etc. to match your setup.)

6. [Give your S3 source bucket permission][s3-allow-trigger] to trigger your
   new Lambda function. For example:

   ```bash
   aws lambda add-permission \
       --region us-west-1 \
       --function-name dumb-pypi \
       --statement-id AllowSourceBucketToTriggerDumbPyPI \
       --action lambda:InvokeFunction \
       --principal s3.amazonaws.com \
       --source-arn arn:aws:s3:::dumb-pypi-source \
       --source-account XXXXXXXXXXXX
   ```

7. Set up a trigger so that changes to the source bucket cause the `dumb-pypi`
   function to run and regenerate the index.

   The AWS CLI is awkward here; the easiest way is to make a file like
   `policy.json` with contents like:

   ```json
   {
       "LambdaFunctionConfigurations": [
           {
               "Id": "NotifyDumbPyPI",
               "LambdaFunctionArn": "arn:aws:lambda:us-west-1:XXXXXXXXXXXX:function:dumb-pypi",
               "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
           }
       ]
   }
   ```

   (Again, replacing the function's ARN as appropriate for your account.)

   Then, using the AWS CLI, add a "notification configuration" to the source
   bucket:

   ```bash
   aws s3api put-bucket-notification-configuration \
       --bucket dumb-pypi-source \
       --notification-configuration "$(< policy.json)"
   ```


## Serving from the S3 buckets directly

The whole point of Lambda is to avoid running your own servers, so you might as
well serve directly from S3 :)

Keep in mind that if you need to support old pip versions, you [can't yet serve
directly from S3][rationale] because these old versions rely on the PyPI server
to do package name normalization; see [the README][README] for suggestions on
how to use nginx to do this normalization.

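
The normalization in question is the PEP 503 rule that newer pip applies client-side; for old pip, the server has to apply the same rule to requested names. A minimal sketch:

```python
import re

# PEP 503 name normalization: lowercase, with runs of ".", "_", and "-"
# collapsed into a single "-". Old pip expects the *server* to accept
# un-normalized names, which is why plain S3 isn't enough for it.
def normalize(name):
    return re.sub(r'[-_.]+', '-', name).lower()

print(normalize('My.Cool_Package'))  # → my-cool-package
```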

If you **do** want to serve from S3 directly, it's pretty easy:

1. Enable read access to your source bucket. You can open it to the public,
   restrict it to your company's IP ranges, etc.

   Here's an example policy which grants read access to everyone:
   ```json
   {
       "Version": "2008-10-17",
       "Id": "AllowReadOnlyAccess",
       "Statement": [
           {
               "Sid": "AllowReadOnlyAccess",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "*"
               },
               "Action": "s3:GetObject",
               "Resource": "arn:aws:s3:::dumb-pypi-source/*"
           }
       ]
   }
   ```

   This will make your source bucket available at a URL like
   `https://dumb-pypi-source.s3.amazonaws.com`.

2. Enable read access to your output bucket. Again, it's up to you who you
   allow; you can use the same example policy from above (just adjust the
   bucket name).

3. Enable static website hosting for your output bucket, and set `index.html`
   as your "Index document". This appears to be the only way to get
   `index.html` to show up when accessing the root of a "directory" in S3.

   This will make your output bucket available at a URL like
   `http://dumb-pypi-output.s3-website-us-west-1.amazonaws.com/`.
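
Once the output bucket is readable, pip clients can be pointed at it. A hypothetical `pip.conf` sketch, assuming the generated index exposes a `simple/` directory and using the example bucket/region names from above:

```ini
; Hypothetical pip.conf; adjust the bucket name, region, and index
; path to match your actual deployment and dumb-pypi output layout.
[global]
index-url = http://dumb-pypi-output.s3-website-us-west-1.amazonaws.com/simple/
```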


## Updating the code or config

Any time you update the code or config, you need to re-deploy the bundle to
Lambda.

1. Run `make bundle.zip` to build a new deployment bundle.

2. Use the AWS CLI to update the code for the function:

   ```bash
   aws lambda update-function-code \
       --function-name dumb-pypi \
       --zip-file fileb://bundle.zip
   ```

[lambda]: https://aws.amazon.com/lambda/
[rationale]: https://github.com/chriskuehl/dumb-pypi/blob/master/RATIONALE.md
[s3-allow-trigger]: https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#grant-destinations-permissions-to-s3
[README]: https://github.com/chriskuehl/dumb-pypi/blob/master/README.md

**`config.json`**

```json
{
    "source-bucket": "dumb-pypi-source",
    "output-bucket": "dumb-pypi-output",
    "args": [
        "--packages-url", "https://dumb-pypi-source.s3.amazonaws.com/",
        "--title", "My Cool PyPI on S3"
    ]
}
```
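
For illustration, here is roughly how the handler combines this config with its fixed flags when invoking dumb-pypi (the temp-file paths here are hypothetical placeholders):

```python
import json

# Sketch of how the Lambda handler assembles dumb-pypi arguments:
# its own fixed flags first, then the "args" list from config.json.
config = json.loads("""
{
    "source-bucket": "dumb-pypi-source",
    "output-bucket": "dumb-pypi-output",
    "args": [
        "--packages-url", "https://dumb-pypi-source.s3.amazonaws.com/",
        "--title", "My Cool PyPI on S3"
    ]
}
""")
argv = (
    '--package-list-json', '/tmp/package-list.json',  # placeholder path
    '--output-dir', '/tmp/index',                     # placeholder path
    *config['args'],
)
print(argv)
```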

**`handler.py`**

```python
import json
import mimetypes
import os
import os.path
import tempfile
import time

import boto3

import dumb_pypi.main


def _load_config():
    with open(os.path.join(os.path.dirname(__file__), 'config.json')) as f:
        return json.load(f)


def _list_bucket(bucket):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        yield from (
            json.dumps(
                {
                    'filename': package['Key'],
                    'upload_timestamp': time.mktime(package['LastModified'].timetuple()),
                },
                sort_keys=True,
            )
            for package in page.get('Contents', ())
        )


def _sync_bucket(localdir, bucket_name):
    # TODO: should also delete removed files
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    for dirpath, _, filenames in os.walk(localdir):
        for filename in filenames:
            path_on_disk = os.path.join(dirpath, filename)
            key = os.path.relpath(path_on_disk, localdir)
            print(f'Uploading {path_on_disk} => s3://{bucket_name}/{key}')
            with open(path_on_disk, 'rb') as f:
                bucket.put_object(
                    Key=key,
                    Body=f,
                    ContentType=mimetypes.guess_type(filename)[0]
                )


def main(event, context):
    config = _load_config()

    with tempfile.TemporaryDirectory() as td:
        with tempfile.NamedTemporaryFile(mode='w') as tf:
            for line in _list_bucket(config['source-bucket']):
                tf.write(line + '\n')
            tf.flush()

            dumb_pypi.main.main((
                '--package-list-json', tf.name,
                '--output-dir', td,
                *config['args'],
            ))

        _sync_bucket(td, config['output-bucket'])


# Strictly for testing; we don't look at the event or context anyway.
if __name__ == '__main__':
    exit(main(None, None))
```
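
The `--package-list-json` input that `_list_bucket` generates is one JSON object per line. A sketch of a single record, with a made-up filename and timestamp:

```python
import datetime
import json
import time

# One line of the package list, mirroring what _list_bucket emits for
# each S3 object; the filename and date here are example values only.
last_modified = datetime.datetime(2024, 1, 1, 12, 0, 0)
line = json.dumps(
    {
        'filename': 'mypackage-1.0.tar.gz',
        'upload_timestamp': time.mktime(last_modified.timetuple()),
    },
    sort_keys=True,
)
print(line)
```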

From the pull-request discussion:

> I was looking at how `aws s3 sync` works and it basically does a
> `(mtime, size)` comparison against the remote objects -- going to try and
> implement something like that as well :)