-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ingestion server removal IP with EC2 approach #4615
Conversation
Adding @sarayourfriend as a reviewer because of infrastructure implications, and since we first discussed this approach! Also pinging @AetherUnbound for the catalog. |
Full-stack documentation: https://docs.openverse.org/_preview/4615 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. Changed files 🔄: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Just two small comments, neither blocking. This is a great simplification and is a good step in the right direction to eventually remove the API from the workers, and just have them immediately start their job when starting. Doing that would make it easy to use spot instances for this work, saving us a huge sum of money on each run.
.../proposals/ingestion_server_removal/20240328-implementation_plan_ingestion_server_removal.md
Outdated
Show resolved
Hide resolved
- Note: it's possible to use this operator to launch all the workers at once | ||
by setting the `min_count` and `max_count` parameters. We will instead be | ||
using dynamic task mapping to create a separate task for creating each | ||
indexer worker individually: this is so that if a single worker fails | ||
later in the process, we can retry that worker in isolation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this explicit!
.../proposals/ingestion_server_removal/20240328-implementation_plan_ingestion_server_removal.md
Outdated
Show resolved
Hide resolved
I've opened a PR with the required infrastructure code change to make this happen: https://github.com/WordPress/openverse-infrastructure/pull/980 |
.../proposals/ingestion_server_removal/20240328-implementation_plan_ingestion_server_removal.md
Show resolved
Hide resolved
### ASG approach | ||
|
||
An earlier draft of this implementation plan used an AutoScaling Group for each | ||
environment. Airflow would use `set_desired_capacity` to set the ASG's capacity, | ||
and the ASG itself would then spin up (and down) EC2 instances. A disadvantage | ||
of this approach was that it is impossible to retry a single indexer worker. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this looks good. I've been meaning to suggest we find a standardized way to denote changes to our IPs and other documents. Perhaps a "changelog" section that links to the PR file diff and summarizes the changes?
What you've done seems sufficient for now, but reminded me to create an issue for this, so thanks 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objections, only one small formatting suggestion I found while looking at the rendered document. Looks good!
.../proposals/ingestion_server_removal/20240328-implementation_plan_ingestion_server_removal.md
Outdated
Show resolved
Hide resolved
- Note: it's possible to use this operator to launch all the workers at once | ||
by setting the `min_count` and `max_count` parameters. We will instead be | ||
using dynamic task mapping to create a separate task for creating each | ||
indexer worker individually: this is so that if a single worker fails | ||
later in the process, we can retry that worker in isolation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this explicit!
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Fixes
Description
Updates the ingestion server removal IP to reflect changes discussed here (moving away from the ASG approach and back to managing EC2 instances more directly).
Checklist
Update index.md
).main
) or a parent feature branch../ov just catalog/generate-docs
for catalogPRs) or the media properties generator (
./ov just catalog/generate-docs media-props
for the catalog or
./ov just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin