Move Openverse API and catalog to openverse.org subdomains #2037
Here are my answers to the questions I raised in the issue description:
If there are not sufficient Cloudflare rules in the free tier to redirect everything that needs it, we can use AWS load balancer rules. I don't have a strong preference either way, maybe with a slight lean towards Cloudflare because the rules are more flexible, and technically they're at the edge, and would perform slightly better than load balancer rules. But those benefits are super negligible, so if there is any hassle at all making it work in Cloudflare, AWS LB rules are a perfectly good fallback that will 100% work and will not cost us anything.
No reason to do this. Let's move everything over and call it a day.
This is no longer true. The new SSH bastion is on
NO. We will use AWS load balancer rules. We must avoid any solutions that introduce new services!
Yes.
The implementation plan will cover this. |
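As a rough illustration of the AWS load balancer fallback mentioned in the first answer above, a redirect rule could look something like the following. This is only a sketch: the `legacy_listener_arn` variable, the resource name, and the priority are hypothetical, and the list of hostnames is an assumption based on the issue description rather than the final plan.

```hcl
# Hypothetical variable: ARN of the HTTPS listener that still receives
# traffic for the legacy openverse.engineering hostnames.
variable "legacy_listener_arn" {
  type = string
}

resource "aws_lb_listener_rule" "redirect_legacy_api" {
  listener_arn = var.legacy_listener_arn
  priority     = 10 # arbitrary; must be unique on the listener

  # Only requests for the legacy API hostnames are redirected.
  condition {
    host_header {
      values = [
        "api.openverse.engineering",
        "api-production.openverse.engineering",
      ]
    }
  }

  action {
    type = "redirect"

    redirect {
      host        = "api.openverse.org"
      path        = "/#{path}"  # preserve the original path
      query       = "#{query}"  # preserve the original query string
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```

Load balancer rules like this are evaluated at the origin rather than at the edge, which is the small performance trade-off described above.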
Implementation plan is approved and merged (thanks @dhruvkb and @AetherUnbound for the careful review). I've created all the issues for this project. They are primarily in the private infrastructure repository, almost all the work will happen there. The milestones are as follows: Preliminary work blocks the API and Airflow specific work. API and Airflow work can happen in parallel. Finalisation work is blocked by everything else. Additionally, these three issues are in the monorepo related to this:
I've updated the issue description to collapse the pseudo-project proposal and prioritise the issue list. |
Thanks Sara for all this careful planning! I've gone ahead and moved all of the non-blocked tickets in the first preliminary work milestone into our TODOs. |
Thanks for mentioning the blocked issues in the preliminary work milestone. Those are actually finalisation work, I'd just forgotten to move them! They're in the correct milestone now 👍 |
The first major changes for this project are underway. I successfully deployed the changes in https://github.com/WordPress/openverse-infrastructure/pull/802 to move our Cloudflare record and page rule handling for live domains out of the next root modules, and into the new I updated the PR with that change to the RDS module, finished applying all the changes, and merged it. Today I opened https://github.com/WordPress/openverse-infrastructure/pull/804, which finishes the extraction of our "ingress" layer by deduplicating load balancer listeners out of the generic service modules and other generic modules into a single, much simpler generic module. This also involved a lot of cleanup, due to the removal of unused variables from the generic service modules. I wrote a lengthy PR description to cover everything the PR changes and to help reviewers move through it as quickly and as confidently as possible. While working on that, I noticed some issues with our API environment variables that will need to be adjusted for this project as well, which were missed during the implementation planning process. I created a new issue for that, #3821, and will start working on it today. It isn't blocked by anything else, but it will block the very first task of the API migration issues once the preliminary work is finished. |
Preliminary work for this is finished now. I will start working on preparing the Airflow migration now. |
Edit: This rule has been proactively added to Cloudflare manually. When we address WordPress/openverse-infrastructure#325 this change will be codified in our infra repo along with the other firewall rules. No action needs to be taken here.
I wanted to make a note here concerning Cloudflare and some of the currently-manual configuration for dealing with bots. On the frontend we now have "super bot fight mode" enabled, which automatically blocks all traffic from known and likely bad bots, while allowing "verified" web crawlers like Internet Archive, Google, Bing, etc. to access the frontend. After moving to the openverse.org domain, we probably want to create Web Application Firewall (WAF) rules to skip the "super bot fight mode" rules for the API. I would guess at least that our API users programmatically accessing the API would be marked as bots by Cloudflare and blocked by these rules. Specifically, our WAF rules need to skip the "http_request_sbfm" (sbfm = super bot fight mode) request phase. I think the whole rule would (roughly) look like this:
# https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/ruleset
resource "cloudflare_ruleset" "skip_sbfm_for_api" {
  zone_id = var.cloudflare_zone_id
  name    = "Skip Super Bot Fight Mode for the API"
  kind    = "zone"
  # Skip rules are defined as WAF custom rules that name the phase to skip.
  phase   = "http_request_firewall_custom"
  rules {
    action      = "skip"
    # Match only the API hostnames. Backslashes are doubled so the string is valid HCL.
    expression  = "(http.host matches \"(api\\.|api-staging\\.)openverse\\.org\")"
    description = "Skip Super Bot Fight Mode for requests to the API"
    action_parameters {
      phases = ["http_request_sbfm"]
    }
  }
} |
@zackkrida Can you please add this note to the issue for moving all existing Cloudflare rules from the .engineering zone to the .org zone, so that whoever implements that issue will definitely see this information? https://github.com/WordPress/openverse-infrastructure/issues/777 Is the main concern making sure that the existing rule for the frontend in the .org zone does not accidentally cause an issue for the API? BTW: we're not using paths for this project, it'll all be on subdomains, so the expression should check the hostname for the API subdomain, rather than any part of the path. To clarify also, do we need to bypass it for Airflow too? Kibana works fine, so I think Airflow should be okay as well: both are behind Access and there is no automated traffic to either. Is that your understanding as well? If so, please clarify this in whatever update you leave in the issue 🙏 |
@sarayourfriend I've updated my comment to match the hostname correctly. I've also manually added this rule to Cloudflare now, so no action needs to be taken in https://github.com/WordPress/openverse-infrastructure/issues/777.
Yes, the goal is so that Super Bot Fight Mode doesn't block programmatic API traffic once it's moved over to openverse.org. We do not need to bypass this for Kibana or Airflow. |
As in, added to the .org zone? Will you open a PR to add it to the cloudflare root module of the infrastructure repository? Ideally we move away from the practice of manually defining rules in Cloudflare without reviews or the chance to document in comments on them, particularly for things we expect to exist for a long time or indefinitely. |
@sarayourfriend agreed on no longer manually defining these in the CF UI. I mentioned in my edit that we should move this firewall rule with the others. |
Okay, thanks. |
Yesterday I got up the PR to deploy Airflow with Ansible on a stable EC2 instance: https://github.com/WordPress/openverse-infrastructure/pull/829 Staci's early review comment made me realise I hadn't mentioned in the project thread that the security group refactor described in the implementation plan is not workable. Details of that are described in this comment on the issue that was meant to implement the abstracted security group module. To summarise: the proposed abstraction in the implementation plan largely misses the point of security groups and how to best organise them. Rather than enforcing a uniform basic standard of ingress/egress rules by applying those rules to each security group individually, we should add instances to relevant security groups configured with the rules relevant to them. We can have several thousands of security groups, and with our service volume we will not run into the limit, even if we had an individual security group for each and every rule (which we don't need). Rather than configuring each EC2 instance's security group with SSH ingress rules, we should just add the instance to a shared security group with those rules. That will be a long term refactor that I need to sit down and plan out into discrete tasks. We cannot do it as part of this project without causing significant delays to it as well as disruptions to virtually all other ongoing infrastructure work. I don't think this needs an implementation plan, it just needs someone (very likely me) to sit down and look at all our existing security groups to identify the places that need changes. Another issue that complicates this is the need to migrate away from both inline security group rules in Terraform and away from the old security group rule resources, all towards the new rule resources which have the significant benefit of leveraging AWS's relatively new security group rule ids, which neither of the previous approaches (which are the ones we use) implement. 
So, we will still eventually change how we manage security groups to reduce duplication, we just won't do it the way the implementation plan suggests, and we will not do it as part of this project. |
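To make the "shared security group" idea from the comment above concrete, here is a minimal sketch using the newer per-rule resource from the AWS Terraform provider. The variable names, group name, and the choice of the bastion as the SSH source are assumptions for illustration, not the repository's actual configuration.

```hcl
# Hypothetical inputs: the VPC and the security group of the SSH bastion.
variable "vpc_id" {
  type = string
}

variable "bastion_security_group_id" {
  type = string
}

# One shared group that carries the SSH ingress rule for every instance
# that joins it, instead of repeating the rule in each instance's own group.
resource "aws_security_group" "ssh_from_bastion" {
  name        = "ssh-from-bastion"
  description = "Members accept SSH connections from the bastion"
  vpc_id      = var.vpc_id
}

# The standalone rule resource, which gets its own security group rule ID.
resource "aws_vpc_security_group_ingress_rule" "ssh" {
  security_group_id            = aws_security_group.ssh_from_bastion.id
  referenced_security_group_id = var.bastion_security_group_id
  ip_protocol                  = "tcp"
  from_port                    = 22
  to_port                      = 22
  description                  = "SSH from the bastion"
}

# An instance or launch template would then opt in by listing the shared
# group alongside its service-specific groups, for example:
#   vpc_security_group_ids = [aws_security_group.ssh_from_bastion.id, ...]
```

Under this arrangement, changing the SSH policy means editing one rule rather than touching every service module.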
While working on https://github.com/WordPress/openverse-infrastructure/issues/777, I learned that Cloudflare is happy to interpret upstream cache-control header instructions as edge TTL instructions. I've opened #4005 to incorporate our cache control information into the API itself, which will allow us to eliminate a handful of individually defined Cloudflare rules for a single "use the upstream cache-control header as edge ttl" rule for the API. |
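For illustration, a single "use the upstream cache-control header as edge TTL" rule for the API might look roughly like this in the Cloudflare Terraform provider's v4 block syntax. The resource name, the hostname expression, and the `cloudflare_zone_id` variable are assumptions; the rule actually written for the infrastructure repository may differ.

```hcl
# Hypothetical input: the zone ID of the openverse.org Cloudflare zone.
variable "cloudflare_zone_id" {
  type = string
}

resource "cloudflare_ruleset" "api_cache_settings" {
  zone_id = var.cloudflare_zone_id
  name    = "API cache settings"
  kind    = "zone"
  phase   = "http_request_cache_settings"

  rules {
    action      = "set_cache_settings"
    expression  = "(http.host eq \"api.openverse.org\")"
    description = "Use the upstream cache-control header as edge TTL for the API"
    enabled     = true

    action_parameters {
      cache = true

      # Defer to the Cache-Control header sent by the API rather than
      # overriding the TTL per route in Cloudflare.
      edge_ttl {
        mode = "respect_origin"
      }
    }
  }
}
```

With a rule of this shape, cache lifetimes would be controlled by the headers the API sends, so per-endpoint changes would not require a Cloudflare change.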
Airflow is live at airflow.openverse.org 🎉 A final infrastructure PR is up to finalise the Ansible, compose, monitoring, and IAM configuration. Please review this as soon as possible, @WordPress/openverse-catalog @WordPress/openverse-infrastructure. The resources are live in production and ideally these changes are merged to |
Hi @sarayourfriend, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information. |
I am waiting on two reviews for this PR, which blocks further work on the API side of things. Last week I merged and applied the PR to migrate our openverse.engineering Cloudflare zone's rules into the openverse.org zone. That went well save for some small configuration issues with obscure errors from the Cloudflare API which were easy to resolve. @AetherUnbound and @stacimc successfully deployed changes to Airflow for a new Airflow and Python version last week using the playbooks. They ran into an issue with the community.docker module being out of date on their local machines, so I worked on using PDM to pin our Ansible dependencies in the infrastructure repository: https://github.com/WordPress/openverse-infrastructure/issues/855. Spurred on by a discussion resulting from the Ansible work, and keeping in mind the upcoming ingestion worker deployment, I spent two of my work days last week on a PR to use Packer to build AMIs to deploy with ASGs. This will eventually result in changes to all services deployed in EC2 ASGs, which now includes Airflow, but will not be part of this project. I'm just noting it here because it has taken some time away from pushing forward the API migration to openverse.org, though as I said, work there is somewhat blocked. I plan to spend at least a few hours this week working on the copy for #3742 and #3743. Everything else really does depend on the openverse.org domains for the API being live, but the copy for the email and Make post (and the management command itself) can be implemented in full, with a placeholder left for the date in the meantime. I don't have an estimated date yet, but given the very slow pace of reviews for the API side of this project, I am targeting the end of June as the earliest likely shipped date for most of this project (with potentially some lingering PRs to update references to the API in Jetpack and Gutenberg, which are not in scope, but referenced by this project). |
The API is now available at api.openverse.org 🎉 |
I've drafted text for the Make post and email https://docs.google.com/document/d/1ESmzbH6vkp8rxJBsy3P0_BLZ01sgKVFAQa3LnPSvBhQ/edit?usp=sharing @WordPress/openverse-maintainers, please review the text. I'll start working on the baseline requirements for the management command in #3742. |
I've got the make post up and scheduled, with a switch-over date of 3 June 2024, which @zackkrida and I just decided on. That gives a month of lead time, with the post scheduled to publish 6 May at 00:00 UTC. This PR introduces the management command for sending the email to registered and verified API users: #4229 Finally, https://github.com/WordPress/openverse-infrastructure/pull/876 updates the canonical URL and introduces a staging-only redirect for testing. |
The Make post went out earlier this week: https://make.wordpress.org/openverse/2024/05/06/the-openverse-api-is-moving-to-api-openverse-org/ I got all set up to run the management command to send notifications to registered API users, but found an issue with the query in the original code, and it only pulled 2 email addresses in production to send to. That's not the expected outcome. I dug deeper, and realised we made a mistake in how we wrote the query. We'd written the query off of the Instead, we need to query off the registration table where I've confirmed that with this re-written query, we get a more expected number of emails to send in production. I'll shortly have a PR up to fix the query. |
The emails (750 in the end) announcing the API move are sent as of 2024-05-08T00:59:58.366Z. I'll open a PR to stage the redirect cut over on 3 June, but until then, all other work on this is blocked, except for https://github.com/WordPress/openverse-infrastructure/issues/786. |
The redirects are LIVE! Everything appears to be working. I tried out Gutenberg and Jetpack integrations and both look to be fine as far as I can tell. I've removed https://github.com/WordPress/openverse-infrastructure/issues/781 and https://github.com/WordPress/openverse-infrastructure/issues/782 from the milestones for this project because we agreed they should not block the "shipped" status of this project, and I wanted to make that clear based on the milestone. With that, the API tickets are done, and I've closed the milestone! I'm going to start working on https://github.com/WordPress/openverse-infrastructure/issues/785 now, which will free up https://github.com/WordPress/openverse-infrastructure/issues/787 very soon 🎉. I think at that point the project can actually go into success rather than shipped, because the success criteria was to be able to downgrade to the free tier. @zackkrida do you agree with that assessment? If not, on what cue would we move from shipped to success for this project? We could put an arbitrary amount of time to just monitor things generally before declaring this finished? Also, just for fun, here's a graph showing all the status codes we've returned from openverse.engineering since I applied the redirect 🙂 This in effect proves that openverse.engineering is no longer used for anything but those redirects, meaning it is safe to remove all the rules and such from it. |
@sarayourfriend sadly, the redirect broke the I made an infra PR to revert the change, and a Gutenberg PR was already merged to replace the URLs to use openverse.org. To keep old versions of WordPress working, now, we would have to keep openverse.engineering operational indefinitely. That is clearly infeasible, so instead, I think we should re-implement the redirect right after WordPress 6.6 launches. I also added a status update to our https://make.wordpress.org/openverse/2024/05/06/the-openverse-api-is-moving-to-api-openverse-org/ make post for the change, and pinned that post as we figure out next steps. |
We've got a solution to redirect everything except the media inserter. The solution doesn't prevent the goals of this project from succeeding as it only uses free Cloudflare features and can theoretically exist indefinitely. We can discuss when specifically we would remove it, whether that's with the 6.6 launch or later, in the follow up to this work. |
Once https://github.com/WordPress/openverse-infrastructure/pull/920 is merged this project can be moved from shipped to success 🥳 |
I've just applied and merged https://github.com/WordPress/openverse-infrastructure/pull/920! This project is complete 🎉 |
Summary
Move the infrastructure which currently exists on openverse.engineering to openverse.org.
Description
Pseudo-project proposal (collapsed to preference the issue and document list)
We currently pay for two Cloudflare accounts, one for openverse.engineering and openverse.org. If we move the API and catalog to live on subdomains of openverse.org, we should be able to change our openverse.engineering account to a free one, saving 200 USD a month that can go towards e.g., Plausible.
In the initial discussion we had about this we only talked about moving the API. However, because Airflow is behind Cloudflare Access, we may need to move it as well. I've tried to understand whether that is the case based on the Cloudflare pricing and it seems like Cloudflare Access might be free under our current usage but it isn't clear to me whether that's free for specific paid account types or free for any account type.
The end result of this project should be that openverse.engineering and its usage should be entirely covered by a free Cloudflare account. The planning for this project must consider the various features we will need that will cause contingencies here:
api.openverse.engineering and api-production.openverse.engineering. Do we need to redirect staging (api-staging)? What about the legacy staging subdomain, api-dev? We also continue to redirect search.openverse.engineering, our legacy frontend domain(s). Cloudflare free only supports three page rules. We currently have 3 redirects already (for the frontend). We would need at minimum one more for the API (api.openverse.engineering) but to match our existing redirect philosophy with the frontend, we should redirect api, api-production and api-staging. That would lead to 6 minimum page rules if we used page rules for the redirects. The other existing page rules are caching related and would be moved into openverse.org, so they do not need to count towards our total page rule utilisation.
openverse.engineering if it is a free account? Namely, as discussed above, do free Cloudflare accounts support Cloudflare Access? If we manage Cloudflare Access on two different Cloudflare zones, the access terraform module will need to be updated or potentially completely re-written to accommodate multiple Cloudflare zones.
Additional considerations:
openverse.engineering. Should it stay there?
A task list was written by @zackkrida and @dhruvkb before. It is listed below in a collapsed element as I think it should be taken with a grain of salt. It demonstrates the overall picture well, especially for communications, but hides what I think are probably the most complex parts of this (namely the individual infrastructure steps we need to take for the first task).
The task list created during the original, internal discussion and proposal of this project.
Documents
Because this project is relatively clear in its motivations and requirements, we will skip the project proposal. This project thread's description will serve as a general project proposal.
openverse.engineering Cloudflare zone so that it can be switched to a free account #2038
Issues
Issues are mostly in the infrastructure repository, organised into the following four milestones.
Preliminary work blocks the API and Airflow specific work. API and Airflow work can happen in parallel. Finalisation work is blocked by everything else.
Additionally, these four issues are in the monorepo related to this:
api.openverse.org #3741
api.openverse.org domain #3742