PROPOSAL: Transfer ownership of go.opentelemetry.io to OpenTelemetry #2260
Comments
Overall seems fine to me, although I'd advocate for us to migrate over to our Hugo + Netlify setup as a fast follow. |
I've opened https://cncfservicedesk.atlassian.net/servicedesk/customer/portal/1/CNCFSD-2407 to request a GCP account for this purpose. |
Yeah, this should get a good review once the immediate transfer is done |
Should the new GitHub project also include @open-telemetry/go-approvers? |
If I understand it correctly the vanity URLs are sourced from this file: This file had 7 updates in 5 years, so while having the workflow in place is a nice to have, it doesn't seem to be mandatory. As @austinlparker suggested we can accomplish the same by leveraging hugo + netlify. I am saying that because the whole completion of this issue depends on step 2 (Create an OpenTelemetry-owned cloud project), while step 1 is nice to have, it does not seem to be mandatory to me. Step 1, 3 and 4 are under our control entirely. If the creation of that community owned cloud project takes time, we can also look into the alternative already, either as fallback or as future solution. |
@svrnm based on Austin's comment above, it sounds like creating a GCP project shouldn't take much time. The rest of the steps I've outlined let us drop-in the existing solution ASAP. This is a transition plan, not a long-term plan. We can absolutely look at alternatives once that's done. |
Apologies for my misleading comment. I fully support doing the transition to a community-owned GCP instance first. I wanted to offer a fallback in case the GCP project creation takes longer than expected, because I am worried that it is not a quick process, although that depends on your expected timeline (days? weeks?). In that context, can you share some details on the sizing and properties of the instance needed, so we have them ready when we can set it up?
@svrnm no worries, sorry I misunderstood you. The app is running on an F1 App Engine instance, which seems to be the only workload-handling resource in the project. The service has a custom domain configured for go.opentelemetry.io and uses the default App Engine service account. There is also a service account for CircleCI to deploy the app from GitHub, which has the following IAM roles:
I think all of this should be easily movable to an OpenTelemetry GCP organization if we follow https://cloud.google.com/resource-manager/docs/moving-projects-folders#console. That depends on if there is an Otel GCP org. Otherwise, since it's currently associated with the google org, we can't change it back to "no organization" and will need an entirely new project with these settings. |
I have a branch for the dependency updates to govanityurls here: https://github.com/damemi/govanityurls/tree/update-go-122. Once the Otel fork is up I'll open a PR to that repo. I also kicked off the process of finding the owners of the GCP govanityurls repo. It seems abandoned at this point, if that's the case I'd like to mark it officially no longer maintained. |
I do want to set expectations here appropriately; I believe it won't take too much time, but historically it can take a couple of weeks before we get our service desk tickets resolved. I have updated the ticket to indicate the urgency of this situation, but it's kinda out of our hands. |
Thanks @austinlparker. If it is going to take a while, I'm happy to help parallelize this by working on a hugo+netlify solution in the meantime. Worst case, we get a head start on the fast follow. My only issues with that are:
@open-telemetry/docs-maintainers, and especially @chalin have all the expertise needed for that, so don't worry :-) -- We can create a prototype as needed to showcase how it works. But for now, let's give the ServiceDesk ticket some more time to be resolved. |
Sorry for my stupid question, but isn't it just a matter of having a github repo named |
For the same reasons the netlify+hugo solution is a fallback/alternative:
Note, that @chalin and I discussed a potential solution via netlify, and based on that I cobbled together a prototype, see open-telemetry/opentelemetry.io#5022 Note, I also just checked the Service Desk issue, there is still no response. |
Sorry, I meant github pages directly as opposed to Hugo+Netlify. IMO, being able to send a pull request to a repo named |
Ah, ok, that's a viable option indeed! The vanity URLs are really simple, so a repo with some HTML files would probably do the trick |
It would be good to go through the vanity url app's handler code to make sure everything is possible in a static site. I haven't dug too deep into it, but it seems weird to me that something like this exists in the first place if it can be done with just a proxy site. Maybe the purpose is just to be an abstracted solution? |
I would strongly ask that we move forward with the original plan before we try to make too many changes at once. The existing application serves builds for many projects that have high use and high impact (e.g. Kubernetes, Grafana, Prometheus, Moby). There are many more vendors that rely on this package site being up to continue business operations. Availability needs to be an important point in this discussion, both in the short-term cutover and in the long-term operational support. This application is currently run on a platform with an uptime SLA of >99.95%. We have engineers managing the project with direct lines of communication with the platform team, and the Go maintainers are familiar with the technology. These are all things that inspire confidence in the current design. For alternate proposals, can you please provide an overview of the reliability the alternatives will offer?
Main reason why we discuss alternatives is that the Service Desk issue takes some undefined time, and this is about having a fallback, if it takes too long (whatever that means). Until then going forward with the original plan is ... the plan.
I had the same thought, but apparently it's very basic, see https://go.dev/ref/mod#serving-from-proxy:
"When the go command downloads a module in direct mode, it first looks up the module server’s URL with an HTTP GET request based on the module path. It looks for a tag with the name go-import in the HTML response. The tag’s content must contain the repository root path, the version control system, and the URL, separated by spaces. See Finding a repository for a module path for details."
Same for the
The only "more complex" thing is the refresh header and the body link, where the dynamic sub path gets included. But none of that is needed to make this work; it is pure convenience for someone who hits that page with a browser. In hugo+netlify we need to do some redirect magic to replicate that. Removing that, what remains is
which is all static
For Netlify we are on the enterprise plan, which has a 99.99% uptime SLA (search for SLA on this page: https://www.netlify.com/pricing/). GitHub seems to have a 99.9% SLA (https://github.com/customer-terms/github-online-services-sla).
Interesting, maybe this project predates that? Would explain why it's been abandoned at least. FYI I'm poking around Google to find if there are any active maintainers for govanityurls and if not, I'm planning to mark the project as officially not maintained |
on the other hand, having to fork an abandoned project to keep it working doesn't inspire confidence 😅 |
The current script seems very generic, with pieces that are not relevant for us (handling Bitbucket repositories for Subversion and Mercurial, among others), all of that to serve a simple HTML page with a couple of meta tags (the ones @svrnm linked above). We have only a few Go repositories, all of them hosted on GitHub: we could certainly have a few static HTML pages served via GitHub Pages...
Hi all, quick update - we now have a GCP account for this. Please let me know who I need to add to the project in order to move this along. |
Thanks @austinlparker, can you please add me (mikedame@google.com) to the GCP project? @MrAlias also but I don't know which email he would like to use. Can we also get the fork repo created under github.com/open-telemetry/govanityurls? Not sure who needs to do that (@svrnm @jpkrohling @jsuereth ?) For the record, I 100% agree with the discussion around a better solution if I wasn't clear about that. Tyler mentioned to me offline that there could be some dns/scaling issues with the netlify proposal -- I'm not familiar enough to speak on that but he could explain more here. I am just in the camp that going one step at a time and migrating to a longer term solution is safer. Otherwise, I'm ready to set up the GCP app now and happy to help with whatever alternatives we decide on too |
@damemi I've added you as an owner to the project; Let me know if a lesser role is suitable. |
@austinlparker thanks, just got it. I should definitely be removed before we switch the DNS settings |
Ping, this is still blocked waiting on a new github repo |
Hey @austinlparker, thanks for getting the GCP project set up. Can you add me (codingalias@gmail.com) to the project as well?
Opened open-telemetry/govanityurls#1 to update the new fork repo to Go 1.22.
Next step will be testing the deployment in the GCP project.
Then update https://github.com/open-telemetry/opentelemetry-go-vanityurls/
@MrAlias I was able to add you as an Owner (hope you don't mind, @austinlparker). Can you work on linking the existing CircleCI/github repo (https://github.com/open-telemetry/opentelemetry-go-vanityurls) to this project? |
(Sorry to spam but want to keep track in one spot) I deployed the app to the new project and it seems to work: https://golang-imports.uc.r.appspot.com/ Testing with a
Looks like it resolved, but the app config is expecting Next step is to update the old config repo (open-telemetry/opentelemetry-go-vanityurls#22), verify that CircleCI can deploy to the new project (this still needs to be set up), then we can switch DNS to the new app. It would be good if we could canary the DNS switch but not sure if that's possible. |
I've updated the CircleCI job with a new service account from the new GCP project and updated the GCP project referenced there. I still need to figure out the correct permissions for the new service account now (currently no permissions). I'm working with @damemi out-of-band to determine this. |
Usually this is done locally with something like |
I don't see a way to do a canary for the DNS, unfortunately. |
Some Googling tells me |
Would that be done on the AppEngine app/load balancer? Or somewhere in the registrar settings? It seems like that would be a local change. Either way, how should we announce the switch to make people aware in case there is an issue? Is a blog post good enough? |
Yeah, local only. |
SGTM |
The TXT record to verify |
Done |
Before we make the DNS changes, I'd like to merge a blog post and choose a date that a few of us are available to synchronously make the change. Draft post here: open-telemetry/opentelemetry.io#5087, please take a look |
@austinlparker @MrAlias and I just finished migrating the DNS records for go.opentelemetry.io to the new GCP project. Traffic seems to have dropped to 0 in the old project and we are seeing successful requests in the new project. Discussed follow-up with them:
We learned a lot through this process and I hope everyone found it valuable. Thank you to everyone for the fast responses and support to get this done! I think we can close this issue now and take care of the cleanup tasks on their own. |
Update: We are seeing an issue with DNS records resolving go.opentelemetry.io at the new host. It seems to be related to TTL since this was briefly working (verified with traffic to the new app) but stopped after ~1hr. Will post updates here |
Seems to resolve using some resolvers, but not all
|
We think the DNS issues are resolved now. Working on a postmortem |
Any update? |
Traffic to the new instance has been steady for the past week, with traffic to the old instance dropping to 0. At this point the host is successfully migrated. @austinlparker is tasked with governance of the new GCP project. I will track maintenance of the forked app separately. Closing this issue as a success 🎉
Can't say I'm thrilled about the bus factor. Can we prioritize getting away from GCP onto a simple static hosting? |
@yurishkuro I mean that Austin is going to set up a group of owners for the GCP project (TC/GC), not that he is tasked with personally owning it |
tl;dr: All requests for go.opentelemetry.io currently route through an abandoned project that only Google employees have access to. This outlines a plan to transfer ownership of this project to the OpenTelemetry community.
Background
The app which routes requests for go.opentelemetry.io is owned by Google in a GCP project under the google.com organization. This app is very outdated and is not actively maintained by anyone at Google.
Any request for go.opentelemetry.io (or a path, such as go.opentelemetry.io/otel) is served by this app. This includes all imports in any Go project, for example go get (when not using a cached version of the dependency or a custom proxy). Note that this also affects Collector (+Contrib) imports as well.
The app is a deployment of https://github.com/GoogleCloudPlatform/govanityurls, which was last updated in March 2020. This app is essentially a proxy, where any request for the "vanity URL" is routed to an actual GitHub/pkg.go.dev path and the response is returned. This project is also not actively maintained by anyone at Google.
OpenTelemetry uses govanityurls in https://github.com/open-telemetry/opentelemetry-go-vanityurls/. This repo contains a config file for the vanity routes and a deploy script.
The deploy script is run as a postsubmit CircleCI job that deploys the upstream govanityurls project from HEAD to the opentelemetry-go GCP project under google.com. The app then serves all requests for go.opentelemetry.io imports by default, with over ~20,000 requests/day.
The limited ownership and maintenance of this project creates a critical single point of failure for the OpenTelemetry Go ecosystem.
Goals
This plan has the following goals:
go.opentelemetry.io to the community's control.
Non-goals
This plan does not include the following:
go.opentelemetry.io requests in more detail.
Migration plan
This plan includes the following steps:
Step 1: Fork GoogleCloudPlatform/govanityurls
Who: TC (to create the repo), @damemi and @MrAlias (to update dependencies)
Because the upstream project running this app is no longer maintained, the OTel community needs a fork that we have merge permissions on. We will create that fork (i.e., github.com/open-telemetry/govanityurls). We will then update the dependencies in that fork to the latest available releases.
Step 2: Create an OpenTelemetry-owned cloud project
Who: TC (to create project), @damemi and @MrAlias if necessary
Currently, the config repo at https://github.com/open-telemetry/opentelemetry-go-vanityurls uses GCP to deploy changes to the app to an App Engine service. It will be easiest for OpenTelemetry to create a new GCP project and migrate the existing project to the new one.
Step 3: Update open-telemetry/opentelemetry-go-vanityurls
Who: @damemi, @MrAlias
The config repo will need to point to the new GCP project. This should be a one-line change in the deploy script. However, there may be additional project settings to link this GitHub repo to the new project via a service account.
Step 4: Update DNS records for go.opentelemetry.io
Who: TC
This is the final step for this change to take full effect. While it should transition with zero downtime, it should be communicated to the community in the event of any potential downtime.
We should first confirm that the current DNS TTL for go.opentelemetry.io is set to a reasonable amount that will allow this switch to happen in a timely manner.
Next, the DNS records will be updated to point to the new GCP project/service. As the DNS records propagate, requests will automatically route to the new service with minimal, if any, downtime expected.
Cleanup
When traffic to the old service drops to 0, the old GCP project will be deleted.
Governance
The maintainers of the new GitHub project will be:
The maintainers of the new Cloud project will be:
During the transition, temporary additional access may need to be granted to non-TC members who need to set up and verify the new project (myself and @MrAlias have both volunteered).
Timeline
This change is high priority due to the central importance of go.opentelemetry.io to the OpenTelemetry Go ecosystem. Work on this transition should begin immediately and complete as soon as possible.
Future work
As part of our contributions to OpenTelemetry, the Google team is willing to assist with any work needed for this transition.
Following the transition, the OpenTelemetry community is free to treat this like any other part of infrastructure, including assessing long-term solutions and allocating maintainers.
One such task would be evaluating the use of govanityurls vs alternative proxies for this kind of request handling. The community may also choose to evaluate alternative deployment methods or cloud providers altogether. Neither of these is in scope for this transition plan, which has the top priority of putting the project into a stable state, as soon as possible, with as few changes as possible.
cc @jsuereth @dashpole