Atlas testing of AWS v20 together with Vintage-CAPA migration #3209
Regarding vintage 20.0.0, I've tested both creation of v20 and an upgrade from v19.3.1 to v20 on garfish, and it is working. I have a bit of concern regarding the grafana-agent and the logging operator creating its config: I noticed that the secret was not created during my initial tests, but it resolves itself after 12 hours or a restart of the logging operator, so it is not blocking. @marieroque I recall that you faced something similar in an incident (grafana-agent secret missing or something). Could you take a quick look? Tested both creation and upgrade. @T-Kukawka as a reminder, we still need to discuss the release notes, as the observability-bundle in release 20.0.0 has a few breaking changes :)
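For anyone hitting the same symptom, a quick way to confirm and work around it is to check for the secret and bounce the operator. A minimal sketch; the secret name, deployment name, and namespaces are assumptions, adjust to your cluster:

```sh
# Check whether the logging config secret exists on the WC
# (secret name/namespace are assumptions).
kubectl -n kube-system get secret grafana-agent-config \
  || echo "secret missing - restart the logging operator"

# Restarting the logging operator forces it to re-render the config instead of
# waiting for the ~12h resync (deployment name/namespace are assumptions).
kubectl -n monitoring rollout restart deployment logging-operator
```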
Awesome progress ❤️ Release notes are waiting for everyone to finish; you will be pinged when I am back :)
So, to confirm, part 3 is about all our managed apps and possible customer configs :(
Yes, because we need to know if customers can be migrated safely in order to reduce the risk.
For sure, we will try to cover as many different configs as possible for that.
@giantswarm/team-atlas to make sure we test the migration properly, I would like us to deploy all the apps we have (I know this will be painful) on a 19.3.0 WC on garfish, then upgrade to v20 and then run the migration-cli tool. To avoid having to redo all of this again, maybe we can set up a template to automate as much of it as possible? See the sketch below. I'm pretty confident the migration will break apps that use IRSA, like Loki, so that's all the more interesting to test.
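A per-app template could be as simple as a stamped-out App CR per app under test. A minimal sketch following the App CR shape seen later in this thread; the app/configmap names, catalog, and versions are placeholders:

```yaml
# Hypothetical per-app template: stamp out one App CR per app to deploy on the
# 19.3.0 WC before upgrading to v20 and running the migration-cli.
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: atlastest-loki            # one CR per app under test
  namespace: atlastest            # WC namespace on the vintage MC
spec:
  catalog: giantswarm
  name: loki
  namespace: loki                 # target namespace inside the WC
  version: 0.0.0                  # pin whatever version you want to exercise
  kubeConfig:
    inCluster: false              # deploy into the WC, not the MC
    secret:
      name: atlastest-kubeconfig  # usual WC kubeconfig convention (assumption)
      namespace: atlastest
  userConfig:
    configMap:
      name: loki-user-values      # include user values so that userConfig
      namespace: atlastest        # migration is exercised too
```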
@giantswarm/team-honeybadger I'm not sure why this happened during the migration phase, but the loki app that I had deployed before the migration was renamed to oki on the workload cluster on gazelle. Apps on the MC:

Generated app:

Could you investigate why?
This is the issue that is happening for the chart-operator-extensions:
@T-Kukawka once the loki -> oki issue is fixed on the Honeybadger side, I think Atlas would only have to redo the tests with Loki and those 2 items: the fluent-logshipping-app change will be the main issue, I think.
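For context on the IRSA point: apps that talk to AWS APIs typically just need their service account annotated with an IAM role ARN, so "adding IRSA support" to fluent-logshipping-app would presumably boil down to exposing something like this in its chart. The annotation key is the standard IRSA one; the role ARN and namespace are made up:

```yaml
# Standard IRSA wiring: pods assume the IAM role via their service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-logshipping-app
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/fluent-logshipping  # hypothetical role
```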
Second issue for @giantswarm/team-honeybadger: user-values configmaps are not transferred when they are set on a default app. On my garfish WC, I have this set by the cluster-operator on the app labelled app.kubernetes.io/name=observability-bundle:

```yaml
userConfig:
  configMap:
    name: atlastest-observability-bundle-user-values
    namespace: atlastest
```

But on the gazelle MC, this is rendered without the user-values configmap:

```yaml
spec:
  catalog: default
  config:
    configMap:
      name: atlastest-cluster-values
      namespace: org-capa-migration-testing
    secret:
      name: ""
      namespace: ""
  extraConfigs:
  - kind: configMap
    name: psp-removal-patch
    namespace: org-capa-migration-testing
    priority: 150
  - kind: configMap
    name: atlastest-observability-bundle-logging-extraconfig
    namespace: org-capa-migration-testing
    priority: 25
  - kind: configMap
    name: psp-removal-patch
    namespace: org-capa-migration-testing
    priority: 150
  install: {}
  kubeConfig:
    context:
      name: ""
    inCluster: true
    secret:
      name: ""
      namespace: ""
  name: observability-bundle
  namespace: org-capa-migration-testing
  namespaceConfig: {}
  rollback: {}
  uninstall: {}
  upgrade: {}
  userConfig:
    configMap:
      name: ""
      namespace: ""
    secret:
      name: ""
      namespace: ""
  version: 1.2.1
```

I would expect them to be added to the app or to the default-apps-aws user values, but it is empty:

```sh
k get cm -n org-capa-migration-testing atlastest-default-apps-userconfig -oyaml
```

```yaml
apiVersion: v1
data:
  values: |
    clusterName: atlastest
    organization: capa-migration-testing
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"values":"clusterName: atlastest\norganization: capa-migration-testing\n"},"kind":"ConfigMap","metadata":{"annotations":{},"creationTimestamp":null,"labels":{"giantswarm.io/cluster":"atlastest"},"name":"atlastest-default-apps-userconfig","namespace":"org-capa-migration-testing"}}
  creationTimestamp: "2024-02-13T21:09:53Z"
  labels:
    app-operator.giantswarm.io/watching: "true"
    giantswarm.io/cluster: atlastest
  name: atlastest-default-apps-userconfig
  namespace: org-capa-migration-testing
  resourceVersion: "165069994"
  uid: 867a30ee-713b-4338-8c08-168389f9c5e6
```
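A quick way to reproduce this comparison on any migrated cluster is to diff the userConfig stanza of the App CR on both sides. A sketch based on the dumps above; the kubeconfig context names and App CR names are assumptions:

```sh
# Vintage MC: should show the user-values configmap reference.
kubectl --context garfish -n atlastest \
  get app atlastest-observability-bundle -o jsonpath='{.spec.userConfig}'

# CAPA MC after migration: currently comes back empty, which is the bug.
kubectl --context gazelle -n org-capa-migration-testing \
  get app observability-bundle -o jsonpath='{.spec.userConfig}'
```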
Hey @QuentinBisson. @nce, who dealt with the migration, is on vacation, but to my knowledge the migration of default apps is not on us. If I get it right, the Observability Bundle, as part of the Default Apps app, should get configured by the CAPI migration CLI.
* Fix app naming bug because of wrong trimming (giantswarm/roadmap#3209 (comment)): refactored code for better testing; added regression test
* refactor
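For the record, the loki -> oki symptom is exactly what you get in Go when a cluster-name prefix is stripped with strings.TrimLeft, which treats its argument as a set of characters, instead of strings.TrimPrefix. A minimal sketch of the suspected bug class; the actual code lives in the migration tooling, so the names here are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	app := "atlastest-loki"

	// Buggy: TrimLeft removes any leading run of the characters in the cutset
	// {a, t, l, e, s, -}; after the prefix it also eats the 'l' of "loki".
	fmt.Println(strings.TrimLeft(app, "atlastest-")) // prints "oki"

	// Correct: TrimPrefix removes the exact prefix string, once.
	fmt.Println(strings.TrimPrefix(app, "atlastest-")) // prints "loki"
}
```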
Loki migration works fine, but I could not test user values; I will run that tomorrow. User values for default apps have been successfully migrated. Once the Loki tests have been run, all that's left is to release a new Keda app version to support Kubernetes 1.25 and to add IRSA support to fluent-logshipping-app.
I have made adjustments in the tracking ticket as well as the teams' tickets regarding the CAPA and migration testing instructions. TL;DR: testing of CAPA/migration is moved away from the MC we initially picked. We do recognise the pages and also actively work on testing, hence such pages are just a distraction away from the operations clusters onto which most of the teams have migrated the GS production workloads. Taking all the facts into consideration, we have decided that it would be best to move the testing to a different MC.
All our apps have been tested. Now we need to close #3249 and https://github.com/giantswarm/giantswarm/issues/29861 and we are done.
Prometheus-operator with a PV and a changed namespace has been tested.
All we have left is Keda.
So Keda also supports Kubernetes up to 1.25; let's discuss when we need 1.26 support.
The time has come to start testing final releases as well as migration from Vintage v20 to CAPA. We have created a dedicated Vintage MC `garfish` to perform any vintage or migration testing, for stability purposes. The dedicated CAPA cluster for migration will be the CAPA stable-testing MC `grizzly`. We would kindly ask all teams to perform comprehensive tests for 3 use-cases, ordered in terms of priority if they can't all be performed at once.
1. Vintage AWS v20

Cluster creation on `garfish` - `giantswarm` Organization.

This is the last release of Vintage containing Kubernetes 1.25. Kubernetes 1.25 introduces a breaking change with the removal of PSPs from its API, meaning that all workloads will have to comply with the global toggle disabling PSPs, as in the `19.3.x` release. Prior to making the `v20` release available to customers, we need to validate that all applications are running smoothly. The Vintage tests are standard as always - you just create the `v20` release and validate your applications. A separate stable MC in this case will guarantee stability and no manual changes in the release. A sketch of templating such a cluster follows below.

- v20 testing (please mark it in the main issue as well)
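For reference, templating such a v20 test cluster can be done with kubectl-gs. A sketch under the assumption that the usual vintage AWS flags apply; verify the flag names against your kubectl-gs version before use:

```sh
# Template a vintage AWS v20.0.0 cluster in the giantswarm org, then apply it
# against the garfish MC (flags and context name are assumptions - verify with
# `kubectl gs template cluster --help`).
kubectl gs template cluster \
  --provider aws \
  --name atlastest \
  --organization giantswarm \
  --release 20.0.0 \
  --description "v20 PSP-removal testing" > cluster.yaml

kubectl --context garfish apply -f cluster.yaml
```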
2. CAPA 0.60.0

Cluster creation on `grizzly` - `giantswarm` Organization - be aware that this is a production MC, so it will page everyone. In practice, any CAPA MC should work for this test.

From `cluster-aws` v0.60.0 and `default-apps-aws` v0.45.1 onwards, CAPA supports Kubernetes 1.25 with all the features needed to run our workloads in the same manner as on Vintage clusters. For testing, please always use the latest `cluster-aws` and `default-apps-aws` releases.
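For use-case 2, a CAPA cluster is delivered as the `cluster-aws` chart (plus `default-apps-aws`) deployed as an App CR in the org namespace. A minimal sketch pinning the minimum version named above; the catalog name, namespaces, and userconfig wiring are assumptions, and kubectl-gs can also template this for you:

```yaml
# Hypothetical: the cluster-aws App CR on the CAPA MC; default-apps-aws
# (>= v0.45.1) is deployed the same way alongside it.
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: migrtest
  namespace: org-giantswarm
spec:
  catalog: cluster
  name: cluster-aws
  namespace: org-giantswarm
  version: 0.60.0                 # >= 0.60.0 for Kubernetes 1.25 support
  kubeConfig:
    inCluster: true
  userConfig:
    configMap:
      name: migrtest-userconfig   # cluster values (name is an assumption)
      namespace: org-giantswarm
```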
3. Vintage to CAPA migration

Cluster creation for migration on `garfish` - `capa-migration-testing` Organization. Clusters will be migrated to `grizzly` - `capa-migration-testing` Organization.

Phoenix and Honeybadger worked extensively on making the migration as smooth as possible. The `migration-cli` has been introduced, which orchestrates the migration of apps as well as infrastructure. The main point here is to discover whether your application, and any custom configurations that customers could apply, are migrated properly.

The `migration-cli` has been extended to facilitate easy testing for all teams at Giant Swarm. Please follow the requirements as well as the procedure described in the `tests` section of the tool. In case of any issue with the infrastructure, ping Phoenix; if the app/configmap migration shows any issues or inconsistencies, ping Honeybadger.