diff --git a/tests/fixture/bootstrapmonitor/README.md b/tests/fixture/bootstrapmonitor/README.md index 95b6ca442a5a..c4b33dbcee8a 100644 --- a/tests/fixture/bootstrapmonitor/README.md +++ b/tests/fixture/bootstrapmonitor/README.md @@ -31,8 +31,9 @@ To enable, supply `state-sync-enabled: false` as C-Chain configuration. ## Overview -The intention of `bootstrap-monitor` is to enable a `StatefulSet` to -perform continous bootstrap testing for a given avalanchego +The intention of `bootstrap-monitor` is to enable a Kubernetes +[StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) +to perform continous bootstrap testing for a given avalanchego configuration. It ensures that a testing pod either starts or resumes a test, and upon completion of a test, polls for a new image to test and initiates a new test when one is found. @@ -62,13 +63,17 @@ and initiates a new test when one is found. - Both the `init` and `wait-for-completion` commands of the `bootstrap-monitor` attempt to read serialized test details (namely the image used for the test and the start time of the test) from - the same data volume used by the avalanchego node. - - The `bootstrap-monitor init` command is intended to run as as the - an init container of an avalanchego node and ensure that the ID of - the image and its associated versions are recorded for the test and - that the contents of the pod's data volume is either cleared for a - new test or retained to enable resuming a previously started - test. It accomplishes this by: + the same data volume used by the avalanchego node. These details + are written by the `init` command when it determines that new test + is starting. + - The `bootstrap-monitor init` command is intended to run as as an + [init + container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) + of an avalanchego node and ensure that the ID of the image and its + associated versions are recorded for the test and that the contents + of the pod's data volume is either cleared for a new test or + retained to enable resuming a previously started test. It + accomplishes this by: - Mounting the same data volume as the avalanchego node - Reading bootstrap test configuration as described previously - Determining the image ID and versions for an image if the @@ -77,26 +82,25 @@ and initiates a new test when one is found. `StatefulSet` runs. Subsequent pods from the same `StatefulSet` should have an image qualified with its SHA and version details set by the previous test run's `wait-for-completion` pod. - - A new pod will be started with with the `latest` image to - execute `avalanchego --versions-json` to determine the exact - version of the image and update the `StatefulSet` managing the - pod which will prompt a pod restart. This ensures both that a - test result can be associated with a specific image SHA and the - avalanchego versions (including commit hash) of the binary that - the image provides. + - A new pod will be started with the `latest` image to execute + `avalanchego --versions-json` to determine the image ID (which + includes a sha256 hash) of the image and its avalanchego + versions. Those values will then be applied to the `StatefulSet` + managing the pod which will prompt pod deletion and recreation + with the updated values. This ensures that a test result can be + associated with both a specific image SHA and the avalanchego + versions (including commit hash) of the binary that the image + provides. - A separate pod is used because the image ID of a non-init avalanchego container using a `latest`-tagged image is only available when that container runs rather than when an init container runs. - While it would be possible to add an init container running the same avalanchego image as the primary avalanchego container, - have it run the version command, and then have the - `bootstrap-monitor init` container read those results, the - method of discoverying the versions and image of the - avalanchego image currently tagged with `latest` would still be - required by the `wait-for-completion` command (described in a - subsequent section) to enable discovery of a new image to - test. It seemed preferable to have only a single way to - discover image details. + have it run the version command, and then have a subsequent + `bootstrap-monitor init` container read those results, the use + of a separate pod for SHA and versions discovery would still be + required by the `wait-for-completion` command. It seemed + preferable to have only a single way to discover image details. - Attempting to read the serialized test details from a file on the data volume. This file will not exist if the data volume has not been used before. @@ -105,38 +109,39 @@ and initiates a new test when one is found. - If the images differ (or the file was not present), the data volume is initialized for a new test: - The data volume is cleared - - The image from the test configuration and and time are written to the data volume + - The image from the test configuration and and time are + serialized to a file on the data volume - If the images are the same, the data volume is used as-is to - enable resuming the in-progress test. + enable resuming an in-progress test. - `bootstrap-monitor wait-for-completion` is intended to run as a sidecar of the avalanchego container. It polls the health of the node container to detect when a bootstrap test has completed successfully, then polls for a new image to test and when one is - found, updates the managing `StatefulSet` with that image to - trigger the start of a new test. The process to detect a new image - is the same as was described for the `init` command. + found, updates the managing `StatefulSet` with the details of that + image to trigger a new test. The process to detect a new image is + the same as was described for the `init` command. ## Package details -| Filename | Purpose | -|:-------------------------|:-----------------------------------------------------------------------| -| bootstrap_test_config.go | Defines how the configuration for a bootstrap test is read from a pod. | -| common.go | Defines code common between init and wait | -| init.go | Defines how a bootstrap test is initialized | -| wait.go | Defines the loop that waits for completion of a bootstrap test | -| cmd/main.go | The binary entrypoint for the bootstrap-monitor | -| e2e/e2e_test.go | The e2e test that validates the bootstrap-monitor | +| Filename | Purpose | +|:-------------------------|:--------------------------------------------------------------------------------------------| +| bootstrap_test_config.go | Defines how the configuration for a bootstrap test is read from a pod. | +| common.go | Defines code common between init and wait | +| init.go | Defines how a bootstrap test is initialized | +| wait.go | Defines how a bootstrap test is determined to have completed and how a new one is initiated | +| cmd/main.go | The binary entrypoint for the `bootstrap-monitor` | +| e2e/e2e_test.go | The e2e test that validates `bootstrap-monitor` | ## Supporting files | Filename | Purpose | |:-----------------------------------------|:--------------------------------------------------| -| scripts/build_bootstrap_monitor.sh | Builds the bootstrap-monitor binary | -| scripts/build_bootstrap_monitor_image.sh | Builds the image for the bootstrap-monitor | -| scripts/tests.e2e.bootstrap_monitor.go | Script for running the bootstrap-monitor e2e test | +| scripts/build_bootstrap_monitor.sh | Builds the `bootstrap-monitor` binary | +| scripts/build_bootstrap_monitor_image.sh | Builds the image for the `bootstrap-monitor` | +| scripts/tests.e2e.bootstrap_monitor.go | Script for running the `bootstrap-monitor` e2e test | - - The test script is used by the primary github action workflow to - validate the `bootstrap-monitor` binary and image. + - The test script is used by the github action workflow that + validates the `bootstrap-monitor` binary and image. - The image build script is used by the github action workflow that publishes repo images post-merge. @@ -160,8 +165,8 @@ image with a `latest` tag, the pod would continously bootstrap, exit, and restart with the current latest image. While appealingly simple, this approach doesn't directly support: - - a mechanism for resuming a long-running bootstrap. Given the + - A mechanism for resuming a long-running bootstrap. Given the expected duration of a bootstrap test, and the fact that a workload on Kubernetes is not guaranteed to run without interruption, a separate init process is suggested to enable resumption of an interrupted test. -- a mechanism for reporting disk usage and duration of execution +- A mechanism for reporting disk usage and duration of execution