Skip to content

Commit

Permalink
fixup: More README cleanup
Browse files Browse the repository at this point in the history
  • Loading branch information
marun committed Sep 20, 2024
1 parent 054e021 commit be20cf9
Showing 1 changed file with 49 additions and 44 deletions.
93 changes: 49 additions & 44 deletions tests/fixture/bootstrapmonitor/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,9 @@ To enable, supply `state-sync-enabled: false` as C-Chain configuration.

## Overview

The intention of `bootstrap-monitor` is to enable a `StatefulSet` to
perform continous bootstrap testing for a given avalanchego
The intention of `bootstrap-monitor` is to enable a Kubernetes
[StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
to perform continous bootstrap testing for a given avalanchego
configuration. It ensures that a testing pod either starts or resumes
a test, and upon completion of a test, polls for a new image to test
and initiates a new test when one is found.
Expand Down Expand Up @@ -62,13 +63,17 @@ and initiates a new test when one is found.
- Both the `init` and `wait-for-completion` commands of the
`bootstrap-monitor` attempt to read serialized test details (namely
the image used for the test and the start time of the test) from
the same data volume used by the avalanchego node.
- The `bootstrap-monitor init` command is intended to run as as the
an init container of an avalanchego node and ensure that the ID of
the image and its associated versions are recorded for the test and
that the contents of the pod's data volume is either cleared for a
new test or retained to enable resuming a previously started
test. It accomplishes this by:
the same data volume used by the avalanchego node. These details
are written by the `init` command when it determines that new test
is starting.
- The `bootstrap-monitor init` command is intended to run as as an
[init
container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
of an avalanchego node and ensure that the ID of the image and its
associated versions are recorded for the test and that the contents
of the pod's data volume is either cleared for a new test or
retained to enable resuming a previously started test. It
accomplishes this by:
- Mounting the same data volume as the avalanchego node
- Reading bootstrap test configuration as described previously
- Determining the image ID and versions for an image if the
Expand All @@ -77,26 +82,25 @@ and initiates a new test when one is found.
`StatefulSet` runs. Subsequent pods from the same `StatefulSet`
should have an image qualified with its SHA and version details
set by the previous test run's `wait-for-completion` pod.
- A new pod will be started with with the `latest` image to
execute `avalanchego --versions-json` to determine the exact
version of the image and update the `StatefulSet` managing the
pod which will prompt a pod restart. This ensures both that a
test result can be associated with a specific image SHA and the
avalanchego versions (including commit hash) of the binary that
the image provides.
- A new pod will be started with the `latest` image to execute
`avalanchego --versions-json` to determine the image ID (which
includes a sha256 hash) of the image and its avalanchego
versions. Those values will then be applied to the `StatefulSet`
managing the pod which will prompt pod deletion and recreation
with the updated values. This ensures that a test result can be
associated with both a specific image SHA and the avalanchego
versions (including commit hash) of the binary that the image
provides.
- A separate pod is used because the image ID of a non-init
avalanchego container using a `latest`-tagged image is only
available when that container runs rather than when an init container runs.
- While it would be possible to add an init container running the
same avalanchego image as the primary avalanchego container,
have it run the version command, and then have the
`bootstrap-monitor init` container read those results, the
method of discoverying the versions and image of the
avalanchego image currently tagged with `latest` would still be
required by the `wait-for-completion` command (described in a
subsequent section) to enable discovery of a new image to
test. It seemed preferable to have only a single way to
discover image details.
have it run the version command, and then have a subsequent
`bootstrap-monitor init` container read those results, the use
of a separate pod for SHA and versions discovery would still be
required by the `wait-for-completion` command. It seemed
preferable to have only a single way to discover image details.
- Attempting to read the serialized test details from a file on the
data volume. This file will not exist if the data volume has not
been used before.
Expand All @@ -105,38 +109,39 @@ and initiates a new test when one is found.
- If the images differ (or the file was not present), the data
volume is initialized for a new test:
- The data volume is cleared
- The image from the test configuration and and time are written to the data volume
- The image from the test configuration and and time are
serialized to a file on the data volume
- If the images are the same, the data volume is used as-is to
enable resuming the in-progress test.
enable resuming an in-progress test.
- `bootstrap-monitor wait-for-completion` is intended to run as a
sidecar of the avalanchego container. It polls the health of the
node container to detect when a bootstrap test has completed
successfully, then polls for a new image to test and when one is
found, updates the managing `StatefulSet` with that image to
trigger the start of a new test. The process to detect a new image
is the same as was described for the `init` command.
found, updates the managing `StatefulSet` with the details of that
image to trigger a new test. The process to detect a new image is
the same as was described for the `init` command.

## Package details

| Filename | Purpose |
|:-------------------------|:-----------------------------------------------------------------------|
| bootstrap_test_config.go | Defines how the configuration for a bootstrap test is read from a pod. |
| common.go | Defines code common between init and wait |
| init.go | Defines how a bootstrap test is initialized |
| wait.go | Defines the loop that waits for completion of a bootstrap test |
| cmd/main.go | The binary entrypoint for the bootstrap-monitor |
| e2e/e2e_test.go | The e2e test that validates the bootstrap-monitor |
| Filename | Purpose |
|:-------------------------|:--------------------------------------------------------------------------------------------|
| bootstrap_test_config.go | Defines how the configuration for a bootstrap test is read from a pod. |
| common.go | Defines code common between init and wait |
| init.go | Defines how a bootstrap test is initialized |
| wait.go | Defines how a bootstrap test is determined to have completed and how a new one is initiated |
| cmd/main.go | The binary entrypoint for the `bootstrap-monitor` |
| e2e/e2e_test.go | The e2e test that validates `bootstrap-monitor` |

## Supporting files

| Filename | Purpose |
|:-----------------------------------------|:--------------------------------------------------|
| scripts/build_bootstrap_monitor.sh | Builds the bootstrap-monitor binary |
| scripts/build_bootstrap_monitor_image.sh | Builds the image for the bootstrap-monitor |
| scripts/tests.e2e.bootstrap_monitor.go | Script for running the bootstrap-monitor e2e test |
| scripts/build_bootstrap_monitor.sh | Builds the `bootstrap-monitor` binary |
| scripts/build_bootstrap_monitor_image.sh | Builds the image for the `bootstrap-monitor` |
| scripts/tests.e2e.bootstrap_monitor.go | Script for running the `bootstrap-monitor` e2e test |

- The test script is used by the primary github action workflow to
validate the `bootstrap-monitor` binary and image.
- The test script is used by the github action workflow that
validates the `bootstrap-monitor` binary and image.
- The image build script is used by the github action workflow that
publishes repo images post-merge.

Expand All @@ -160,8 +165,8 @@ image with a `latest` tag, the pod would continously bootstrap, exit,
and restart with the current latest image. While appealingly simple,
this approach doesn't directly support:

- a mechanism for resuming a long-running bootstrap. Given the
- A mechanism for resuming a long-running bootstrap. Given the
expected duration of a bootstrap test, and the fact that a workload on
Kubernetes is not guaranteed to run without interruption, a separate
init process is suggested to enable resumption of an interrupted test.
- a mechanism for reporting disk usage and duration of execution
- A mechanism for reporting disk usage and duration of execution

0 comments on commit be20cf9

Please sign in to comment.