First off, thanks for taking the time to contribute!
The following is a set of guidelines for contributing to Red Hat OpenShift PSAP ci-artifacts. These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request.
---
The primary goal of the repository is to host the tools required for the nightly testing of the OpenShift operators under the Red Hat PSAP team's responsibility, in particular the NVIDIA GPU Operator and the Special Resource Operator (SRO).
The secondary goal of the repository is to offer a toolbox for interacting with our operators and configuring the cluster as required.
- Pull Requests (PRs) need to be reviewed (`/lgtm`) and approved (`/approve`) by PSAP team members before being merged.
- PRs should have a proper description explaining the problem being solved, or the new feature being introduced.
- PRs introducing or modifying `toolbox` commands should include a documentation commit, so that `docs` is kept up-to-date.
- Reviews can be performed by anyone interested in the good health of the repository, but approval and/or `/lgtm` are reserved for PSAP team members at the moment.
- Reviewers should ensure that the relevant testing (only `/test gpu-operator-e2e` at the moment) has executed successfully before the PR is merged.
  - In order to save unnecessary AWS cloud time, the testing is not automatically executed by Prow; it must be manually triggered.
  - The OpenShift GitHub Bot will not merge a PR when the `gpu-operator-e2e` test failed, but it will merge it if the test was never executed (or if it completed successfully, of course).
- Align nested lists with their parent's label:

      - block:
        - name: ...
          block:
          - name: ...
- YAML files use the `.yml` extension.
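A quick way to spot YAML files that do not follow the `.yml` convention before opening a PR is a `find` over the Ansible directories. This is a minimal sketch, not part of the repository tooling: the function names `find_misnamed_yaml` and `check_yml_extension` are hypothetical, and the default `playbooks` and `roles` directories are assumed from the `ansible-lint` invocation used in CI.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with ci-artifacts): detect YAML files
# using the .yaml extension instead of the expected .yml extension.

# Print one offending path per line.
#   usage: find_misnamed_yaml [dir...]   (defaults to: playbooks roles)
find_misnamed_yaml() {
    local dirs=("$@")
    if [ "${#dirs[@]}" -eq 0 ]; then
        # Assumption: same directories as the ansible-lint CI hook.
        dirs=(playbooks roles)
    fi
    # Only regular files are reported.
    find "${dirs[@]}" -type f -name '*.yaml' -print
}

# Return 1 (and list the files on stderr) when misnamed files exist,
# 2 when find itself fails (e.g. a missing directory), 0 otherwise.
check_yml_extension() {
    local offenders
    offenders=$(find_misnamed_yaml "$@") || return 2
    if [ -n "$offenders" ]; then
        echo "Rename these files to use the .yml extension:" >&2
        echo "$offenders" >&2
        return 1
    fi
    return 0
}
```

For example, source the file and run `check_yml_extension` from the repository root.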
We strive to follow Ansible best practices in the different playbooks. The following command is executed as a GitHub Action hook on all new PRs, to help keep a consistent code style:

    ansible-lint -v --force-color -c config/ansible-lint.yml playbooks roles
- Try to avoid using `shell` tasks as much as possible
  - Make sure that `set -o pipefail;` is part of the shell command whenever a `|` is involved (`ansible-lint` misses some of them)
  - Redirecting output into a `{{ artifact_extra_logs_dir }}` file is a common exception (a legitimate use of `shell` tasks)
- Use the inline stanza for `debug` and `fail` tasks, e.g.:

      - name: The GFD did not label the nodes
        fail: msg="The GFD did not label the nodes"
- Keep the main log file clean when everything goes right, and store all the relevant information in the `{{ artifact_extra_logs_dir }}` directory, e.g.:

      - name: Inspect the Subscriptions status (debug)
        shell:
          oc describe subscriptions.operators.coreos.com/gpu-operator-certified
             -n openshift-operators
             > {{ artifact_extra_logs_dir }}/gpu_operator_Subscription.log
        failed_when: false
- Include troubleshooting inspection commands whenever possible/relevant (see above for an example):
  - mark them as `failed_when: false` to ensure that their execution doesn't affect the testing
  - add `(debug)` in the task name to make it clear that the command is not part of the proper testing.
- Use `ignore_errors: true` only for tracking known failures.
  - use `failed_when: false` to ignore the task return code
  - but whenever possible, write tasks that do not fail, e.g.:
    - use `oc delete --ignore-not-found=true $MY_RESOURCE`
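The `set -o pipefail;` guideline above matters because, by default, a shell pipeline reports only the exit status of its last command: a failing `oc` call piped into another command can go unnoticed by Ansible. The illustrative Bash snippet below (independent of Ansible; `false` stands in for a failing command and the function names are hypothetical) shows the difference.

```shell
#!/usr/bin/env bash
# Illustrative only: exit status of a pipeline whose first command fails,
# with and without pipefail. `false` simulates a failing command (such as
# an `oc` call on a missing resource) whose output is piped into `cat`.

pipeline_without_pipefail() {
    # Subshell, so the option does not leak into the caller.
    ( set +o pipefail; false | cat )
}

pipeline_with_pipefail() {
    ( set -o pipefail; false | cat )
}

status=0; pipeline_without_pipefail || status=$?
echo "without pipefail: exit status $status"   # prints 0: the failure is hidden
status=0; pipeline_with_pipefail || status=$?
echo "with pipefail: exit status $status"      # prints 1: the failure is reported
```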
- Try to group related modifications in a dedicated commit, and stack commits in logical order (e.g., 1/ add the role, 2/ add the toolbox script, 3/ integrate the toolbox script in the nightly CI).
- Commits are not squashed, so please avoid commits "fixing" another commit of the PR.
  - Hints: `git revise`
    - use `git revise <commit>` to modify an older commit (not older than `master` ;-)
    - use `git revise --cut <commit>` to split a commit into two logical commits
    - or simply use `git commit --amend` to modify the most recent commit
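When `git revise` is not installed, plain Git offers a comparable way to fold a fix into an older commit: `git commit --fixup=<commit>` followed by `git rebase -i --autosquash`. This is a swapped-in alternative, not the workflow prescribed above. The sketch replays it in a scratch repository; the function name, author identity, file names, and commit messages are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (alternative to `git revise <commit>`): fold a fix into an older
# commit with plain Git, inside a scratch repository.
#   usage: demo_fixup_workflow <empty_directory>
demo_fixup_workflow() (
    set -e
    cd "$1"
    git init -q
    # Hypothetical identity, only used inside the scratch repository.
    git config user.name "Example Contributor"
    git config user.email "contributor@example.com"

    # Three logically ordered commits; the second one contains a typo.
    echo "role" > role.txt
    git add role.txt
    git commit -q -m "add role"
    echo "scirpt" > toolbox.sh
    git add toolbox.sh
    git commit -q -m "add toolbox script"
    echo "ci" > ci.sh
    git add ci.sh
    git commit -q -m "integrate the toolbox script in the nightly CI"

    # Fix the typo and record it as a "fixup!" commit targeting HEAD~1,
    # i.e. the "add toolbox script" commit.
    echo "script" > toolbox.sh
    git commit -q -a --fixup=HEAD~1

    # Replay the last three commits: --autosquash moves the fixup! commit
    # right after its target and melds it in. GIT_SEQUENCE_EDITOR=true
    # accepts the generated todo list without opening an editor.
    GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3
)
```

On a real PR branch, the same two steps would use the commit to fix instead of `HEAD~1`, and the branch point (for instance `master`) instead of `HEAD~3`. The history keeps three logical commits, with no separate "fixing" commit.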
- Duplicate the `template` role to prepare the skeleton of the new role.
- The `gpu_operator_run_gpu-burn` role can be studied as an example of a standalone role and toolbox script. New features should follow a similar model:
    roles/gpu_operator_run_gpu-burn

- Define the tasks of the new role:

      ├── tasks
      │   └── main.yml

- Define the role dependencies (at least `check_deps`):

      ├── meta
      │   └── main.yml

- Define the role configuration variables and their default values:

      ├── defaults
      │   └── main
      │       └── config.yml

- Define the script constant variables:

      ├── files
      │   ├── gpu_burn_cm_entrypoint.yml
      │   └── gpu_burn_pod.yml
      └── vars
          └── main
              └── resources.yml

- Add a toolbox script entrypoint setting the role configuration variables:

      toolbox/gpu-operator/
      └── run_gpu_burn.sh
- If relevant, call the toolbox script from the right nightly CI entrypoint:

      # in build/root/usr/local/bin/ci_entrypoint_gpu-operator.sh
      validate_gpu_operator_deployment() {
          ...
          toolbox/gpu-operator/run_gpu_burn.sh
      }
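The role layout described above can be prepared in one go. The sketch below is hypothetical and not a script shipped in the repository: `new_role_skeleton` is an illustrative name, and it assumes the roles live in one directory. It duplicates the `template` role when one is present, then makes sure the directories of the `gpu_operator_run_gpu-burn` layout exist, declaring `check_deps` as a dependency with the standard Ansible `meta/main.yml` `dependencies` key.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ci-artifacts): prepare the skeleton of a
# new role following the gpu_operator_run_gpu-burn layout.
#   usage: new_role_skeleton <roles_dir> <role_name>
new_role_skeleton() {
    local roles_dir=$1 role_name=$2
    local role_dir="$roles_dir/$role_name"

    if [ -e "$role_dir" ]; then
        echo "error: $role_dir already exists" >&2
        return 1
    fi

    # Preferred starting point: duplicate the template role if available.
    if [ -d "$roles_dir/template" ]; then
        cp -r "$roles_dir/template" "$role_dir" || return 1
    fi

    # Make sure every directory of the expected layout exists.
    mkdir -p "$role_dir"/{tasks,meta,defaults/main,files,vars/main} || return 1

    # Tasks of the new role (kept if copied from the template).
    [ -f "$role_dir/tasks/main.yml" ] ||
        printf '%s\n' '---' > "$role_dir/tasks/main.yml"

    # Role dependencies: at least check_deps (list aligned with its label).
    [ -f "$role_dir/meta/main.yml" ] ||
        printf '%s\n' '---' 'dependencies:' '- role: check_deps' \
            > "$role_dir/meta/main.yml"

    # Configuration variables and their default values.
    [ -f "$role_dir/defaults/main/config.yml" ] ||
        printf '%s\n' '---' > "$role_dir/defaults/main/config.yml"

    # Script constant variables.
    [ -f "$role_dir/vars/main/resources.yml" ] ||
        printf '%s\n' '---' > "$role_dir/vars/main/resources.yml"
}
```

For example, `new_role_skeleton roles my_new_role` from the repository root; the toolbox script entrypoint and the CI integration still have to be written by hand.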