Cron job maintenance #448

Merged
merged 30 commits into from
Apr 18, 2024
30 commits
4138130
Remove spaces that makes helm lint unhappy
ksuderman Feb 9, 2024
e0b960d
First pass at defining cron jobs in the values.yaml file.
ksuderman Feb 9, 2024
9c84d17
Fix command formatting and quoting
ksuderman Feb 13, 2024
d2941d7
Remove nodeSelector
ksuderman Feb 13, 2024
95474e2
Tmp cleanup and maintenance scripts are treated as special cases.
ksuderman Feb 16, 2024
93930d7
Update the name of the container used to run the maintenance script.
ksuderman Feb 16, 2024
287cf13
Define all cron jobs in the values.yaml file again, but allows time d…
ksuderman Feb 19, 2024
8225be5
Use walltime limit for cleanup and other minor tweaks
nuwang Feb 20, 2024
8cc5d94
Change find units from seconds to days
nuwang Feb 20, 2024
64320c8
Parameterize the nodeSelector
ksuderman Feb 21, 2024
2bbaa3a
Remove the chown cron job
ksuderman Feb 21, 2024
df1fb39
Reverting my revert
ksuderman Feb 21, 2024
247cbf5
Add extraFileMappings for cron jobs
ksuderman Feb 28, 2024
cf77963
Start documenting cron jobs
ksuderman Feb 28, 2024
13e4f0b
Make the galaxy.yml file available in a configmap for the maintenance…
ksuderman Mar 8, 2024
27fd304
Update galaxy/values.yaml
ksuderman Mar 8, 2024
42488ff
Add helper to calculate the postgres connection string
ksuderman Mar 10, 2024
0fafea8
Add cron job documentation and remove Galaxy versions section
ksuderman Mar 10, 2024
497b572
Add env definitions
ksuderman Mar 12, 2024
da6a97a
Maintenace job should include default env vars
ksuderman Mar 12, 2024
cdc2a98
Update galaxy/values.yaml
ksuderman Mar 12, 2024
63bc997
Update galaxy/values.yaml
ksuderman Mar 12, 2024
d96979a
Parameterize Docker image for cron jobs
ksuderman Mar 12, 2024
3af24cd
Allow mode (permissions) to be defined on extraFileMappings
ksuderman Mar 13, 2024
3fe5723
Add example cron job
ksuderman Mar 13, 2024
07fdbd4
Additional documentation for cron jobs
ksuderman Mar 13, 2024
b4cbdf6
Comment out the example cron job. Examples should not add arbitrary c…
ksuderman Mar 14, 2024
db82ef0
Run extraEnv values through the template engine
ksuderman Mar 17, 2024
a7c1aed
Merge branch 'master' into 408-maintenance
ksuderman Apr 18, 2024
14f190d
Add newline to the end of the file.
ksuderman Apr 18, 2024
83 changes: 71 additions & 12 deletions README.md
@@ -272,6 +272,8 @@ jobHandlers:
failureThreshhold: 3
```

# Additional Configurations

## Extra File Mappings

The `extraFileMappings` field can be used to inject files to arbitrary paths in the `nginx` deployment, as well as any of the `job`, `web`, or `workflow` handlers, and the `init` jobs.
@@ -420,21 +422,78 @@ The Galaxy application can be horizontally scaled for the web, job, or workflow
by setting the desired values of the `webHandlers.replicaCount`,
`jobHandlers.replicaCount`, and `workflowHandlers.replicaCount` configuration options.

## Galaxy versions
## Cron jobs

Two cron jobs are defined by default: one to clean up Galaxy's database and one to clean up the `tmp` directory. By default, these
jobs run at 02:05 (the database maintenance script) and 02:15 (`tmp` directory cleanup). Users can
change the times the cron jobs are run by changing the `schedule` field in the `values.yaml` file:

```yaml
cronJobs:
  maintenance:
    schedule: "30 6 * * *" # Execute the cron job at 6:30 UTC
```
or by specifying the `schedule` on the command line when installing Galaxy:
```bash
# Schedule the maintenance job to run at 06:30 on the first day of each month
helm install galaxy -n galaxy galaxy/galaxy --set cronJobs.maintenance.schedule="30 6 1 * *"
```
To disable a cron job after Galaxy has been deployed, set the schedule to a date that
can never occur, such as midnight on February 30th:
Member

Better to add an enabled flag as we have for everything else?

Contributor Author

I considered that, but in other cases when enabled: false is set we don't render the template. However, in this case it means the CronJob is not defined and not available to be run manually. By setting the schedule to a time that never occurs the CronJob is defined and is available to be run manually if desired. At least that was my thought process.

Contributor Author

Reading more about cron jobs, maybe I should be using .spec.suspend here. I'll play around with that and see if I get the desired behavior.

Member

@nuwang nuwang Mar 13, 2024

Good idea to investigate suspend. Maybe we should add both. Some instructions on how to manually invoke the cronjob would also be useful.

Contributor Author

@ksuderman ksuderman Mar 13, 2024

Using .spec.suspend looks to be less than ideal, particularly if one wants to run the job once but keep it suspended. In that case the user must:

  1. Patch the cron job to set .spec.suspend=false
  2. Run the job if it does not get triggered automatically, and
  3. Patch the cron job again to re-set .spec.suspend=true

Given users can always disable a cron job with the schedule I am not sure .spec.suspend is worth it.

Member

I think that adding an enabled flag would still be useful. Users who want to run the cronjob manually can still use the trick you suggested. Just that it feels weird to have to think of an invalid date for the simplest case of not wanting the cron job.



```bash
helm upgrade galaxy -n galaxy galaxy/galaxy --reuse-values --set cronJobs.maintenance.schedule="0 0 30 2 *"
```
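
The same override can also be kept in a values file rather than passed with `--set`. A minimal sketch, assuming the default `tmpdir` job key from `values.yaml`; the file name `disable-tmpdir.yaml` mentioned afterwards is hypothetical:

```yaml
# Assumption: "tmpdir" is one of the two default job keys defined in values.yaml.
cronJobs:
  tmpdir:
    # Midnight on February 30th never occurs, so the job is never scheduled,
    # but the CronJob object is still defined and can be run manually.
    schedule: "0 0 30 2 *"
```

Saved as `disable-tmpdir.yaml`, this could be applied with `helm upgrade galaxy -n galaxy galaxy/galaxy --reuse-values -f disable-tmpdir.yaml`.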

### Run a CronJob manually

Cron jobs can be invoked manually with tools such as [OpenLens](https://github.com/MuhammedKalkan/OpenLens)
or from the command line with `kubectl`:
```bash
kubectl create job --namespace <namespace> <job name> --from cronjob/galaxy-cron-maintenance
```
This creates a one-off Job named `<job name>` from the `maintenance` CronJob and runs it immediately, regardless of the `schedule` that has been set.

**Note:** the name of the CronJob will be `{{ .Release.Name }}-cron-<key>`, where `<key>`
is the name (key) used for the cron job in the `values.yaml` file.

### CronJob configuration

The following fields can be specified when defining cron jobs.

| Name | Definition | Required |
|---|-------------------------------------------------------------------------------------------------------------------------------------------|----------|
| schedule | When the job will be run. Use tools such as [crontab.guru](https://crontab.guru) for assistance determining the proper schedule string | **Yes** |
| defaultEnv | `true` or `false`. See the `galaxy.podEnvVars` macro in `_helpers.tpl` for the list of variables that will be defined. Default is `false` | No |
| extraEnv | Define extra environment variables that will be available to the job | No |
| securityContext | Specifies a `securityContext` for the job. Typically used to set `runAsUser` | No |
| image | Specify the Docker container used to run the job | No |
| command | The command to run | **Yes** |
| args | Any command line arguments that should be passed to the `command` | No |
| extraFileMappings | Allow arbitrary files to be mounted from config maps | No |
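
As an illustration of the fields above, the following is a minimal sketch of a custom cron job that sets only the two required fields plus `args`. The job key `diskusage` and the choice of `/usr/bin/du` are assumptions made for this example and are not part of the chart; the command must exist in the image used to run the job.

```yaml
cronJobs:
  # "diskusage" is a hypothetical job key chosen for this example.
  diskusage:
    # Required: run at 04:00 every Sunday.
    schedule: "0 4 * * 0"
    # Required: the command is given as a list.
    command:
      - /usr/bin/du
    # Optional: values may contain Helm template expressions,
    # as the default tmpdir job does.
    args:
      - "-sh"
      - "{{ .Values.persistence.mountPath }}"
```

Following the naming rule described earlier, a release named `galaxy` would be expected to produce a CronJob called `galaxy-cron-diskusage`.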

### Notes

If specifying the Docker `image`, both the `repository` and `tag` MUST be specified.
```yaml
image:
  repository: quay.io/my-organization/my-image
  tag: "1.0"
```
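
The sketch below shows where an `image` block might sit within a job definition, combined with `extraEnv`. The job key `custom` and the variable name `GALAXY_TMP_DIR` are hypothetical, the image is the placeholder used above, and the example assumes that image provides `/bin/sh`.

```yaml
cronJobs:
  # "custom" is a hypothetical job key chosen for this example.
  custom:
    schedule: "0 5 * * *"   # 05:00 every day
    # Both repository and tag are given, as required.
    image:
      repository: quay.io/my-organization/my-image
      tag: "1.0"
    extraEnv:
      # Hypothetical variable; the value is run through Helm's template engine.
      - name: GALAXY_TMP_DIR
        value: "{{ .Values.persistence.mountPath }}/tmp"
    command:
      - /bin/sh
    args:
      - "-c"
      - 'ls -l "$GALAXY_TMP_DIR"'
```

When `image` is omitted, the job falls back to the chart's main Galaxy image.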

The `extraFileMappings` block is similar to the global `extraFileMappings` except the file will only be mounted for that cron job.
The following fields can be specified for each file.

| Name | Definition | Required |
|---|---|----------|
| mode | The file mode (permissions) assigned to the file | No |
| tpl | If set to `true` the file contents will be run through Helm's templating engine. Defaults to `false` | No |
| content | The contents of the file | **Yes** |
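
A minimal sketch of a job-level `extraFileMappings` entry, assuming a hypothetical job key `listtmp` and script path `/usr/local/bin/list-tmp.sh`. The key of each mapping is the path at which the file is mounted.

```yaml
cronJobs:
  # "listtmp" and the script path are hypothetical names for this example.
  listtmp:
    schedule: "45 3 * * *"   # 03:45 every day
    command:
      - /usr/local/bin/list-tmp.sh
    extraFileMappings:
      # Mount path of the file inside the job's container.
      /usr/local/bin/list-tmp.sh:
        # rwxr-xr-x, so the script can be executed directly.
        mode: "0755"
        # Render the content with Helm before mounting it.
        tpl: true
        content: |-
          #!/bin/bash
          ls -l {{ .Values.persistence.mountPath }}/tmp
```

Setting `tpl: true` is what allows `{{ .Values.persistence.mountPath }}` to be expanded; with the default of `false` the content would be mounted verbatim.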

Some changes introduced in the chart sometimes rely on changes in the Galaxy
container image, especially in relation to the Kubernetes runner. This table
keeps track of recommended Chart versions for particular Galaxy versions as
breaking changes are introduced. Otherwise, the Galaxy image and chart should be
independently upgrade-able. In other words, upgrading the Galaxy image from
`21.05` to `21.09` should be a matter of `helm upgrade my-galaxy cloudve/galaxy
--reuse-values --set image.tag=21.09`.

See the `example` cron job included in the `values.yaml` file for a full example.

| Chart version | Galaxy version | Description |
| :------------------ | :--------------- | :-------------- |
| `5.0` | `22.05` | Needs at least container image 22.05 as Galaxy switched from uwsgi to gunicorn |
| `4.0` | `21.05` | Needs [Galaxy PR#11899](https://github.com/galaxyproject/galaxy/pull/11899) for eliminating the CVMFS. If running chart 4.0+ with Galaxy image `21.01` or below, use the CVMFS instead with `--set setupJob.downloadToolConfs.enabled=false --set cvmfs.repositories.cvmfs-gxy-cloud=cloud.galaxyproject.org --set cvmfs.galaxyPersistentVolumeClaims.cloud.storage=1Gi --set cvmfs.galaxyPersistentVolumeClaims.cloud.storageClassName=cvmfs-gxy-cloud --set cvmfs.galaxyPersistentVolumeClaims.cloud.mountPath=/cvmfs/cloud.galaxyproject.org` |

## Funding

9 changes: 9 additions & 0 deletions galaxy/disabled/configmap-galaxy.yaml
@@ -0,0 +1,9 @@
apiVersion: v1
metadata:
  name: {{ .Release.Name }}-galaxy-config
  labels:
    {{- include "galaxy.labels" $ | nindent 4 }}
kind: ConfigMap
data:
  galaxy.yml: |
    {{- .Values.galaxy | toYaml | nindent 4 }}
7 changes: 7 additions & 0 deletions galaxy/templates/_helpers.tpl
@@ -79,6 +79,13 @@ Return the postgresql database name to use
{{- end -}}
{{- end -}}

{{/*
Generate the connection string needed to connect to a Postgres database
*/}}
{{- define "galaxy-postgresql.connection-string" -}}
{{- printf "postgresql://%s:%s@%s/galaxy%s" .Values.postgresql.galaxyDatabaseUser (include "galaxy.galaxyDbPassword" .) (include "galaxy-postgresql.fullname" .) .Values.postgresql.galaxyConnectionParams -}}
{{- end -}}

{{/*
Return the rabbitmq cluster to use
*/}}
112 changes: 88 additions & 24 deletions galaxy/templates/cronjob-maintenance.yaml
@@ -1,48 +1,112 @@
{{ range $key, $cronjob := .Values.cronJobs }}
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: {{ include "galaxy.fullname" . }}-maintenance
name: {{ include "galaxy.fullname" $ }}-cron-{{ $key }}
labels:
{{- include "galaxy.labels" . | nindent 4 }}
{{- include "galaxy.labels" $ | nindent 4 }}
spec:
schedule: "0 2 * * *"
schedule: {{ $cronjob.schedule | quote }}
jobTemplate:
spec:
template:
spec:
{{- if $cronjob.securityContext }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
{{- with .Values.nodeSelector }}
{{- toYaml $cronjob.securityContext | nindent 12 }}
{{- end}}
{{- if $cronjob.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 16 }}
{{- toYaml $cronjob.nodeSelector | nindent 12 }}
{{- else if $.Values.nodeSelector }}
nodeSelector:
{{- toYaml $.Values.nodeSelector | nindent 12 }}
{{- end }}
containers:
- name: galaxy-maintenance
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
# delete all tmp files older than walltime limit
- name: galaxy-cron-{{ $key }}
{{- if $cronjob.image }}
image: {{ $cronjob.image.repository }}:{{ $cronjob.image.tag }}
{{- else }}
image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag }}"
{{- end }}
imagePullPolicy: {{ $.Values.image.pullPolicy }}
{{- if or $cronjob.defaultEnv $cronjob.extraEnv }}
env:
{{- if $cronjob.defaultEnv }}
{{- include "galaxy.podEnvVars" $}}
{{- end }}
{{- if $cronjob.extraEnv }}
{{- range $env := $cronjob.extraEnv }}
- name: {{ $env.name }}
value: {{ tpl $env.value $ | quote }}
{{- end }}
{{- end }}
{{- end }}
command:
- find
- {{ .Values.persistence.mountPath }}/tmp
- '!'
- -newermt
- -{{ (index .Values "configs" "job_conf.yml" "runners" "k8s" "k8s_walltime_limit" | default 604800) }} seconds
- -type
- f
- -exec
- rm
- '{}'
- ;
{{- range $cmd := $cronjob.command }}
- {{ tpl $cmd $ | quote }}
{{- end}}
{{- if $cronjob.args }}
args:
{{- range $arg := $cronjob.args }}
- {{ tpl $arg $ | quote }}
{{- end }}
{{- end }}
volumeMounts:
- name: galaxy-data
mountPath: {{ .Values.persistence.mountPath }}
mountPath: {{ $.Values.persistence.mountPath }}
{{- range $key, $entry := $cronjob.extraFileMappings }}
- name: {{ include "galaxy.getExtraFilesUniqueName" $key }}
mountPath: {{ $key }}
subPath: {{ include "galaxy.getFilenameFromPath" $key }}
{{- end }}
volumes:
- name: galaxy-data
{{- if .Values.persistence.enabled }}
{{- if $.Values.persistence.enabled }}
persistentVolumeClaim:
claimName: {{ template "galaxy.pvcname" . }}
claimName: {{ template "galaxy.pvcname" $ }}
{{- else }}
emptyDir: {}
{{- end }}
{{- range $key, $entry := $cronjob.extraFileMappings }}
- name: {{ include "galaxy.getExtraFilesUniqueName" $key }}
{{- if $entry.useSecret }}
secret:
secretName: {{ printf "%s-%s" (include "galaxy.fullname" $) (include "galaxy.getExtraFilesUniqueName" $key) }}
{{- else }}
configMap:
name: {{ printf "%s-%s" (include "galaxy.fullname" $) (include "galaxy.getExtraFilesUniqueName" $key) }}
{{- end }}
{{- if $entry.mode }}
defaultMode: {{ $entry.mode }}
{{- end }}
{{- end }}
restartPolicy: OnFailure
{{- if $cronjob.extraFileMappings }}
{{- range $name, $entry := $cronjob.extraFileMappings }}
---
apiVersion: v1
metadata:
# Extract the filename portion only
name: {{ printf "%s-%s" (include "galaxy.fullname" $) (include "galaxy.getExtraFilesUniqueName" $name) }}
labels:
{{- include "galaxy.labels" $ | nindent 4 }}
{{- if $entry.useSecret }}
kind: Secret
type: Opaque
stringData:
{{- else }}
kind: ConfigMap
data:
{{- end }}
{{- include "galaxy.getFilenameFromPath" $name | nindent 2 }}: |
{{- if $entry.tpl }}
{{- tpl (tpl $entry.content $) $ | nindent 4 }}
{{- else }}
{{- $entry.content | nindent 4 }}
{{- end }}
{{- end }}
{{- end }}

{{- end }}
2 changes: 1 addition & 1 deletion galaxy/templates/hook-cvmfs-fix.yaml
@@ -1,6 +1,6 @@
{{- if and .Values.refdata.enabled (eq .Values.refdata.type "cvmfs") }}
# Include the code you want to run when both conditions are met
---
# Include the code you want to run when both conditions are met
apiVersion: batch/v1
kind: Job
metadata:
86 changes: 84 additions & 2 deletions galaxy/values.yaml
@@ -267,10 +267,91 @@ extraEnv: []
# - name: EXAMPLE_ENV
# value: MY_VALUE

#- CronJobs to perform periodic maintenance tasks
cronJobs:
  #- Runs the maintenance.sh script to purge items in the Galaxy database that
  #- have been flagged as deleted.
  maintenance:
    schedule: "5 2 * * *"
    extraSettings:
      #- Purge items older than this.
      days: '7'
    securityContext:
      runAsUser: 0
    defaultEnv: true
    command:
      - "/galaxy/server/scripts/maintenance.sh"
    args:
      - "--no-dry-run"
      - "--days"
      - "{{ tpl .Values.cronJobs.maintenance.extraSettings.days $ }}"
  #- Remove files from the tmp directory that are older than the allowable wall time for a job
  tmpdir:
    schedule: "15 2 * * *"
    extraSettings:
      lastModified: '{{ index .Values "configs" "job_conf.yml" "runners" "k8s" "k8s_walltime_limit" | default 604800 }}'
    securityContext:
      runAsUser: 0
    command:
      - /usr/bin/find
    args:
      - "{{ .Values.persistence.mountPath }}/tmp"
      - "!"
      - "-newermt"
      - "{{ tpl .Values.cronJobs.tmpdir.extraSettings.lastModified $ }} seconds ago"
      - "-type"
      - "f"
      - "-exec"
      - "rm"
      - "{}"
      - ";"
#   #- An example cron job that showcases all available features.
#   example:
#     #- Disable the job by scheduling it for a date that never occurs, e.g. Feb 30th.
#     #- The job can still be triggered manually.
#     schedule: "0 0 30 2 *"
#     #- Include the set of default environment variables. See galaxy.podEnvVars
#     #- in the Helm chart's _helpers.tpl for the variables that will be defined.
#     defaultEnv: true
#     #- Define extra environment variables that will be available to the job
#     extraEnv:
#       - name: LOGFILE
#         value: /galaxy/server/database/example.log
#     #- Run the job as root (uid 0)
#     securityContext:
#       runAsUser: 0
#     #- Specify an alternate Docker image for the CronJob container
#     image:
#       repository: ksuderman/galaxy-maintenance
#       tag: "0.7"
#     #- The command to be run
#     command:
#       - /usr/local/bin/example.sh
#     #- Command line arguments to be passed to the command, one per line.
#     args:
#       - "--option"
#       - "value"
#     #- Define extra files that will be mounted into the image. In this case we
#     #- mount a simple Bash script that will write the current environment
#     #- variables to persistent storage.
#     extraFileMappings:
#       #- Path where the file will be mounted
#       /usr/local/bin/example.sh:
#         #- Default permission on the file. In this case 'rwxr-xr-x'
#         mode: "0755"
#         #- Run the contents through the Helm `tpl` command
#         tpl: true
#         #- The contents of the file to be mounted. Can contain Helm template values
#         #- if `tpl` is set to true.
#         content: |-
#           #!/usr/bin/bash
#           echo {{ .Release.Name }} >> $LOGFILE
#           echo "$@" >> $LOGFILE
#           env >> $LOGFILE

ingress:
#- Should ingress be enabled. Defaults to `true`
enabled: true
#-
ingressClassName: nginx
canary:
enabled: true
@@ -450,7 +531,8 @@ configs:
interactivetools_base_path: "{{$host := index .Values.ingress.hosts 0}}{{$path := index $host.paths 0}}{{$path.path}}"
id_secret:
mulled_resolution_cache_lock_dir: "/galaxy/server/local/mulled_cache_lock"
database_connection: postgresql://unused:because@overridden_by_envvar
database_connection: |-
{{ include "galaxy-postgresql.connection-string" .}}
integrated_tool_panel_config: "/galaxy/server/config/mutable/integrated_tool_panel.xml"
sanitize_allowlist_file: "/galaxy/server/config/mutable/sanitize_allowlist.txt"
tool_config_file: "/galaxy/server/config/tool_conf.xml{{if .Values.setupJob.downloadToolConfs.enabled}},{{ .Values.setupJob.downloadToolConfs.volume.mountPath }}/config/shed_tool_conf.xml{{end}}"