
NOTIC: Keep more failed NCCL benchmark jobs in the history instead of successful ones
rdjjke committed Jan 12, 2025
1 parent 77a1ce1 commit c9855b7
Showing 1 changed file with 2 additions and 2 deletions.
helm/slurm-cluster/values.yaml: 4 changes (2 additions, 2 deletions)
@@ -99,9 +99,9 @@ periodicChecks:
 # CronJob timeout in seconds. By default, equals to 30 min
 activeDeadlineSeconds: 1800
 # Number of successful finished jobs to retain
-successfulJobsHistoryLimit: 24
+successfulJobsHistoryLimit: 3
 # Number of failed finished jobs to retain
-failedJobsHistoryLimit: 3
+failedJobsHistoryLimit: 24
 # NCCL test settings
 ncclArguments:
 # Minimum memory size to start NCCL with
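To illustrate what swapping the two limits does, here is a simplified model of history retention. It assumes the chart passes these values through to the Kubernetes CronJob fields `successfulJobsHistoryLimit` and `failedJobsHistoryLimit`, which keep only the newest N finished Jobs of each outcome. The `FinishedJob` type, `retained_jobs` helper, and `nccl-benchmark-*` names are hypothetical and not part of the chart or the Kubernetes API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FinishedJob:
    # Hypothetical minimal model of a finished Job.
    name: str
    start_time: int  # e.g. Unix seconds; larger means newer
    succeeded: bool


def retained_jobs(jobs: List[FinishedJob],
                  successful_limit: int,
                  failed_limit: int) -> List[FinishedJob]:
    """Simplified retention model: keep the newest `successful_limit`
    successful jobs and the newest `failed_limit` failed jobs.
    Anything older would be pruned."""
    if successful_limit < 0 or failed_limit < 0:
        raise ValueError("history limits must be non-negative")
    newest_first = sorted(jobs, key=lambda j: j.start_time, reverse=True)
    successful = [j for j in newest_first if j.succeeded][:successful_limit]
    failed = [j for j in newest_first if not j.succeeded][:failed_limit]
    # Return the survivors oldest-first for readability.
    return sorted(successful + failed, key=lambda j: j.start_time)


if __name__ == "__main__":
    # 30 hypothetical benchmark runs in which every 5th run failed (6 failures).
    history = [FinishedJob(f"nccl-benchmark-{i}", i, i % 5 != 0) for i in range(30)]
    before = retained_jobs(history, successful_limit=24, failed_limit=3)
    after = retained_jobs(history, successful_limit=3, failed_limit=24)
    # Printed as: <failed kept> <successful kept>
    print(sum(not j.succeeded for j in before), sum(j.succeeded for j in before))  # → 3 24
    print(sum(not j.succeeded for j in after), sum(j.succeeded for j in after))    # → 6 3
```

Under the old values only the three most recent failures survive, so earlier failed NCCL runs are lost for debugging; under the new values all six failures are kept while successful runs are trimmed to three.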
