Open
Labels
bug (Something isn't working)
Description
My gourd.toml file does not have a resource_limits section, and jobs fail almost immediately with an out-of-memory (OOM) error.
Excerpt from the gourd.toml docs:
Example
An example Slurm Configuration:
[slurm]
experiment_name = "my test experiment"
output_folder = "./slurmout/"
partition = "compute"
account = "Education-EEMCS-MSc-CS"
RESOURCE LIMITS
To run on Slurm one must also specify resource limits.
The docs imply that the config is invalid without resource limits, and it is not clear whether there are any defaults. There should be a validation check for this: I was able to submit this file to Slurm without a resource_limits section, which is how I ended up with the OOM error.
Also, it would be nice if the short-form gourd status output displayed the OOM status line instead of just an exit code.
