dkakkar edited this page Jan 29, 2020 · 1 revision

Slurm Workload Manager

  • Slurm is an open-source cluster management and job scheduling system for Linux clusters
  • Slurm is the workload manager on about 60% of the TOP500 supercomputers.
  • It performs the following key functions:
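    • Allocates access to resources (compute nodes) to users for a period of time
    • Provides a framework for starting, running, and monitoring jobs on the allocated nodes
    • Resolves contention for resources by managing a queue of pending jobs
  • FASRC uses Slurm to manage the workload on the Cannon cluster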

Slurm scheduler

  • FASRC uses Slurm's built-in job accounting and fairshare system to ensure that resources are shared fairly
  • Every lab is assigned a base share of the community-wide system
  • A lab's fairshare score is calculated by comparing its share with the amount of the cluster it has actually used
  • The fairshare score is then used to set the priority of the lab's jobs relative to jobs from other users on the cluster
  • TRES (Trackable RESources): lets the scheduler charge users back for how much they have used different resources on the cluster, such as CPUs, memory, and GPUs
  • sshare: a command that shows your current fairshare
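The fairshare bullets describe a score computed from a lab's share and its actual usage, which then feeds into job priority. The sketch below uses Slurm's *classic* fairshare formula, F = 2^(-U/S), where S is the lab's normalized share and U its effective (normalized, decayed) usage. That formula is an assumption: newer Slurm versions default to the Fair Tree algorithm, and this page does not say which one Cannon uses. The priority weights are made-up example values; real ones are set by administrators in slurm.conf, and real job priority includes further factors (partition, QOS, and others).

```python
"""Sketch of Slurm's classic fairshare factor and a multifactor job priority.

Assumptions (not from the original page): the classic formula F = 2**(-U/S)
and the values in PRIORITY_WEIGHTS, which are hypothetical.
"""

# Hypothetical weights; a real cluster sets PriorityWeightFairshare,
# PriorityWeightAge, PriorityWeightJobSize, ... in slurm.conf.
PRIORITY_WEIGHTS = {"fairshare": 10_000, "age": 1_000, "job_size": 500}


def fairshare_factor(norm_shares: float, effective_usage: float) -> float:
    """Classic fairshare: 1.0 for an idle lab, 0.5 when usage equals its share."""
    if norm_shares <= 0:
        return 0.0  # a lab with no share gets no fairshare boost
    return 2.0 ** (-effective_usage / norm_shares)


def job_priority(factors: dict, weights: dict = PRIORITY_WEIGHTS) -> int:
    """Weighted sum of priority factors, each expected to lie in [0, 1]."""
    return int(sum(weight * factors.get(name, 0.0) for name, weight in weights.items()))


if __name__ == "__main__":
    # Two labs with the same share (10% of the cluster) but different usage.
    light = fairshare_factor(norm_shares=0.10, effective_usage=0.05)
    heavy = fairshare_factor(norm_shares=0.10, effective_usage=0.30)
    print(f"light user fairshare: {light:.3f}")  # 2**-0.5, about 0.707
    print(f"heavy user fairshare: {heavy:.3f}")  # 2**-3 = 0.125
    # Same job age for both, so only fairshare separates them.
    print(job_priority({"fairshare": light, "age": 0.5}))  # 7571
    print(job_priority({"fairshare": heavy, "age": 0.5}))  # 1750
```

The exponential form means overuse is penalized smoothly: a lab that has used twice its share drops to 0.25 rather than being cut off, so its jobs still run, just behind labs that have used less.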
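The TRES bullet says the scheduler charges users for the different resources they use. One way to picture this is Slurm's billing TRES: each resource in a job gets a per-unit weight, and the weighted sum is the job's billing value, which accrues against the lab's usage over the job's run time. The sketch below assumes the default summing behavior (not `PriorityFlags=MAX_TRES`) and uses hypothetical weights in the style of the `TRESBillingWeights` partition option; FASRC's actual weights are not given on this page.

```python
"""Sketch of turning TRES requests into a billing value and charged usage.

Assumptions: the weights are hypothetical, billing is a plain weighted sum,
and usage decay is ignored.
"""

# Hypothetical per-unit weights: per CPU, per GB of memory, per GPU.
BILLING_WEIGHTS = {"cpu": 1.0, "mem_gb": 0.25, "gpu": 2.0}


def billing_units(tres: dict, weights: dict = BILLING_WEIGHTS) -> float:
    """Sum of weight * amount for each trackable resource the job requested."""
    unknown = set(tres) - set(weights)
    if unknown:
        raise ValueError(f"no billing weight for: {sorted(unknown)}")
    return sum(weights[name] * amount for name, amount in tres.items())


def charged_usage(tres: dict, elapsed_seconds: float, weights: dict = BILLING_WEIGHTS) -> float:
    """Billing units times run time, roughly what is added to the lab's raw usage."""
    return billing_units(tres, weights) * elapsed_seconds


if __name__ == "__main__":
    cpu_job = {"cpu": 4, "mem_gb": 16}             # 4*1.0 + 16*0.25 = 8
    gpu_job = {"cpu": 8, "mem_gb": 32, "gpu": 1}   # 8 + 8 + 2 = 18
    print(billing_units(cpu_job), billing_units(gpu_job))
    print(charged_usage(gpu_job, elapsed_seconds=3600))  # one hour
```

Weighting memory and GPUs means a job that holds a large share of a node's memory or a GPU is charged for what it keeps others from using, not just for its CPU count.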
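To check your fairshare from a script rather than by eye, you can ask `sshare` for '|'-delimited output (`--parsable2`) and split it on the header line. The sketch below assumes it runs on a Slurm login node where `sshare` is on the PATH. It keys each row by whatever column names `sshare` prints, so it does not hard-code the column set; the `FairShare` column used in the usage line is the one `sshare` normally reports.

```python
"""Sketch: read your fairshare from `sshare --parsable2` output.

Assumption: run on a Slurm login node where `sshare` is available.
"""
import shutil
import subprocess


def parse_sshare(text: str) -> list:
    """Turn '|'-delimited sshare output (header line first) into one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [name.strip() for name in lines[0].split("|")]
    return [dict(zip(header, (field.strip() for field in row.split("|")))) for row in lines[1:]]


def my_fairshare() -> list:
    """Run sshare for the current user and return its rows as dicts."""
    result = subprocess.run(
        ["sshare", "--parsable2"],
        capture_output=True, text=True, check=True,
    )
    return parse_sshare(result.stdout)


if __name__ == "__main__" and shutil.which("sshare"):
    for row in my_fairshare():
        print(row.get("Account"), row.get("User"), row.get("FairShare"))
```

Rows with an empty `User` field are account-level (lab) entries; rows with your username show your own association within that lab.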