Computing Resources

Mark Scheel edited this page May 8, 2025 · 45 revisions

SXS has time on a number of computing clusters and supercomputers that you can use.

For many machines, our group either purchases a fixed amount of CPU time (Caltech HPC) or applies for CPU time (most of the other machines), so we have only a finite amount of time on those machines. Please be mindful of your usage.

Before you access any of our computing resources, it is a good idea to check that you have SSH keys set up, since many of the machines we have access to rely on them. If you have never set up SSH keys before, or need a refresher on how they work, follow the instructions on Configuring SSH keys.
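As a quick check before following those instructions, the sketch below lists any public keys already present. It assumes keys live in the default ~/.ssh directory with a .pub suffix; the function name list_public_keys is made up for this example.

```shell
# List public keys in an SSH directory (defaults to ~/.ssh).
# Prints one path per line; returns non-zero if none are found.
list_public_keys() {
    status=1
    for key in "${1:-$HOME/.ssh}"/*.pub; do
        # If the glob matched nothing, the literal pattern comes back; skip it.
        [ -f "$key" ] || continue
        echo "$key"
        status=0
    done
    return "$status"
}
```

Running list_public_keys with no argument checks ~/.ssh. If it prints nothing, a new key pair can be generated with ssh-keygen -t ed25519, although the Configuring SSH keys page remains the reference for this group.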

mbot (Cornell cluster)

In order to get an account, email Nils Deppe your public SSH key, and, if you have a Cornell NetID, your NetID.

Once you have an account, you can ssh into mbot.cac.cornell.edu from the terminal using

ssh your_username@mbot.cac.cornell.edu

substituting in your username (note that you can also add mbot to your .ssh/config file to make the login process easier).
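A minimal sketch of such a .ssh/config entry follows. The alias mbot and the helper name ssh_config_stanza are choices made for this example, and your_username is a placeholder; Host, HostName, and User are standard OpenSSH configuration keywords.

```shell
# Print an ssh_config stanza for one host.
# Arguments: alias, full hostname, username on that machine.
ssh_config_stanza() {
    printf 'Host %s\n    HostName %s\n    User %s\n' "$1" "$2" "$3"
}

# Show the stanza for mbot (replace your_username with your own).
ssh_config_stanza mbot mbot.cac.cornell.edu your_username
```

Appending the printed stanza to ~/.ssh/config (for example by redirecting the output with >>) should let you log in with just ssh mbot.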

There are detailed instructions about mbot on this wiki.

Wheeler (permanently offline as of 5/1/24)

In order to get an account, email Mark Scheel your public SSH key and a preferred username.

Once you have an account, you can ssh into wheeler.caltech.edu from the terminal using

ssh -X -Y your_username@wheeler.caltech.edu

substituting in your username (note that you can also add wheeler to your .ssh/config file to make the login process easier).

There are detailed instructions about wheeler on this wiki.

Etiquette for working on wheeler:

  • Wheeler consists of many nodes (think of a node as a separate computer; nodes are linked together). One node (the one you log in to) is the head node and should be used only for compiling, launching jobs, and running small, lightweight tests. We all share the head node, so one person can slow it down for everyone.
  • Most nodes are compute nodes: this is where most computation is done. When you submit a job or start an interactive job (see here) you get one or more compute nodes all to yourself for the amount of time you specify in your job submission.
  • When you submit a job, it goes into a queue and is executed when there are enough free nodes. Please do not submit a large number of jobs that fills (or almost fills) the queue all by yourself; doing so increases the queue waiting time for other users.
  • There are a few debug nodes: compute nodes with a 2-hour time limit. These should be used for small parallel debugging/test jobs. The short time limit ensures that the debug nodes become available fairly soon even if they are currently in use.
  • Each user has access to two filesystems: /home/<username> (where you are when you log in), and /panfs/ds09/sxs/<username>. Your /home directory is for compiling and other tasks without large data sets or massively parallel writes. Please run parallel jobs in your /panfs directory, which is large and is optimized for being written to by many cores simultaneously.

More complete details about wheeler can be found here.

Caltech Resnick High Performance Computing Center

https://www.hpc.caltech.edu/

Limits

  • We currently have about 6 million CPU-h per year on this machine (for the entire sxs group combined).
  • Every user has a home directory with a 50GB quota; use it for things like compiling. We have a disk quota of 167TB on /central/groups/sxs (for the entire sxs group, not per user). Look at the file /central/groups/imss_admin/group_usage/sxs_usage to see everyone's usage. Please remove files you don't need; if the disk fills up, then nobody can run anything.
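When cleaning up, it can help to see which subdirectories take the most space. The sketch below assumes GNU du and sort (for --max-depth and -h), and the helper name largest_dirs is hypothetical.

```shell
# Print the n largest entries among a directory and its immediate
# subdirectories, biggest first (n defaults to 10).
# du also reports the directory's own total, which lands near the top.
largest_dirs() {
    du -h --max-depth=1 "$1" 2>/dev/null | sort -rh | head -n "${2:-10}"
}
```

For example, largest_dirs /central/groups/sxs/$USER 5 would show the five largest entries, assuming you keep your files in a per-user subdirectory of the group area (that layout is an assumption, not something stated above). Walking a large directory tree with du can be slow on a shared filesystem, so it is best run sparingly.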

Getting a guest account (non-Caltech)

  • Send an e-mail to JoAnn Boyd (joann at caltech.edu) requesting help setting up an account on the Caltech HPC cluster, and ask her what information she needs from you.
  • Complete and sign the two forms that she will e-mail you.
    • CALTECH GUEST DATA SHEET - HPC Collaborators - REMOTE/ELECTRONIC ACCESS ONLY
    • CONFIDENTIALITY AND NON-DISCLOSURE AGREEMENT FOR GUESTS
  • You should receive two e-mails within a day or two: one with your Caltech UID, and another with your access.caltech username and a link to activate your account.
  • Click on the link to activate your account.
  • Follow the instructions below.

Getting an account (Caltech and non-Caltech)

  • If you do not have an access.caltech username, follow the instructions above first.
  • Send an email to Saul or Mark asking them to add you to our group on the system.
  • You will need to install Duo Mobile on your phone for multi-factor authentication.
  • You will need to install the Caltech VPN in order to use the cluster, unless you are connecting from campus.
  • For information on how to log in, go here.

Compute time usage:

  • For the entire group: sreport -T gres/gpu,cpu cluster accountutilizationbyuser start=01/01/18T00:00:00 end=now -t hours account=<group-account-name>
  • For you: sreport -T gres/gpu,cpu cluster accountutilizationbyuser start=01/01/18T00:00:00 end=now -t hours user=$USER
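The commands above report usage since January 2018. To look at a shorter window, the start date can be made a parameter, as in this sketch. The wrapper name usage_since is hypothetical, and the date in the usage example is only an illustration; the sreport options are the same ones used above.

```shell
# Run the sreport query from the text with a chosen start date.
# $1 = start date as MM/DD/YY (midnight is assumed)
# $2 = filter, e.g. user=$USER or account=<group-account-name>
usage_since() {
    sreport -T gres/gpu,cpu cluster accountutilizationbyuser \
        start="${1}T00:00:00" end=now -t hours "$2"
}
```

For example, usage_since 01/01/25 "user=$USER" would report your own usage since the start of 2025.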

Storage usage:

  • On /central : cat /home/_SYS_/group_usage/sxs_usage
  • Or mmlsquota -u <USERNAME> --block-size auto central:home

ACCESS (formerly XSEDE) machines

One subsection per machine.

ACCESS provides an exchange calculator for moving an allocation between machines.

Frontera

User guide

To get an account, send Larry Kidder your TACC User Portal username (not your ACCESS or XSEDE User Portal username).

Allocation

As of Mar 26 2025, we have 622k node-hours remaining of our 700k node-hour allocation that expires Aug 30 2025.
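To put such figures in perspective, the fraction of an allocation already used is (total - remaining) / total. A small sketch of that arithmetic, using a hypothetical helper percent_used and the Frontera numbers above:

```shell
# Percentage of an allocation already used, given the remaining and total
# amounts (any unit, as long as both use the same one).
percent_used() {
    awk -v rem="$1" -v tot="$2" 'BEGIN { printf "%.1f\n", 100 * (tot - rem) / tot }'
}

# Frontera figures above, in thousands of node-hours.
percent_used 622 700    # prints 11.1
```

The same helper works for the CPU-hour allocations on the other machines, as long as both numbers are given in the same unit.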

Notes

  • Files in the scratch partition on Frontera are purged 10 days after they were last accessed. We can use the directory /scratch3/projects/sxs, which is not purged.

  • If you don't have a directory /scratch3/projects/sxs/USER (where USER is your username), please make one with permissions 755. From within /scratch3/projects/sxs, you can use the command mkdir -m755 $USER.

  • If your username is some incomprehensible string (like 'tg875392'), please also add a symlink /scratch3/projects/sxs/COMPREHENSIBLE_NAME that points to your /scratch3/projects/sxs/USER (where COMPREHENSIBLE_NAME is something that others would recognize as you). For example, there is a symlink in /scratch3/projects/sxs called 'mscheel' that points to 'ux450022', which is my username. This way, nobody needs to remember who tg875387 happens to be. From within /scratch3/projects/sxs, you can use the command ln -s $USER COMPREHENSIBLE_NAME.
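The two setup steps above can be combined into one re-runnable helper, sketched below. The function name setup_project_dir is hypothetical; it uses the same mkdir -m755 and ln -s commands as the notes, with the base directory passed in rather than relying on the current directory.

```shell
# Create BASE/USERNAME with mode 755 and a recognizable symlink BASE/ALIAS
# pointing to it. Safe to re-run: existing entries are left untouched.
# $1 = base directory, $2 = username, $3 = recognizable alias
setup_project_dir() {
    [ -d "$1/$2" ] || mkdir -m755 "$1/$2" || return 1
    if [ ! -e "$1/$3" ] && [ ! -L "$1/$3" ]; then
        # Relative target, so the link resolves inside the base directory.
        ln -s "$2" "$1/$3" || return 1
    fi
}
```

For example, setup_project_dir /scratch3/projects/sxs "$USER" COMPREHENSIBLE_NAME, with COMPREHENSIBLE_NAME replaced by the name you want others to see. If your username is already recognizable, passing it as the alias too simply skips the symlink.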

Expanse

User guide

This is one of the machines accessed through the ACCESS program.

To get an account let your advisor know your ACCESS User Portal username (which is the same as your XSEDE username before XSEDE was replaced by ACCESS).

Allocation

As of Mar 26 2025, we have 6.55M CPU-hours remaining out of our 14M CPU-hour allocation, which expires Sep 30 2025.

Bridges2

User guide

This is one of the machines accessed through the ACCESS program.

To get an account let your advisor know your ACCESS User Portal username.

Allocation

As of June 15 2022, we have 3.6M CPU-hours remaining out of our 5.3M CPU-hour allocation, which expires Sep 30 2022.

Stampede2

User guide

This is one of the machines accessed through the ACCESS program.

To get an account let your advisor know your ACCESS User Portal username.

Allocation

As of June 15 2022, we have 60k node-hours remaining out of our 65k node-hour allocation which expires Sep 30 2022.

Anvil

User guide

This is one of the machines accessed through the ACCESS program.

To get an account let your advisor know your ACCESS User Portal username.

Allocation

As of Mar 26 2022, we have 5.1M CPU-hours remaining out of our 13M CPU-hour allocation which expires Sep 30 2025.
