
Simulation annex

Mark Scheel edited this page Jan 22, 2025 · 43 revisions

SXS Simulation annex

Low-level access to the SXS catalog (e.g. for adding new simulations) is done through the SimulationAnnex and CCEAnnex git-annex repositories. If you need permission to access SimulationAnnex, contact any of the gitolite-admins (Mark, Larry, Harald, etc.).

⚠️ There have been several versions of the SimulationAnnex repo. The repo known as SimulationAnnex before May 2019 is now called SimulationAnnexPreMay2019 and is read-only. The repo known as SimAnnex, which was used between May 2019 and June 2023, is now called SimAnnexPreJune2023 and is read-only. The new SimulationAnnex repo was created in early 2023 and made live in June 2023.

git-annex

SimulationAnnex and CCEAnnex use git-annex, which allows you to deal with large files in git without having to keep full copies of the large files in every clone of the repo and in every branch. In git-annex, small files are treated as in plain git, as is metadata for large files. Special 'git annex' commands are used to retrieve, modify, and push large files.

⚠️ Warning! Do not use git-annex version 7 if it is older than 7.20191024!

Version 7.x of git-annex broke our workflow (it changed the behavior of git add). git add was later reverted to its original behavior in git-annex 7.20191024. So please type git annex version, and if the version is 7.x and older than 7.20191024, do not use it; instead upgrade to at least 7.20191024 or downgrade to version 6.x or 5.x.
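The version rule above can be checked mechanically. The following is a minimal sketch, not part of git-annex or BFI: `check_annex_version` is a hypothetical helper name, and it assumes version strings of the form `MAJOR.YYYYMMDD` with an optional suffix such as `-g<hash>`. It only covers the 7.x rule; it does not detect the broken git-annex 10 versions mentioned on this page, since the page does not say which ones they are.

```shell
# check_annex_version VERSION
# Prints "unsafe" for 7.x releases older than 7.20191024, "ok" otherwise.
# Hypothetical helper; assumes VERSION looks like 7.20190912 or 8.20200309-gabc123.
check_annex_version() {
    ver=$1
    major=${ver%%.*}      # text before the first dot, e.g. "7"
    rest=${ver#*.}        # text after the first dot, e.g. "20190912-gabc123"
    # Keep only the leading digits of the release date.
    date=$(printf '%s' "$rest" | sed 's/[^0-9].*//')
    if [ "$major" = 7 ] && [ "${date:-0}" -lt 20191024 ]; then
        echo "unsafe"
    else
        echo "ok"
    fi
}
```

To check the installed copy, one option is `check_annex_version "$(git annex version | sed -n 's/^git-annex version: //p')"`, assuming the first line of the `git annex version` output has the form `git-annex version: <version>`.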

⚠️ Warning! Some versions of git-annex 10 do not work!

With some versions of git-annex you get the error:

get Private/CSUFBBH_3/1023/Lev3/rhOverM_Asymptotic_GeometricUnits_CoM.h5 (from meistri...) 
git-annex-shell: Action blocked by GIT_ANNEX_SHELL_LIMITED

  Transfer failed

  Unable to access these remotes: meistri

  No other repository is known to contain the file.

  (Note that these git remotes have annex-ignore set: origin)
failed
get: 1 failed

A working version of git-annex is available on Mbot:

/home/fs01/spec1163/software/git-annex-standalone-8.20200309-amd64.tar.gz

Installation

Linux

To install git-annex, apt, yum, or the equivalent will usually work (but check the version). You can get an up-to-date version on Linux machines using

wget https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz
tar xf git-annex-standalone-amd64.tar.gz

macOS

brew install git-annex

HPC

For our compute clusters, someone has usually installed git-annex, so use that copy. BFI will automatically find and load git-annex on most of our machines. If you need to do it by hand, here are commands to load git-annex for various machines:

  • wheeler: module load git-annex/6.20170214
  • caltechhpc: module use /central/home/mascheel/modulefiles && module load git-annex
  • stampede2: source /home1/00207/ux450022/load_git_annex.src
  • frontera: source /home1/00207/ux450022/load_git_annex.src
  • bridges2: source /jet/home/mscheel/load_git_annex.src
  • expanse: source /home/ux450022/load_git_annex.src
  • anvil: source /home/x-mscheel/load_git_annex.src
  • carnie: source /home/vvarma/.load_git_annex.src
  • mbot: module load git-annex/8.20210904
  • urania: source /u/vvarma/.load_git_annex.src
  • unity: source /work/pi_vvarma_umassd_edu/.load_git_annex.src (NOTE: You need to add Vijay as PI on Unity.)

Make sure you have the following in your .ssh/config if you get errors about hash algorithm mismatches:

Host sxs-archive.tapir.caltech.edu
   HostKeyAlgorithms +ssh-rsa
   PubkeyAcceptedKeyTypes +ssh-rsa
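To confirm that the settings above took effect, you can ask ssh for its effective configuration. This is a minimal sketch: `has_ssh_rsa` is a hypothetical helper name, and it relies on `ssh -G`, which prints the resolved client configuration for a host without connecting. Note that newer OpenSSH releases (8.5 and later) call the second option `PubkeyAcceptedAlgorithms` and keep `PubkeyAcceptedKeyTypes` as an alias.

```shell
# has_ssh_rsa HOST
# Succeeds if ssh-rsa is among the host key algorithms ssh would offer for HOST.
# Hypothetical helper; it only inspects local configuration and never connects.
has_ssh_rsa() {
    ssh -G "$1" |
        awk 'tolower($1) == "hostkeyalgorithms" { print $2 }' |
        tr ',' '\n' |          # one algorithm per line
        grep -qx 'ssh-rsa'     # exact match, so ssh-rsa-cert-* does not count
}
```

For example: `has_ssh_rsa sxs-archive.tapir.caltech.edu && echo "ssh-rsa enabled"`.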

Set up a local copy of SimulationAnnex

git clone git@sxs-archive.tapir.caltech.edu:SimulationAnnex
cd SimulationAnnex/
make init     # Does `git annex init` and all other initialization things.

⚠️ Did git clone ask for a password? Check your ssh setup. You might have forgotten to forward your SSH keys to the supercomputer (the -A flag to ssh). Try ssh git@sxs-archive.tapir.caltech.edu. If it asks for a password, your ssh setup is still wrong. If it gives you a list of repos you have access to but SimulationAnnex doesn't have the correct permissions, contact Mark, Larry, or Harald.

To easily access data by SXSId do:

make links

This will create two directories, PrivateLinks and PublicLinks, with symbolic links to different SXSIds.

Set up a local copy of CCEAnnex

The procedure is identical to setting up a local copy of SimulationAnnex, except that you replace SimulationAnnex with CCEAnnex everywhere.

Copy data from SimulationAnnex (or CCEAnnex) to your local machine

git-annex works differently from plain git: the large files are copied only if you request them individually using git annex get. This is a good thing! You don't want git clone trying to copy 50TB to your laptop!

To copy the data onto your local machine, from within the local git repository you set up above, do the following:

git pull --rebase
git annex merge        # updates the git-annex branch, so your repo knows where all the data is
git annex get <path>   # this command is recursive if <path> is a directory, so only run on files/dirs you need or you will fill your disk.
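The three commands above can be wrapped so that each step runs only if the previous one succeeded, and so that nothing is fetched unless you name a path. This is a minimal sketch; `annex_fetch` is a hypothetical helper name, not a git-annex command, and it must be run from inside your SimulationAnnex or CCEAnnex clone.

```shell
# annex_fetch PATH...
# Update the clone, then copy only the requested large files.
# Hypothetical helper; refuses to run without at least one explicit path.
annex_fetch() {
    if [ "$#" -eq 0 ]; then
        echo "usage: annex_fetch <path>..." >&2
        return 2
    fi
    git pull --rebase &&     # update small files and large-file metadata
    git annex merge &&       # update the git-annex branch (where the data lives)
    git annex get "$@"       # recursive for directories, so be specific
}
```

For example, `annex_fetch Private/CSUFBBH_3/1023/Lev3` would fetch the directory that appears in the error message earlier on this page.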

When you are done with the local files, do the following to remove the local copy:

git annex merge
git annex drop <path>
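Before dropping, it can help to see which large files are actually present in your clone. This is a minimal sketch, assuming it is run inside the annex; `annex_release` is a hypothetical helper name. `git annex find --in here` lists annexed files whose content is present locally, and `git annex drop` itself refuses to remove a file unless git-annex can confirm that enough copies exist elsewhere.

```shell
# annex_release PATH...
# List the locally present large files under PATH, then drop them.
# Hypothetical helper; refuses to run without at least one explicit path.
annex_release() {
    if [ "$#" -eq 0 ]; then
        echo "usage: annex_release <path>..." >&2
        return 2
    fi
    git annex merge &&
    git annex find --in here "$@" &&   # show what is about to be dropped
    git annex drop "$@"
}
```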

Add new data to SimulationAnnex and CCEAnnex

Please use BFI to do this.
But if you really need to do something manually and you really know what you are doing, see SimulationAnnex_old_wiki.

Checking size of annex

To check locally use:

git annex info --fast --in here ./

To check a remote such as meistri, use:

git annex info --fast --in meistri ./

Note: The path can be anything inside the annex.
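To compare how much of a path is present in several repositories at once, the two commands above can be run in a loop. This is a minimal sketch; `annex_sizes` is a hypothetical helper name, and the default repository list (here and meistri) is simply the pair used above.

```shell
# annex_sizes PATH [REPO...]
# Print `git annex info --fast` for PATH in each REPO (default: here meistri).
# Hypothetical helper; run it from inside the annex.
annex_sizes() {
    if [ "$#" -eq 0 ]; then
        echo "usage: annex_sizes <path> [repo...]" >&2
        return 2
    fi
    path=$1
    shift
    [ "$#" -gt 0 ] || set -- here meistri
    for repo in "$@"; do
        echo "== $repo =="
        git annex info --fast --in "$repo" "$path" || return 1
    done
}
```

For example, `annex_sizes ./` reports on the current directory for both repositories, and `annex_sizes ./ here` reports only on the local clone.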

Cornell Mirrors

There are mirrors of both the SimulationAnnex (not yet completed as of Oct. 31, 2022) and CCEAnnex at Cornell. These are behind a VPN and can only be accessed on the Cornell astronomy/VPN network. They are designed to serve more as a secure backup than a second access point to the data. Thus, the instructions here will detail the setup and maintenance of the machines rather than retrieving data from them.

  • The SimulationAnnex is at: sxs-annex.astro.cornell.edu with location /volume1/sxs-annex/SimAnnex (it contains what is now known as SimAnnexPreJune2023)
  • The CCEAnnex is at: sxs-annex8.astro.cornell.edu with location /volume1/sxs-annex/CCEAnnex

You can access their web interface on port 5001, e.g. sxs-annex.astro.cornell.edu:5001. Again, you need to be on the VPN.

When first setting up the Synology box, you must enable SSH access. SSH keys may be used.

Getting git-annex must be done from the command line.

Note that you need a git-annex compatible with the copy at Caltech (see near the top of this wiki).

  1. cd ~
  2. Follow the installation instructions above to get a version of git-annex that works with the Caltech annexes.
  3. Edit ~/.bashrc to have:
export GIT_ANNEX_LD_LIBRARY_PATH=$HOME/git-annex.linux/lib/x86_64-linux-gnu/
export GIT_ANNEX_DIR=$HOME/git-annex.linux
export PATH="$PATH:$HOME/git-annex.linux/"
alias git="LC_ALL=C git"
  4. Log out and log back in.
  5. Set up SSH keys, e.g. ssh-keygen -t rsa -b 4096 -C "nd357@cornell.edu". There is no need for a password-protected SSH key here; it can actually cause problems, since a cron job will pull the latest Annex periodically.
  6. Make sure the SSH key has access to the SimulationAnnex and CCEAnnex.
  7. In the web interface of the Synology boxes, create a shared drive at /volume1 named sxs-annex.
  8. In an SSH session, cd /volume1/sxs-annex.
  9. Clone the annex you want, and follow the usual annex setup instructions.

Periodic Backups

The Annex at Cornell will automatically initiate a copy of any new data once a week, Sunday at 00:00 Eastern. This is set up in the Synology interface: Control Panel->Task Scheduler under the task named Pull CCEAnnex (there is a Pull SimulationAnnex for the sim annex). The body of that task is:

. $HOME/.bashrc
cd /volume1/sxs-annex/CCEAnnex
git pull --rebase
git annex merge
git annex get ./

Under the General settings for the task the User is sxsadmin2 since this is the owner of the annex. The Schedule settings are Run on the following days: Sunday with First run time: 00:00, Frequency: Every day, and Last run time: 00:00. If the task does not complete successfully it will email Nils Deppe, Larry Kidder, Mark Scheel, and Saul Teukolsky.
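As written, the task body runs each command regardless of whether the previous one failed (unless the shell is running with `set -e`), so the script's exit status is that of the final `git annex get`. A stricter variant is sketched below, under the assumption, not confirmed on this page, that the Task Scheduler decides success or failure from the script's exit status. `pull_annex` is a hypothetical helper name; the commands are the ones from the task body above.

```shell
# pull_annex DIR
# Update the annex in DIR and fetch all new data, stopping at the first failure.
# Hypothetical helper; the body runs in a subshell so the caller's directory is unchanged.
pull_annex() (
    cd "$1" || exit 1
    git pull --rebase &&
    git annex merge &&
    git annex get ./
)

# In the scheduled task, after sourcing $HOME/.bashrc as before:
#   pull_annex /volume1/sxs-annex/CCEAnnex || exit 1
```

With this form, a failed `git pull --rebase` or `git annex merge` is reported as a failed task instead of being masked by a later command that succeeds.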

You can view the logs by logging into the Synology/annex, then in Control Panel->Task Scheduler select the Pull CCEAnnex task, click the Action drop down, and select View Result.

Note: In Control Panel->Task Scheduler->Settings the logs are being written to /volume1/sxs-annex/CronJobLogs in case you need to find them manually.
