Simulation annex
Low-level access to the SXS catalog (e.g. for adding new simulations) is done through the SimulationAnnex and CCEAnnex git-annex repositories. If you need permission to access SimulationAnnex, contact any of the gitolite-admins (Mark, Larry, Harald, etc.).
The SimulationAnnex repo used before May 2019 is now called SimulationAnnexPreMay2019 and is read-only. The repo formerly known as SimAnnex, which was used between May 2019 and June 2023, is now called SimAnnexPreJune2023 and is also read-only. The new SimulationAnnex repo was created in early 2023 and made live in June 2023.
SimulationAnnex and CCEAnnex use git-annex, which allows you to deal with large files in git without having to keep full copies of the large files in every clone of the repo and in every branch. In git-annex, small files are treated as in plain git, as is metadata for large files. Special 'git annex' commands are used to retrieve, modify, and push large files.
Version 7.x of git-annex broke our workflow (it changed the behavior of git add). git add was later reverted to its original behavior in git-annex 7.20191024. So please type git annex version, and if the version is 7.x and older than 7.20191024, do not use it; instead upgrade to at least 7.20191024 or downgrade to version 6.x or 5.x.
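The version rule above can also be checked mechanically. The following is a minimal sketch, not part of the SXS tooling: annex_version_ok is a hypothetical helper name, and it assumes version strings of the form MAJOR.DATE (such as 7.20191024), optionally followed by a hyphenated build suffix.

```shell
# Hypothetical helper (not part of git-annex or BFI).
# Returns 0 if the given git-annex version string is outside the broken
# range (7.x releases before 7.20191024), and 1 if it is inside it.
annex_version_ok() {
    version=${1%%-*}      # strip a build suffix after the first hyphen
    major=${version%%.*}  # text before the first dot, e.g. 7
    rest=${version#*.}    # text after the first dot, e.g. 20191024
    stamp=${rest%%.*}     # drop any further dotted component
    if [ "$major" = "7" ] && [ "$stamp" -lt 20191024 ]; then
        return 1
    fi
    return 0
}

# Possible usage, assuming the first line of `git annex version`
# has the form "git-annex version: <version>":
#   v=$(git annex version | sed -n 's/^git-annex version: //p')
#   annex_version_ok "$v" || echo "git-annex $v is in the broken 7.x range"
```

The comparison treats the part after the major version as a date stamp, which matches the version numbers quoted on this page; other numbering schemes would need a different check.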
With some versions of git-annex you get the error:
get Private/CSUFBBH_3/1023/Lev3/rhOverM_Asymptotic_GeometricUnits_CoM.h5 (from meistri...)
git-annex-shell: Action blocked by GIT_ANNEX_SHELL_LIMITED
Transfer failed
Unable to access these remotes: meistri
No other repository is known to contain the file.
(Note that these git remotes have annex-ignore set: origin)
failed
get: 1 failed
A working version of git-annex is available on Mbot:
/home/fs01/spec1163/software/git-annex-standalone-8.20200309-amd64.tar.gz
To install git-annex, usually apt, yum, or the equivalent will work (but check the version).
You can get an up-to-date version on Linux machines using
wget https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz
tar xf git-annex-standalone-amd64.tar.gz
On macOS, you can install it with Homebrew:
brew install git-annex
For our compute clusters, someone has usually installed git-annex, so use that copy. BFI will automatically find and load git-annex on most of our machines. If you need to do it by hand, here are commands to load git-annex for various machines:
- wheeler:
module load git-annex/6.20170214
- caltechhpc:
module use /central/home/mascheel/modulefiles && module load git-annex
- stampede2:
source /home1/00207/ux450022/load_git_annex.src
- frontera:
source /home1/00207/ux450022/load_git_annex.src
- bridges2:
source /jet/home/mscheel/load_git_annex.src
- expanse:
source /home/ux450022/load_git_annex.src
- anvil:
source /home/x-mscheel/load_git_annex.src
- carnie:
source /home/vvarma/.load_git_annex.src
- mbot:
module load git-annex/8.20210904
- urania:
source /u/vvarma/.load_git_annex.src
- unity:
source /work/pi_vvarma_umassd_edu/.load_git_annex.src
(NOTE: You need to add Vijay as PI on Unity.)
Make sure you have the following in your .ssh/config if you get errors about hash algorithm mismatches:
Host sxs-archive.tapir.caltech.edu
HostKeyAlgorithms +ssh-rsa
PubkeyAcceptedKeyTypes +ssh-rsa
git clone git@sxs-archive.tapir.caltech.edu:SimulationAnnex
cd SimulationAnnex/
make init # Does `git annex init` and all other initialization things.
Does git clone ask for a password? Check your ssh setup. You might have forgotten to forward your SSH keys to the supercomputer (the -A flag to ssh). Try ssh git@sxs-archive.tapir.caltech.edu: if it asks for a password, your ssh setup is still wrong. If it gives you a list of repos you have access to but SimulationAnnex doesn't have the correct permissions, contact Mark, Larry, or Harald.
To easily access data by SXSId do:
make links
This will create two directories, PrivateLinks and PublicLinks, with symbolic links to different SXSIds.
The procedure for CCEAnnex is identical to that of setting up a local copy of SimulationAnnex, except that you replace SimulationAnnex with CCEAnnex everywhere.
git-annex works differently than plain git: the large files are copied only if you request them individually using git annex get (this is a good thing! You don't want git clone trying to copy 50TB to your laptop!).
To copy the data onto your local machine, from within the local git repository you set up above, do the following:
git pull --rebase
git annex merge # updates the git-annex branch, so your repo knows where all the data is
git annex get <path> # this command is recursive if <path> is a directory, so only run on files/dirs you need or you will fill your disk.
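The three steps above can be wrapped in a small shell function. This is a sketch only: annex_fetch is a hypothetical name, not something provided by the repository or by BFI. It refuses to run without an explicit path, because git annex get without a path can end up working on the whole current directory.

```shell
# Hypothetical wrapper around the commands above; run it from inside
# your local clone of the annex.
annex_fetch() {
    if [ "$#" -eq 0 ]; then
        # Insist on explicit paths so a stray call cannot fill the disk.
        echo "usage: annex_fetch <path> [<path> ...]" >&2
        return 2
    fi
    git pull --rebase || return 1   # update the working tree
    git annex merge || return 1     # update location tracking
    git annex get "$@"              # fetch only the requested paths
}
```

For example, annex_fetch Private/CSUFBBH_3/1023/Lev3 would fetch just that one resolution level, using the path from the error message quoted earlier on this page.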
When you are done with the local files, do the following to remove the local copy:
git annex merge
git annex drop <path>
Please use BFI to do this.
If you really need to do something manually, and you really know what you are doing, see SimulationAnnex_old_wiki.
To check what data is present locally, use:
git annex info --fast --in here ./
To check a remote, such as meistri, use:
git annex info --fast --in meistri ./
Note: The path can be anything inside the annex.
There are mirrors of both the SimulationAnnex (not yet completed as of Oct. 31, 2022) and CCEAnnex at Cornell. These are behind a VPN and can only be accessed on the Cornell astronomy/VPN network. They are designed to serve more as a secure backup than a second access point to the data. Thus, the instructions here will detail the setup and maintenance of the machines rather than retrieving data from them.
The SimulationAnnex is at sxs-annex.astro.cornell.edu, with location /volume1/sxs-annex/SimAnnex (it contains what is now known as SimAnnexPreJune2023).
The CCEAnnex is at sxs-annex8.astro.cornell.edu, with location /volume1/sxs-annex/CCEAnnex.
You can access their web interface on port 5001, e.g. sxs-annex.astro.cornell.edu:5001. Again, you need to be on the VPN.
When first setting up the Synology box, you must enable SSH access. SSH keys may be used.
Getting git-annex must be done from the command line.
Note that you need a git-annex compatible with the copy at Caltech (see near the top of this wiki).
- cd ~
- Follow the installation instructions above to get a version of git-annex that works with the Caltech annexes.
- Edit ~/.bashrc to have:
export GIT_ANNEX_LD_LIBRARY_PATH=$HOME/git-annex.linux/lib/x86_64-linux-gnu/
export GIT_ANNEX_DIR=$HOME/git-annex.linux
export PATH="$PATH:$HOME/git-annex.linux/"
alias git="LC_ALL=C git"
- Log out and log back in.
- Set up SSH keys, e.g.
ssh-keygen -t rsa -b 4096 -C "nd357@cornell.edu"
There is no need for a password-protected SSH key here; a passphrase can actually cause problems, since a cron job will be pulling the latest annex periodically.
- Make sure the SSH key has access to the SimulationAnnex and CCEAnnex.
- In the web interface of the Synology boxes, create a shared drive at /volume1 named sxs-annex.
- In an SSH session, cd /volume1/sxs-annex.
- Clone the annex you want, and follow the usual annex setup instructions.
The Annex at Cornell will automatically initiate a copy of any new data once a week, Sunday at 00:00 Eastern. This is set up in the Synology interface: Control Panel -> Task Scheduler, under the task named Pull CCEAnnex (there is a Pull SimulationAnnex task for the sim annex). The body of that task is:
. $HOME/.bashrc
cd /volume1/sxs-annex/CCEAnnex
git pull --rebase
git annex merge
git annex get ./
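A plain sequence of shell commands keeps going after a failed step and reports only the status of the last command. If you want the task to stop at the first failure, one option is a fail-fast variant such as the sketch below. This is an illustration, not the task body actually installed on the Synology boxes; pull_annex is a hypothetical name, and the task would still need to source $HOME/.bashrc first, as above.

```shell
# Hypothetical fail-fast variant of the task body.
# The body runs in a subshell, so the cd does not affect the caller.
pull_annex() (
    cd "$1" || exit 1             # annex directory, e.g. /volume1/sxs-annex/CCEAnnex
    git pull --rebase || exit 1   # stop here if the pull fails
    git annex merge || exit 1     # stop here if the merge fails
    git annex get ./              # exit status of the function is that of this step
)

# Possible usage inside the task:
#   . $HOME/.bashrc
#   pull_annex /volume1/sxs-annex/CCEAnnex
```

With this form, a failure in any step gives the function a non-zero exit status instead of letting the later steps run.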
Under the General settings for the task, the User is sxsadmin2, since this is the owner of the annex. The Schedule settings are Run on the following days: Sunday, with First run time: 00:00, Frequency: Every day, and Last run time: 00:00. If the task does not complete successfully, it will email Nils Deppe, Larry Kidder, Mark Scheel, and Saul Teukolsky.
You can view the logs by logging into the Synology/annex, then in Control Panel -> Task Scheduler select the Pull CCEAnnex task, click the Action drop-down, and select View Result.
Note: In Control Panel -> Task Scheduler -> Settings, the logs are being written to /volume1/sxs-annex/CronJobLogs, in case you need to find them manually.