diff --git a/franklin/software/modules/index.html b/franklin/software/modules/index.html index 47eca056..96587c9f 100644 --- a/franklin/software/modules/index.html +++ b/franklin/software/modules/index.html @@ -3825,7 +3825,7 @@
Fast and accurate defocus estimation from electron micrographs.
Versions: 4.1.14
-Arches: amd, intel
+Arches: intel, amd
Modules: ctffind/4.1.14+intel
, ctffind/4.1.14+amd
CUDA is a parallel computing platform and programming model invented by @@ -3897,7 +3897,7 @@
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Ada, and Go, as well as libraries for these languages.
-Versions: 4.9.4, 7.5.0, 5.5.0
+Versions: 7.5.0, 4.9.4, 5.5.0
Arches: generic
Modules: gcc/5.5.0
, gcc/7.5.0
, gcc/4.9.4
The free and open-source Java implementation
-Versions: 11.0.17_8, 16.0.2
+Versions: 16.0.2, 11.0.17_8
Arches: generic
Modules: openjdk/11.0.17_8
, openjdk/16.0.2
A modified version of Relion supporting block-based-reconstruction as diff --git a/pr-preview/pr-36/404.html b/pr-preview/pr-36/404.html deleted file mode 100644 index baacf0cd..00000000 --- a/pr-preview/pr-36/404.html +++ /dev/null @@ -1,1536 +0,0 @@
Warning
This documentation is for internal use. It may be of interest to users who are curious about our internal processes and architecture, but should not be mistaken for describing services that we offer or stable infrastructure that end users should rely upon. If you find yourself submitting a ticket about something on this page, you are probably making a mistake.
HPCCF uses cobbler for provisioning and managing internal DNS.
There is a cobbler server per cluster, as well as one for the public HPC VLAN:
- cobbler.hpc - public HPC VLAN
- cobbler.hive - hive private and management VLANs
- cobbler.farm - farm
- cobbler.peloton - peloton
- cobbler.franklin - franklin
hpc1, hpc2, and lssc0 do not have associated cobbler servers.
i.e. puppet
The DDN provides backend storage for Proxmox.
The primary means of administration is the web interface; you will need to be on the HPC VLAN to reach it.
DNS is split between internal (what machines on one of the HPCCF VLANs see) and external (what the rest of campus and the world sees).
HPCCF uses InfoBlox for public-facing DNS.
Internal DNS is managed by cobbler.
This section is for HPCCF admins to document our internal infrastructure, processes, and architectures. Although the information may be of interest to end users, it is not designed or maintained for their consumption; nothing written here should be confused with an offering of service. For example, although we describe our virtual machine infrastructure, which we use for hosting a variety of production-essential services for our clusters, we do not offer VM hosting to end users.
HPCCF's Netbox site is our source of truth for rack layouts, network addressing, and other infrastructure. NetBox is an infrastructure resource modeling (IRM) application designed to empower network automation; it was developed specifically to address the needs of network and infrastructure engineers.
This section gives an overview of how HPCCF admins use and administer Netbox.
Navigate to HPCCF's Netbox instance here: HPCCF's Netbox Site
Select the site to which you will be adding an asset. In this example, Campus DC is chosen:
Scroll to the bottom of that page and select the location where you will add your asset; here, the Storage Cabinet is chosen:
On this page, scroll to the bottom and select Add a Device:
After you have selected Add a Device, you should see a page like this:
Fill out this page with the specifics of the asset. Some fields are not required, but try to fill out this section as completely as the available fields allow. Here is an example of a created asset and how it should look:
Be sure to click Save so the device is added.
On the asset page, select the + Add Components dropdown and choose the component you wish to add; in this example, a Console Port:
Again, fill out the dropdowns as thoroughly as possible. The example here is of an interface that has already been added:
Make sure to click Save so the component is added.
This process can be used to add any of the following components to a device:
After a component has been created (an interface, power port, or any other type), you will usually want to connect it to something. The process is similar for any component within Netbox. This example shows how to connect an InfiniBand port on a device to a port on an InfiniBand switch. First, navigate to the device you wish to work with and select the appropriate tab, in this case Interfaces; you will see a page like this:
Here we will connect ib1 to an InfiniBand switch. Click the green dropdown to the right of ib1; since we are connecting to another interface on the InfiniBand switch, choose Interface, as shown here:
Once selected, you will come to a screen that looks like this:
Fill in the information required to complete the connection (plus any additional information you can provide), then make sure to create the connection at the bottom. Your screen should look something like this:
This is meant to be a general configuration guide to Open OnDemand (OOD) for admins, and also to serve as an admin troubleshooting tutorial for OOD. The bulk of the relevant OOD configs are located in /etc/ood/config/, but the contents are controlled by puppet. OOD is served by (or behind) apache; those configs are located in /etc/apache2/, and the default served dir is /var/www/ood, but these are also heavily controlled by puppet. The rest of this documentation is organized by file name, and where possible also refers to the puppet-openondemand class for that file.
Apps in OnDemand are located in /var/www/ood/apps/sys/<app name>. The OOD dashboard itself is considered an app and is located here. The "dev"-made apps are cloned here by puppet (openondemand::install_apps:) from HPCCF's GitHub (e.g. https://github.com/ucdavis/hpccf-ood-jupyter). OOD apps are, put simply, sbatch scripts generated from ERB templates. Inside an app's directory, what is of most interest to admins is the form.yml, submit.yml, and the template/ directory; the majority of troubleshooting happens there. Note that any file in this directory can end in .erb if you want its contents dynamically generated. To learn more about apps, see the docs here: https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive.html
This file represents the form users fill out, with the fields for selecting clusters, partitions, CPU, memory, etc. If you want to add another field, you can do it here; likewise, if you suspect a bug in the web form, this is the place to start. More about form.yml can be found here: https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/form.html
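As a rough orientation, a minimal form.yml might look like the sketch below. The attribute names and default values are assumptions for illustration, not a copy of one of our production apps.

```yaml
# Hypothetical minimal form.yml sketch -- attribute names/values are examples only
cluster: "farm"
form:
  - bc_queue
  - bc_num_hours
  - cores
attributes:
  bc_queue:
    label: "Partition"
    value: "high"
  cores:
    widget: "number_field"
    label: "CPU cores"
    value: 1
```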
This file contains the contents of the sbatch job as well as the job submission parameters that are submitted to slurm (or whatever scheduler you are using). You can also configure the shell environment in which the app runs here. If you suspect a bug is a slurm, slurm-submission, or user-environment issue, start here. More about submit.yml can be found here: https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/submit.html
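For orientation only, a bare-bones submit.yml.erb could look like this sketch; the native slurm arguments are assumptions, not the values our apps actually pass.

```yaml
# Hypothetical submit.yml.erb sketch -- slurm arguments are illustrative
batch_connect:
  template: "basic"
script:
  native:
    - "--partition=<%= bc_queue %>"
    - "--cpus-per-task=<%= cores %>"
    - "--time=<%= bc_num_hours %>:00:00"
```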
This directory is the template for the sbatch job that the interactive app runs in. Any code, assets, etc. needed by the app itself should be included here. When a user launches an OOD app, this directory is processed by the ERB templating system and then copied to ~/ondemand/data/sys/dashboard/batch_connect/sys/.... In this directory you may see three files of interest to admins: before.sh, script.sh, and after.sh. As their names suggest, one script runs before the job, one runs after, and one is the job itself. OOD starts by running the main script influenced by submit.yml and forks thrice to run before.sh, script.sh, and after.sh. More about template/ can be found here: https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/template.html
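To make the before/script/after split concrete, here is a purely hypothetical before.sh (not taken from a real app); script.sh would then launch the app's own process.

```bash
# before.sh -- hypothetical example: runs inside the job before script.sh does
# (a real app might pick a port or stage config files here)
export APP_WORKDIR="$(mktemp -d)"
echo "Staging app files in ${APP_WORKDIR}"
```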
This is just the HTML view of the app form; you are unlikely to need to edit it.
This is where you set the app's name that shows on the dashboard, and the app's category. More about manifest.yml can be found here: https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/manifest.html
If you want to edit, add, or create OOD apps, you must be enabled as a dev app developer. In puppet this is done by placing your username under openondemand::dev_app_users:, and puppet will then do the following:
mkdir -p /var/www/ood/apps/dev/<username>
sudo ln -s ~/ondemand/dev /var/www/ood/apps/dev/<username>/gateway
You can then git clone apps into your OOD app developer environment located in ~/ondemand/dev/. Your dev apps will show in a separate sidebar from the production OOD apps and won't be visible to anyone else unless shared.
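For example, once puppet has created your dev area, you could clone one of the existing HPCCF apps into it (the target directory name here is arbitrary):

```bash
cd ~/ondemand/dev/
git clone https://github.com/ucdavis/hpccf-ood-jupyter jupyter
```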
/etc/ood/config/clusters.d/
is the config dir where OOD is coupled with a cluster and where global scheduler options are specified. For OOD apps to work and be submitted to a cluster, this yaml needs to be present and must be named after the cluster's hostname, i.e. /etc/ood/config/clusters.d/farm.yml. This area is controlled by puppet under openondemand::clusters:. The most relevant section of this file for people not named Teddy is batch_connect:, and more specifically the script_wrapper:, which is where you can put shell commands that will always run when an OOD app is run.
batch_connect:
  basic:
    script_wrapper: |
      source /etc/profile
      module purge
      %s
    set_host: host=$(facter fqdn)
  vnc:
    script_wrapper: |
      source /etc/profile
      module purge
      module load conda/websockify turbovnc
      export WEBSOCKIFY_CMD="websockify"
      turbovnc-ood
      %s
    set_host: host=$(facter fqdn)
Under batch_connect: are the script wrappers, listed by parent app category. Apps like JupyterLab and RStudio are in the basic category, and VNC has its own category. Anything set in the script_wrapper: under an app category is always run when an app of that category runs. So if you add a module load openmpi to the script wrapper under basic:, that will be run, and openmpi will be loaded, whenever RStudio or JupyterLab is started. The %s is a placeholder for all the scripts from the aforementioned template/ dir; place your commands relative to it to control whether they run before or after your OOD app is started. The facter fqdn within the set_host: key should resolve to the FQDN of the compute/GPU node the job is running on.
More about clusters.d/ can be found here: https://osc.github.io/ood-documentation/latest/installation/add-cluster-config.html
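For reference, a minimal cluster definition follows the upstream v2 layout. The values below are assumptions for illustration, not a copy of our production farm.yml:

```yaml
# Hypothetical /etc/ood/config/clusters.d/farm.yml sketch
v2:
  metadata:
    title: "Farm"
  login:
    host: "farm.hpc.ucdavis.edu"   # assumed login host
  job:
    adapter: "slurm"
    cluster: "farm"
  batch_connect:
    basic:
      script_wrapper: |
        source /etc/profile
        module purge
        %s
```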
/etc/ood/config/ood_portal.yml
is the topmost config for OOD. Here be dragons; don't edit this file unless you know what you are doing! Here you can set the server name and port number that OOD listens on, as well as OOD-related apache configs, certs, proxies, CAS confs, the root URI, node URI, logout URI, etc.
Once a user authenticates with OOD, apache starts the PUN (per-user NGINX) as that user. /etc/ood/config/nginx_stage.yml determines all the properties of the PUN, including global settings for every user's shell environment. If you suspect a bug is a user shell-environment problem, first check the local app env configs set in submit.yml in the app's directory. More about nginx_stage.yml can be found here: https://osc.github.io/ood-documentation/latest/reference/files/nginx-stage-yml.html
You can make an announcement that is displayed in a banner on OOD by creating a yml or md file in /etc/ood/config/announcements.d/. Whenever a user navigates to OOD's dashboard, OOD checks this directory for files.
Here's an example announcement yaml:
type: warning
msg: |
  On Monday, September 24 from 8:00am to 12:00pm there will be a **Maintenance downtime**, which will prevent SSH login to compute nodes and running OnDemand
You can also create a test-announcement.yml.erb to take advantage of ERB ruby templating. More about OOD announcements can be found here: https://osc.github.io/ood-documentation/latest/customizations.html#announcements
You can have the OOD dashboard display the system MOTD by setting these environment variables in /etc/ood/config/apps/dashboard/env:
MOTD_PATH="/etc/motd"    # this supports both file and RSS feed URIs
MOTD_FORMAT="txt"        # markdown, txt, rss, markdown_erb, txt_erb
/etc/ood/config/ondemand.d/
is home to nearly all other OOD configs not mentioned here (i.e. ticket submission, nav customizations, branding, etc.). The contents are controlled by puppet, under openondemand::confs:, and the puppet formatting to properly place yamls here is as follows:
openondemand::confs:
  <name of yaml (i.e. tickets, if you want to create a tickets.yml)>:
    data:                # denotes the content to put in the yaml
      <yaml key>: <yaml value>
support_ticket:
  data:
    support_ticket:
      email:
        from: "noreply@%{trusted.domain}"
        to: hpc-help@ucdavis.edu
More about ondemand.d, openondemand::confs, and their function and format can be found here: https://osc.github.io/ood-documentation/latest/reference/files/ondemand-d-ymls.html and here: https://forge.puppet.com/modules/osc/openondemand/
sequenceDiagram
    user->>apache: `/var/log/apache2/error.log`
    apache->>CAS: `/var/cache/apache2/mod_auth_cas/`
    CAS->>apache: return
    apache->>pun: `/var/log/apache2/$fqdn_error.log`
    pun->>dashboard: `/var/log/ondemand-nginx/$user/error.log`
    dashboard->>oodapp: `$home/ondemand/data/sys/dashboard/batch_connect/sys/$app/output/$session_id/output.log`
    oodapp->>user: render
To start, all users who navigate to the OnDemand website first encounter the apache server. Any errors encountered at this step will be in the log(s) at /var/log/apache2/error.log. This is also where apache redirects users who arrive at http://ondemand...:80 to https://ondemand...:443. You can narrow down whether this is the problem by checking if the https url works: if the http site works but the https site doesn't, it's an ssl/proxy error; check the apache configs.
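A quick way to sanity-check the redirect and the TLS endpoint from a shell (the hostname is the farm instance listed further down; adjust as needed):

```bash
# Check the http -> https redirect, then the https endpoint itself
curl -sI  http://ondemand.farm.hpc.ucdavis.edu  | head -n 3
curl -skI https://ondemand.farm.hpc.ucdavis.edu | head -n 3
```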
Apache then redirects the user to CAS for authentication. You can grep -r $USER /var/cache/apache2/mod_auth_cas/ to check whether a user has been authed to CAS and a cookie has been set.
CAS brings us back to apache, and here apache runs all sorts of OOD Lua hooks. Any errors encountered at this step will be in the log(s) at /var/log/apache2/$fqdn_error.log
Apache then starts an NGINX server as the user, and most things (the main dashboard, submitting jobs, running apps, etc.) happen here in the PUN. Any errors encountered at this step will be in the logs at /var/log/ondemand-nginx/$user/error.log. You can also see what's happening by running commands like ps aux | grep $USER to see the user's PUN, or ps aux | grep -i nginx to see all the PUNs. From the OnDemand web UI there's an option to "Restart Web Server" which essentially kills and restarts the user's PUN.
The dashboard is mostly covered in section 4 as part of the PUN, but note that apache redirects here after the PUN has been started, and this is where users do everything else. At this step OOD will warn about things like "Home Directory Not Found" and such. If you get this far, troubleshoot issues with the user's home dir, the NASii, and free space: df | grep $HOME, du -sh $HOME, journalctl -u autofs, mount | grep $HOME, automount -m, and all the nfs homedir stuff. You should also check zfs, connection speed, and logs on the NAS where the user's homedir is located, with things like zpool status as root.
When a user starts an app like JupyterLab or a VNC desktop, the job is submitted by the user's PUN, and OOD copies and renders (with ERB) the global app template from /var/www/ood/apps/sys/<app_name>/template/* to $HOME/ondemand/data/sys/dashboard/batch_connect/sys/<app_name>/(output)/<session_id>. Any errors encountered at this step will be in $HOME/ondemand/data/sys/dashboard/batch_connect/sys/<app_name>/(output)/<session_id>/*.log, i.e. output.log or vnc.log.
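When triaging a report, one way to pull up the newest session log (paths as above, run as the affected user) might be:

```bash
# Tail the most recent batch_connect session output log
ls -t $HOME/ondemand/data/sys/dashboard/batch_connect/sys/*/output/*/output.log 2>/dev/null \
  | head -n 1 | xargs -r tail -n 50
```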
Maybe the OnDemand server is just in some invalid state and needs to be reset. Check the puppet conf at /etc/puppetlabs/puppet/puppet.conf, run puppet agent -t, and maybe restart the machine. Running puppet will force-restart the apache server and regenerate OOD from the OOD config yamls. Then you can restart the server either by ssh-ing to the server and running reboot, or by ssh-ing to proxmox and running qm reset <vmid> as root. TIP: you can find the vmid by finding the server in qm list.
ondemand.farm.hpc.ucdavis.edu
dood.vm.farm.hpc.ucdavis.edu
ondemand.franklin.hpc.ucdavis.edu
ondemand.hive.hpc.ucdavis.edu
(cobbler, etc)
All changes are made on software.hpc.ucdavis.edu as software-user. Log in to the server using SSH; you must make your changes as software-user, not your own user.
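Assuming standard SSH access, with the host and account named above, the login looks like:

```bash
ssh software-user@software.hpc.ucdavis.edu
```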
Initiate a transaction to enable read-write access; the transaction mounts /cvmfs/hpc.ucdavis.edu as read-write (prior to this, it is read-only).
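Assuming the stock CVMFS server tooling drives this step (any site-specific wrapper may differ), the transaction would be opened with something like:

```bash
cvmfs_server transaction hpc.ucdavis.edu
```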
Load the Spack module to set up the environment. This moves you to the /cvmfs/hpc.ucdavis.edu/sw/spack directory, loads the Spack software, and activates the main environment.
The YAML files for the main environment live under environments/main:
Do not edit spack.yaml directly. Instead:
- Add new libraries to libs.yaml.
- Add new software to general.yaml.
The format is software @version +variant. For example:
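A hypothetical entry, using the gromacs package discussed below (the surrounding YAML structure of general.yaml is not reproduced here):

```yaml
# hypothetical spec added to general.yaml
- gromacs@2024.1 +cuda
```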
Please include the version of the software. Available versions can be found at packages.spack.io. You can also use spack info:
$ spack info gromacs
CMakePackage:   gromacs

Description:
    GROMACS is a molecular dynamics package primarily designed for
    simulations of proteins, lipids and nucleic acids. It was originally
    developed in the Biophysical Chemistry department of University of
    Groningen, and is now maintained by contributors in universities and
    research centers across the world. GROMACS is one of the fastest and
    most popular software packages available and can run on CPUs as well as
    GPUs. It is free, open source released under the GNU Lesser General
    Public License. Before the version 4.6, GROMACS was released under the
    GNU General Public License.

Homepage: https://www.gromacs.org

Preferred version:
    2024.1    https://ftp.gromacs.org/gromacs/gromacs-2024.1.tar.gz

Safe versions:
    main      [git] https://gitlab.com/gromacs/gromacs.git on branch main
    2024.1    https://ftp.gromacs.org/gromacs/gromacs-2024.1.tar.gz
    2024      https://ftp.gromacs.org/gromacs/gromacs-2024.tar.gz
    2023.4    https://ftp.gromacs.org/gromacs/gromacs-2023.4.tar.gz
    2023.3    https://ftp.gromacs.org/gromacs/gromacs-2023.3.tar.gz
...
Variants:
    build_system [cmake]        cmake
        Build systems supported by the package
    build_type [Release]        Debug, MinSizeRel, Profile, Reference, RelWithAssert, RelWithDebInfo, Release
        The build type to build
    cp2k [false]                false, true
        CP2K QM/MM interface integration
    cuda [false]                false, true
        Build with CUDA
    cycle_subcounters [false]   false, true
...
This lists the available versions and variants for the package. It's important to take a look at these variants, as there may be non-default variants that are needed depending on the user request.
After making changes, reconcretize the environment:
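Assuming the standard Spack CLI (this is the stock command, not a site-specific wrapper):

```bash
spack concretize
```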
This command will concretize only the new specifications you've added. If the concretization fails, you should probably escalate.
If concretization is successful, proceed with the installation:
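For instance, if the gromacs spec above were the new addition (the package name is illustrative):

```bash
spack install gromacs
```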
Specify the new software or pieces of software you have concretized, to avoid waiting on other failed builds. You do not need to specify their versions, as Spack will install the newly concretized versions automatically. If the build fails, you may need to fork the package and make fixes, which is out of scope for this guide. If it succeeds, proceed.
Ensure that the modulefiles are generated. This generally happens on its own, but on occasion needs to be kicked off manually:
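A hedged example of the manual refresh, assuming Lmod modulefiles (use tcl in place of lmod if that is what the site generates):

```bash
spack module lmod refresh -y
```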
You should do a cursory check that the software actually works. Usually the best way to do this is to just run name-of-software -h. For example, for the software fastani:
$ module load fastani/1.33
$ fastANI -h
-----------------
fastANI is a fast alignment-free implementation for computing whole-genome Average Nucleotide Identity (ANI) between genomes
-----------------
...
If it runs without errors, proceed to the next step. If not, it's probably time to escalate.
Commit your changes to the spack-ucdavis repo. Move into it, then add your changes to the YAML files:
git add environments/hpccf/software.hpc/main/general.yaml
git add environments/hpccf/software.hpc/main/libs.yaml
Commit your changes with the following format:
Then commit the new concretized lockfile:
Perform necessary maintenance before deployment:
Move out of the cvmfs directory and close out any open files you have there.
Publish your changes to cvmfs:
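Again assuming the stock CVMFS server CLI is what's used for publication:

```bash
cvmfs_server publish hpc.ucdavis.edu
```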
HPCCF uses Proxmox for virtualization. Current servers are proxmox1, proxmox2, and proxmox3.
To log in, point your browser to port 8006 on any of the proxmox servers and choose UCD-CAS as the realm. You'll need to be on the HPC VLAN to access the interface.
Use Netbox to locate a free IP address, or allocate one in the appropriate cobbler server. See provisioning for more information on selecting an IP/hostname and setting up PXE.
Choose an unused VM ID. Storage areas are pre-created on the DDN, one directory per VM ID. If more need to be created, see the DDN documentation. Populate the "Name" field with your chosen VM name.
If you're installing a machine via PXE from a cobbler server, choose "Do not use any media."
To add a new ISO, copy it to /mnt/pve/DDN-ISOs/template/iso/ on one of the proxmox hosts.
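For example, assuming direct scp access to one of the hosts named above (the ISO filename here is hypothetical):

```bash
scp ubuntu-24.04-live-server-amd64.iso root@proxmox1:/mnt/pve/DDN-ISOs/template/iso/
```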
Check the Qemu Agent box.
Defaults are fine. Adjust disk size as needed.
Use type x86-64-v3. Adjust cores to taste.
Recent Ubuntu installers will fail unless you use at least 4096 MB of memory.
See Netbox for a list of VLANs.
Make sure to select VirtIO (paravirtualized) for the network type.
Do not forget to add the VM to DNS.
If this is a production VM, add the "production" tag.