Cloud-SPAN

Cloud-based High Performance Computing for SPecialised ANalyses on environmental 'omics

Welcome to Cloud-SPAN on GitHub! 👋

Cloud-SPAN develops training in 'omics with Cloud-based High Performance Computing. The training is aimed at both bioscience researchers and the Research Computing teams that support them.

The project has been running since September 2021 and is a collaboration between the Department of Biology at the University of York, UK, and the Software Sustainability Institute. It is funded by the UKRI Innovation Scholars award (MR/V038680/1) and the Natural Environment Research Council (NE/X006999/1 and NE/Y003527/1).

Our modules

Module Description
Prenomics Prenomics is a 4-6 hour module that teaches the basics of command-line programming, including: (1) file directory structure, (2) use of command-line utilities to connect to and use cloud computing and storage resources and (3) basic shell commands for file navigation and basic script writing. It is designed to prepare people for Genomics but, depending on previous experience, you may not need it. There is a short (~5 minute) Self-assessment Quiz to help you decide if you would benefit from attending Prenomics before Genomics.
Genomics Genomics is an 8-12 hour module that teaches data management and analysis for genomics research including: (1) best practices for organization of bioinformatics projects and data, (2) use of command-line utilities to connect to and use cloud computing and storage resources, (3) use of command-line tools for data preparation and (4) use of command-line tools to analyze sequence quality and to perform and automate variant calling.
Create Your Own AWS Instance A 2-hour self-study module to teach you how to create and manage your own Cloud-SPAN Amazon Web Services (AWS) instance just like the one used in the Cloud-SPAN courses Prenomics and Genomics. If you attend tutor-led editions of Cloud-SPAN’s Prenomics and Genomics courses you do not need to create your own instance. We will do that for you! But if you would like to practice afterwards, or study the courses in your own time, you will need to create an instance first.
Statistically useful experimental design A 2-3 hour workshop about designing ‘omics experiments, including platform choice, replication and controls, sequence coverage and depth, and multiple testing corrections. The module does not require any software or coding. Some principles of design will be presented, followed by discussion of their application using case studies including the participants' own designs. We assume no experience with designing ‘omics experiments but some previous experience of experimental design and statistical analysis, such as would be covered in an undergraduate bioscience degree, would be useful.
Metagenomics This course teaches data analysis for metagenomics projects. It covers how to (1) generate and QC a metagenome assembly, (2) ‘bin’ the assembly into metagenome assembled genomes (MAGs), also known as bins, (3) identify the taxonomy of these MAGs and (4) calculate diversity metrics and add functional annotation to identify the products of genes identified in the assembled MAGs. There are 4-8 hours of teaching material but the course is delivered over 3 or 4 weeks since many of the analyses take several hours to run. Each week there is a taught session covering the background to the week's material before you work through the lesson at your own pace, followed by a drop-in session to help with any problems.
Metagenomics for Environmental Scientists This course is aimed at environmental scientists with little or no experience of using high performance computing (HPC) for data analysis. It is taught over two weeks online using a mixture of live coding, online lectures, offline time for long analyses to complete and drop-in sessions. It covers (1) using the command line to log into cloud resources, navigate filesystems and carry out filesystem housekeeping, (2) what metagenomics is, the difference between genomics and metagenomics and the different types of sequencing platforms, and (3) metagenomics analysis including quality control, assembly, polishing, binning and taxonomic assignment.
Automated Management of AWS Instances This course teaches how to automatically manage multiple Amazon Web Services (AWS) instances such as might be used for delivering training courses. It uses Bash Shell scripts to create, configure, stop, start and delete one or more instances with a single invocation of a script.
Core R This online two-hour workshop is an introduction to R for complete beginners. It teaches you how to find your way round RStudio, use the basic data types and structures in R and how to organise your work with scripts and projects. It also teaches you how to import data, summarise it and create and format a graph. The workshop assumes no prior experience of coding.

Our aim is to make our materials FAIR: Findable, Accessible, Interoperable and Reusable.

[Figure: FAIR for training materials] Illustration from Luc Wiegers and Celia van Gelder: https://doi.org/10.5281/zenodo.3593257. https://doi.org/10.1371/journal.pcbi.1007854.g001

Our Handbook

The Cloud-SPAN team are dedicated to providing a welcoming and supportive environment for all people, regardless of background or identity. We aim to develop a community of practice around our materials. We have a Handbook that gives:

⭐ An introduction to the project
🤝 Our Code of Conduct
🎓 More information on our Courses
👪 An open invitation to the Cloud-SPAN Community
📌 Information about the FAIR Principles
📜 Cloud-SPAN Online Forum

Pinned

  1. CloudSPAN-handbook Public

    CloudSPAN Handbook

    HTML 3 1

  2. prenomics00-intro Public

    Prenomics course overview and introduction

    Python

  3. 00genomics Public

    Genomics Course Introduction

    Python

  4. create-aws-instance-0-overview Public

    Overview of course "Create your AWS instance".

    Python

Repositories

Showing 10 of 62 repositories
  • nerc-metagenomics-v2q Public

    NERC Metagenomics (Quarto-based Version 2 + Bash automation)

    CSS 0 0 0 0 Updated Dec 2, 2024
  • metagenomics00-overview Public

    Metagenomics course overview

    Python 0 0 2 0 Updated Dec 2, 2024
  • aws-instances Public

    Scripts for managing multiple AWS instances and some data examples: gc_run01_data and gc_run02_data

    Shell 0 MIT 0 0 0 Updated Nov 11, 2024
  • cloud-admin-guide-v2q
    CSS 0 0 0 0 Updated Nov 11, 2024
  • cloud-span-quarto-site
    CSS 0 0 0 0 Updated Sep 25, 2024
  • metatranscriptomics
    CSS 1 0 0 0 Updated Sep 19, 2024
  • CloudSPAN-handbook Public

    CloudSPAN Handbook

    HTML 3 1 0 0 Updated Aug 21, 2024
  • cloud-span-graphics
    0 0 0 0 Updated Aug 21, 2024
  • .github Public
    0 0 0 0 Updated Jul 12, 2024
  • core-r Public
    JavaScript 0 0 0 0 Updated Apr 25, 2024