This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Sharing GMS results using an Amazon AWS Instance

Malachi Griffith edited this page Aug 24, 2014 · 19 revisions

Introduction

Suppose you want to share the results of an entire whole genome, exome and/or transcriptome analysis. This might be released publicly along with a manuscript to allow readers to scrutinize the entire analysis process and explore the results in more detail than is typically possible in a printed article. Or you might want to release the results to specific collaborators prior to publication or share the results for some other purpose.

The simplest way to release a complete GMS result is to perform the analysis on a cloud computing resource such as Amazon AWS EC2 and then make that instance available.

The following tutorial describes how to share a complete GMS result including the raw input data, the processing profiles you used, the GMS software code used to run the pipelines, all bioinformatic tools used, the reference genome sequences, gene annotations, builds containing the final results, etc.

This tutorial assumes that you have already completed the installation as described in the Beginner's guide to installation on Amazon AWS, and the analysis you wish to share as described in the Beginner's guide to the demonstration analysis. As you will see below, in addition to sharing the results on an Amazon AWS instance, you can also share the login credentials, which would allow your collaborator to log in and perform these analyses as well.

Allowing access to the GMS Web Viewer

When you install the GMS, a custom Genome Web Viewer is automatically installed and configured to allow you to browse entities of the GMS. These include models, processing profiles, instrument data, subjects, and builds. This service is automatically running on your instance and providing access to it is as simple as configuring the security of your instance and providing a URL.

In order for outside users to log into the instance you will have to explicitly allow this in the 'Security Group' settings for the instance you wish to share. To do this, follow these steps.

First log into the AWS EC2 console and view your running instances: <screen shot 1>

Next, find the instance you wish to share and determine which Security Group is applied to it. You will need to edit this security group to make sure incoming web access is permitted. You can get to these settings either by clicking the name of the security group in the description section of the instance you wish to share, or by noting the name and following the 'Security Groups' link under 'Network and Security' in the list of options in the left sidebar of the console. If you do the latter you will see something like this: <screen shot 2>

In this example I have already created a security group called 'SGMS_HTTP-and-SSH'. Make sure your security group is selected by clicking on that row in the console. Next, to allow incoming web access you will need to select the 'Inbound' tab, then 'Edit', then 'Add Rule' and select 'HTTP' from the drop down menu. <screen shot 3>
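The same inbound rule can also be added programmatically rather than through the console. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are already configured. The helper names `http_ingress_rule` and `allow_http` are hypothetical, and the open address range `0.0.0.0/0` is an assumption; narrow it if you only want specific collaborators to reach the viewer.

```python
def http_ingress_rule(cidr="0.0.0.0/0"):
    """Build one inbound rule allowing HTTP (TCP port 80) from `cidr`.

    The default range (anywhere) is an assumption, not a recommendation.
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": cidr}],
    }


def allow_http(group_id, cidr="0.0.0.0/0"):
    """Add the HTTP rule to an existing security group.

    `group_id` is the ID of your own security group (hypothetical here).
    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[http_ingress_rule(cidr)],
    )
```

Keeping the rule builder separate from the AWS call makes it possible to inspect the rule before anything is changed on the account.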

Note that changes to the rules of a security group normally take effect immediately, so a reboot should not be required. If the web viewer is still unreachable after you add the rule, you can reboot the instance from within a terminal session by typing sudo reboot or from the EC2 console by going back to the 'Instances' page, right clicking the row for your instance and selecting 'Reboot' under 'Actions'. <screen shot 4>

Once the instance is back up and running, find the 'Public DNS' or 'Public IP' for your instance. These can be found in the description section for the instance after selecting 'Instances' and clicking on the row for the instance you are sharing. <screen shot 5>
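If you prefer to look up these values from a script, the sketch below shows one possible way, assuming boto3 is installed and credentials are configured. The helper names `public_addresses` and `lookup` are hypothetical; the dictionary keys are those returned by the EC2 `describe_instances` call.

```python
def public_addresses(response):
    """Map instance ID -> (public DNS, public IP) from a describe_instances response.

    Either value is None when the instance has no public address.
    """
    found = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            found[instance["InstanceId"]] = (
                instance.get("PublicDnsName") or None,
                instance.get("PublicIpAddress"),
            )
    return found


def lookup(instance_id):
    """Query EC2 for one instance (requires boto3 and valid AWS credentials)."""
    import boto3  # imported here so the parser above works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.describe_instances(InstanceIds=[instance_id])
    return public_addresses(response)[instance_id]
```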

Enter either the 'Public DNS' or 'Public IP' in a browser window. You will land on the home page for the GMS Web Viewer running on your GMS instance. You can share this link with your collaborators and they should be able to see everything you see. For example: http://ec2-54-201-216-53.us-west-2.compute.amazonaws.com/ <screen shot 6>

In the top right hand corner of this view is the unique GMS system ID created when you installed the GMS. To view a particular processing profile, select the processing profiles tab and then click on the profile of interest. For example, here is the default exome somatic-variation processing profile. <screen shot 7>

To view results, in this case for the example HCC1395 analysis, follow these steps. Go back to the GMS home page by clicking on your GMS ID or 'Genome Modeling System' at the top of the page. Then select the 'Builds' tab and enter 'hcc1395' in the 'Filter results' box. You may also want to increase the number of records shown per page. You can click on any column header to sort the table by those values. For example, in the following screenshot I have sorted by 'Model'. <screen shot 8>

To see some results, try clicking on the build ID for the model 'hcc1395-clinseq'. This page will show you a detailed summary of this build, its inputs, workflow stages, etc. To browse results follow the 'data' link and enter the 'TST1' directory.
<screen shot 9>

All of the clin-seq (aka 'med-seq') results are available here. Note that in this example we used the downsampled data instead of the complete data set, so the results will be conceptually identical to those described in the GMS manuscript but will not match exactly. We could easily have shared results from the complete analysis, but it would cost more to maintain the Amazon instance persistently.

Individual results can now be browsed and shared by URL. For example, the circos plot for HCC1395 is here: http://ec2-54-201-216-53.us-west-2.compute.amazonaws.com/opt/gms/5GC0L02/fs/5GC0L02/info/model_data/18177dd5eca44514a47f367d9804e17a/build23c23b49c80048f4a386bb0285279995/TST1/circos/circos.png

<screen shot 10>

A complete report of annotated SNVs and Indels with supporting read counts can be found here: http://ec2-54-201-216-53.us-west-2.compute.amazonaws.com/opt/gms/5GC0L02/fs/5GC0L02/info/model_data/18177dd5eca44514a47f367d9804e17a/build23c23b49c80048f4a386bb0285279995/TST1/snv_indel_report/TST1_final_filtered_coding_clean.xls
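In the two example links above, the part of the URL after the host name looks like the absolute path of the file on the instance (it begins with /opt/gms/). Assuming that pattern holds for other result files, which is an inference from these examples rather than documented behaviour, a shareable link could be built from a file path with a small hypothetical helper such as `result_url`:

```python
from urllib.parse import quote


def result_url(public_dns, file_path):
    """Build a web viewer link for a result file.

    Assumes the URL path mirrors the absolute file path on the instance,
    as the example links suggest. `public_dns` is the instance's
    'Public DNS' or 'Public IP'.
    """
    if not file_path.startswith("/"):
        raise ValueError("file_path must be an absolute path")
    # quote() percent-encodes spaces and other unsafe characters, keeping '/'
    return "http://" + public_dns + quote(file_path)
```

For instance, passing the 'Public DNS' of your instance together with the full path to a file under the 'TST1' directory would give a link of the same form as the circos plot link above.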

A detailed description of result files in this build and the other build types can be found here: Location and description of results files in GMS pipelines

Allowing SSH access to the GMS instance so that users can log in with a terminal session

Setting up sharing during the creation of a new GMS instance

Note that all of the above security settings can be configured when you first create the EC2 instance. You can create a 'Key Pair' with sharing in mind and name it in a descriptive way for the project. You can also create a custom 'Security Group' and configure it for the sharing strategy you would like.

Notes on security and privacy
