This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Quick VM Tour

Malachi Griffith edited this page Apr 9, 2014 · 31 revisions

Introduction

The simplest way to get a quick sense of what the GMS is all about is to load a virtual machine on which the GMS has already been installed and configured. When the GMS virtual machine loads, you will be logged in as the user genome (the password is also genome). All installation and configuration steps are complete and demonstration data are in place, so you can immediately test the system by running the genotype-microarray, reference-alignment, somatic-variation, rna-seq, differential-expression, and clin-seq pipelines on this demonstration data.

The virtual machine is a self-contained sandbox. The idea is for you to take a short tour of the GMS: execute some simple GMS commands, view some features in the GMS web-viewer, and so on. When you are done, you can remove the virtual machine and your system will be unaffected by the test.

Please note that the pre-configured GMS is meant for simple demonstration purposes only. If you wish to use the system in earnest for large-scale analysis you will want to identify appropriate hardware and adopt one of the installation methods described in the Install Manual.

Finally, to keep this tutorial simple, many details are left out. These details can be found throughout the GMS manuscript and elsewhere in the GMS wiki, for example in the Installation Guide, the Location and Description of the HCC1395 Data, the FAQ page, the Guide to Importing your own Data, the Reference Manual for useful Genome Commands, the Beginners Guide to Demonstration analysis, and many other pages.


Steps

Step 1. Install VirtualBox

The virtual machine was created with VirtualBox version 4.3.8. VirtualBox is open-source and freely available for Mac, Linux, and Windows. Install VirtualBox version 4.3.8 or later; you can download it for your system here:
https://www.virtualbox.org/wiki/Downloads
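If you prefer a terminal, the version check can be scripted. The following is a minimal POSIX shell sketch for a Mac or Linux host. It assumes that VBoxManage (the command-line tool installed with VirtualBox) is on your PATH and that VBoxManage --version prints a string such as 4.3.8r92456. The helper names version_ge and check_virtualbox are illustrative; they are not part of VirtualBox or the GMS.

```sh
# Sketch: check that the installed VirtualBox is at least 4.3.8,
# the version used to build the GMS virtual machine.
# version_ge and check_virtualbox are illustrative helper names.

# version_ge A B: succeed if dotted numeric version A >= B.
version_ge() {
  local a="$1" b="$2" x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}; x=${x:-0}   # leading field of A (0 once A is exhausted)
    y=${b%%.*}; y=${y:-0}   # leading field of B (0 once B is exhausted)
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
    # Drop the field just compared; the string becomes empty after its last field.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0   # all fields equal
}

# check_virtualbox [REQUIRED_VERSION]
check_virtualbox() {
  local required="${1:-4.3.8}" raw ver
  if ! command -v VBoxManage >/dev/null 2>&1; then
    echo "VBoxManage not found: is VirtualBox installed and on your PATH?" >&2
    return 1
  fi
  raw=$(VBoxManage --version 2>/dev/null | tail -n 1)
  ver=${raw%%[!0-9.]*}      # e.g. "4.3.8r92456" -> "4.3.8"
  if version_ge "$ver" "$required"; then
    echo "VirtualBox $ver is OK (>= $required)"
  else
    echo "VirtualBox $ver is older than $required; please upgrade" >&2
    return 1
  fi
}
```

After sourcing the snippet, run check_virtualbox; it compares the numeric fields one by one, so 4.3.10 correctly counts as newer than 4.3.8.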

Step 2. Download a Pre-configured GMS Virtual Machine Image

The pre-configured virtual machine image contains the GMS installation, a fully functional Ubuntu 12.04 Precise operating system, annotation databases, reference genome sequences, example data and much more. The pre-configured virtual machines are available here:
https://xfer.genome.wustl.edu/gxfer1/project/gms/vms/

The image files are large (~48 GB) and will take some time to download. You should therefore use a download agent that can resume the download if it is interrupted. For example, in a terminal you could use wget with its -c (continue) option, which lets a re-run of the same command resume a partially downloaded file:

wget -c https://xfer.genome.wustl.edu/gxfer1/project/gms/vms/GMS_VM_V1.zip
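wget already retries a dropped connection within a single run, but if the command itself exits before the file is complete, -c is what lets a re-run pick up where the partial file ends. For an unattended download you might wrap that re-run in a loop. This is a minimal sketch assuming a POSIX shell with wget installed; the function name, the 20-attempt limit, and the 30-second pause are arbitrary illustrative choices.

```sh
# Sketch: keep resuming the GMS VM download until wget succeeds
# or the attempt limit is reached. The file is saved in the current directory.
GMS_VM_URL="https://xfer.genome.wustl.edu/gxfer1/project/gms/vms/GMS_VM_V1.zip"

# download_with_resume [URL] [MAX_ATTEMPTS] [DELAY_SECONDS]
download_with_resume() {
  local url="${1:-$GMS_VM_URL}" max_attempts="${2:-20}" delay="${3:-30}" attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -c continues a partial file left behind by an earlier, interrupted run
    if wget -c "$url"; then
      return 0
    fi
    echo "Download attempt $attempt/$max_attempts failed" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Running download_with_resume with no arguments fetches GMS_VM_V1.zip from the URL above.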

Step 3. Unpack the Image

Use your favorite decompression software to unpack the virtual machine. For example, in a Mac or Linux terminal you could use unzip GMS_VM_V1.zip. On Mac or Windows you can probably simply double-click the archive file.
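In a terminal, it is worth testing the archive before extracting this much data. The sketch below (POSIX shell; unpack_gms_vm is an illustrative name) runs unzip -t, which checks the CRC of every member without writing any files, then extracts the archive and prints the path of any .vbox file it finds. It makes no assumption about the folder layout inside the archive.

```sh
# Sketch: verify, extract, and locate the .vbox file in the GMS VM archive.
# unpack_gms_vm [ARCHIVE] [DESTINATION_DIR]
unpack_gms_vm() {
  local archive="${1:-GMS_VM_V1.zip}" dest="${2:-.}"
  if [ ! -f "$archive" ]; then
    echo "Archive not found: $archive" >&2
    return 1
  fi
  # -t tests the CRC of every member without extracting; -q keeps the output short
  unzip -tq "$archive" || return 1
  unzip -q "$archive" -d "$dest" || return 1
  # Print the .vbox file(s) to open in VirtualBox
  find "$dest" -name '*.vbox' -print
}
```

The printed path is the .vbox file to select when importing the image.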

Step 4. Import the Image

Open VirtualBox and add the GMS virtual machine by selecting the GMS .vbox file as follows.

Within VirtualBox, use the Machine -> Add option:

[Screenshot: Adding a virtual machine in VirtualBox]

Find the GMS .vbox file and open it:

[Screenshot: Opening the GMS .vbox file]
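The same registration can be done from a host terminal with VBoxManage registervm, the command-line counterpart of Machine -> Add. A minimal sketch, assuming VBoxManage is on your PATH; register_gms_vm is an illustrative name. The path is converted to an absolute one so there is no ambiguity about where VirtualBox looks for the file.

```sh
# Sketch: register (import) the GMS VM from the command line,
# equivalent to Machine -> Add in the VirtualBox window.
# register_gms_vm PATH_TO_VBOX_FILE
register_gms_vm() {
  local vbox="$1" abs
  if [ ! -f "$vbox" ]; then
    echo "No such file: $vbox" >&2
    return 1
  fi
  # Pass an absolute path so VirtualBox does not have to resolve a relative one
  abs="$(cd "$(dirname "$vbox")" && pwd)/$(basename "$vbox")"
  VBoxManage registervm "$abs" || return 1
  # GMS_VM_V1 should now appear in the list of registered machines
  VBoxManage list vms
}
```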

Step 5. Configure system resources to be used for the virtual machine

Depending on the resources available on your system, you may want to adjust how much of them the virtual machine uses, for example its base memory, video memory, number of CPUs, and network connection type. To adjust these and other settings, select the machine GMS_VM_V1 while it is powered off and press the Settings button at the top left of the VirtualBox interface.

[Screenshot: General settings]

[Screenshot: Number of processors]

[Screenshot: Base memory]

[Screenshot: Video memory]

[Screenshot: Network settings. The adapter is set to NAT by default, but a Bridged Adapter may be faster.]
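These settings can also be changed from a host terminal with VBoxManage modifyvm while the machine is powered off. A sketch, assuming VBoxManage is on your PATH; configure_gms_vm is an illustrative name, and any numbers you pass are your own choices for your hardware, not recommendations from this guide. For bridged networking you must name a host interface; VBoxManage list bridgedifs shows the candidates.

```sh
# Sketch: set memory, CPUs, video memory, and (optionally) bridged networking
# for the GMS VM. The VM must be powered off.
# configure_gms_vm VM MEMORY_MB CPUS VRAM_MB [BRIDGE_INTERFACE]
configure_gms_vm() {
  local vm="$1" mem_mb="$2" cpus="$3" vram_mb="$4" bridge_if="${5:-}"
  if [ -z "$vram_mb" ]; then
    echo "usage: configure_gms_vm VM MEMORY_MB CPUS VRAM_MB [BRIDGE_INTERFACE]" >&2
    return 1
  fi
  VBoxManage modifyvm "$vm" --memory "$mem_mb" --cpus "$cpus" --vram "$vram_mb" || return 1
  if [ -n "$bridge_if" ]; then
    # Switch the first network adapter from NAT to a bridged adapter
    VBoxManage modifyvm "$vm" --nic1 bridged --bridgeadapter1 "$bridge_if" || return 1
  fi
  # Show the resulting values (machine-readable keys: memory, cpus, vram, nic1)
  VBoxManage showvminfo "$vm" --machinereadable | grep -E '^(memory|cpus|vram|nic1)=' || true
}
```

For example, configure_gms_vm GMS_VM_V1 8192 4 64 eth0, where every value is a placeholder to replace with what your host can spare.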

Step 6. Start the GMS system

Select the machine GMS_VM_V1 and press the Start -> button at the top left of the VirtualBox interface. The machine will boot and you will be automatically logged in as the user genome. If you are ever prompted for a password, remember that both the username and the password for the system are genome. When the machine boots, you may be prompted with some messages about keyboard and mouse settings; you can safely dismiss these.

[Screenshot: Logging into the GMS]
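You can also start the machine from a host terminal. This sketch wraps VBoxManage startvm; the gui type opens the usual window with the desktop used in the rest of this tour, while headless runs the machine without a window. start_gms_vm is an illustrative name.

```sh
# Sketch: start the GMS VM from the command line.
# start_gms_vm [VM_NAME] [gui|headless]
start_gms_vm() {
  local vm="${1:-GMS_VM_V1}" type="${2:-gui}"
  case "$type" in
    gui|headless) ;;
    *) echo "type must be 'gui' or 'headless'" >&2; return 1 ;;
  esac
  VBoxManage startvm "$vm" --type "$type"
}
```

To shut the guest down cleanly later, VBoxManage controlvm GMS_VM_V1 acpipowerbutton sends it an ACPI power-button event.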

Step 7. Open the GMS web-viewer and explore demonstration models, processing-profiles, instrument-data, etc.

Open the Firefox browser by clicking the orange and blue icon on the left.

[Screenshot: GMS web-viewer]

Step 8. Perform some initial sanity checks of the system

Open a Terminal window by clicking the black icon on the left. Then execute the following commands to test various basic components of the system:

lsid                      # You should see the openlava cluster identification
lsload                    # You should see a report of available resources
bjobs                     # You should not have any unfinished jobs yet
bsub 'sleep 60'           # You should be able to submit a job to openlava (run bjobs again to see it)
bhosts                    # You should see one host
bqueues                   # You should see four queues
genome disk group list    # You should see four disk groups
genome disk volume list   # You should see at least one volume for your local drive
genome sys gateway list   # You should see two gateways, one for your new home system and one for the test data "GMS1"
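The checks above that have exact expected counts (one host, four queues) can be automated. This sketch is meant to run inside the VM and assumes that bhosts and bqueues print the usual LSF/openlava layout of one header line followed by one line per host or queue. The helper names are illustrative, and the genome disk and gateway listings are left as manual checks because their output layout is not described here.

```sh
# Sketch: automate the openlava checks above.
# ASSUMPTION: bhosts/bqueues print exactly one header line, then one row per item.

# expect_rows N CMD [ARGS...]: succeed if CMD prints N non-empty rows after its header
expect_rows() {
  local expected="$1" got
  shift
  got=$("$@" 2>/dev/null | tail -n +2 | grep -c .)
  if [ "$got" -eq "$expected" ]; then
    echo "ok    $*: $got row(s)"
  else
    echo "FAIL  $*: expected $expected row(s), got $got"
    return 1
  fi
}

gms_sanity_check() {
  local failures=0
  if lsid >/dev/null 2>&1; then
    echo "ok    lsid"
  else
    echo "FAIL  lsid"
    failures=$((failures + 1))
  fi
  expect_rows 1 bhosts || failures=$((failures + 1))
  expect_rows 4 bqueues || failures=$((failures + 1))
  echo "$failures check(s) failed"
  [ "$failures" -eq 0 ]
}
```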

Step 9. Perform some basic queries of the database

# list the metadata that is already present in the database:
genome taxon list
genome individual list
genome sample list
genome library list
genome instrument-data list solexa

# list the pre-defined models (no results yet ... you will launch these and generate results):
genome model list

# view the processing profiles (pipeline descriptions) associated with those models:
genome processing-profile view --processing-profile='Default Reference Alignment'
genome processing-profile view --processing-profile='Default Somatic Variation Exome'
genome processing-profile view --processing-profile='Default Somatic Variation WGS'
genome processing-profile view --processing-profile='Default Ovation V2 RNA-seq'
genome processing-profile view --processing-profile='cuffcompare/cuffdiff 2.0.2 protein_coding only'
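To view several processing profiles in one go, a small loop works; the important detail is quoting each profile name, because the names contain spaces. view_processing_profiles is an illustrative name; the profile names you pass are the ones listed above.

```sh
# Sketch: view several processing profiles in turn.
# Each argument is one profile name; quote names that contain spaces.
view_processing_profiles() {
  local pp status=0
  for pp in "$@"; do
    echo "=== $pp ==="
    genome processing-profile view --processing-profile="$pp" || status=1
  done
  return "$status"
}
```

For example: view_processing_profiles 'Default Reference Alignment' 'Default Somatic Variation Exome' 'Default Somatic Variation WGS'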

Step 10. Start some test builds and monitor their progress

Open a Terminal window by clicking the black icon on the left. Then execute the following command to view models that have already been defined in the system for demonstration purposes:

genome model list

Start the genotype-microarray builds for the tumor and normal samples as follows:

genome model build start 'hcc1395-normal-snparray'
genome model build start 'hcc1395-tumor-snparray'

[Screenshot: Starting a genotype-microarray build for the normal DNA sample]

You can monitor the progress of ongoing builds in several ways. You can load the GMS web-viewer and go to the builds tab; you can view the status of all builds in a Terminal with the command genome model build list; or you can view a much more detailed status of a running build with the following command (replace $build_id with the ID of the build of interest, or set a build_id shell variable first):

genome model build view "$build_id"

[Screenshot: View a running build]
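The genotype-microarray builds must finish before you launch the reference-alignment builds below. Instead of re-checking by hand, you could poll a model's build status from the terminal. This sketch rests on several assumptions to verify on your system: that status is a field accepted by --show, that the model has a single build (as in a fresh tour), that the status is the last field of the last non-empty output line, and that Succeeded, Failed, Abandoned, and Unstartable are the relevant GMS build states. build_status and wait_for_build are illustrative names.

```sh
# Sketch: poll a model's build until it finishes.
# ASSUMPTIONS: "--show status" is valid, the model has one build, and the
# status is the last field of the last non-empty line of output.
build_status() {
  genome model build list --filter model.name="$1" --show status 2>/dev/null |
    awk 'NF { last = $NF } END { print last }'
}

# wait_for_build MODEL_NAME [INTERVAL_SECONDS] [MAX_POLLS]
wait_for_build() {
  local model="$1" interval="${2:-60}" max_polls="${3:-1440}" poll=1 status
  while [ "$poll" -le "$max_polls" ]; do
    status=$(build_status "$model")
    case "$status" in
      Succeeded) echo "$model: Succeeded"; return 0 ;;
      Failed|Abandoned|Unstartable) echo "$model: $status" >&2; return 1 ;;
    esac
    echo "$model: ${status:-unknown} (poll $poll/$max_polls); next check in ${interval}s"
    sleep "$interval"
    poll=$((poll + 1))
  done
  echo "$model: still not finished after $max_polls polls" >&2
  return 1
}
```

For example: wait_for_build hcc1395-normal-snparray && wait_for_build hcc1395-tumor-snparray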

You can find the genotype-microarray results files as follows:

genome model build list --filter model.name='hcc1395-normal-snparray' --show data_directory

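Once the genotype-microarray builds are done, launch the reference-alignment builds for the exome data as follows (if you are running on a small machine such as a laptop, you may want to run them one at a time):

genome model build start 'hcc1395-normal-refalign-exome'
genome model build start 'hcc1395-tumor-refalign-exome'    # on a small machine, wait until the normal build has finished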

[Screenshot: Starting a reference-alignment build]

Once again, you can view the progress of this build as follows:

genome model build view "$build_id"

[Screenshot: Viewing the progress of a reference-alignment build]

As above, you can find the result files for the reference-alignment pipeline, including BAM files and germline variants in VCF format, as follows:

genome model build list --filter model.name='hcc1395-normal-refalign-exome' --show model.name,data_directory
genome model build list --filter model.name='hcc1395-tumor-refalign-exome' --show model.name,data_directory

To get the path to the final merged, sorted, duplicate-marked BAM from the tumor exome alignment, use the following command:

genome model list --filter name='hcc1395-tumor-refalign-exome' --show id,name,last_complete_build.merged_alignment_result.bam_path
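To use that BAM path in later commands, you might capture it in a shell variable. This sketch assumes the path is the last whitespace-separated field of the output line that ends in .bam, so the path itself must not contain spaces; bam_path_for_model and check_bam are illustrative names.

```sh
# Sketch: capture the merged BAM path of a model's last complete build and check it exists.
# ASSUMPTION: the path is the last field of the output line ending in ".bam".
bam_path_for_model() {
  genome model list --filter name="$1" \
    --show last_complete_build.merged_alignment_result.bam_path 2>/dev/null |
    awk '$NF ~ /\.bam$/ { path = $NF } END { print path }'
}

# check_bam MODEL_NAME: print the BAM path if the file exists and is non-empty
check_bam() {
  local bam
  bam=$(bam_path_for_model "$1")
  if [ -n "$bam" ] && [ -s "$bam" ]; then
    echo "$bam"
  else
    echo "No completed BAM found for model $1" >&2
    return 1
  fi
}
```

For example: tumor_bam=$(check_bam 'hcc1395-tumor-refalign-exome') && ls -lh "$tumor_bam"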

For many more examples, refer to the Reference Manual for useful Genome Commands.

