4 changes: 2 additions & 2 deletions conf.py
@@ -45,7 +45,7 @@

# General information about the project.
project = u'khmer-protocols'
copyright = u'2013, C. Titus Brown et al.'
copyright = u'2015, C. Titus Brown et al.'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@@ -54,7 +54,7 @@
# The short X.Y version.
version = '0.8'
# The full version, including alpha/beta/rc tags.
release = '0.8.4'
release = '0.8.5'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
10 changes: 9 additions & 1 deletion index.txt
@@ -51,12 +51,20 @@ This is a protocol for assembling low- and medium-diversity metagenomes.
Marine sediment and soil data sets may not be assemblable in the cloud
just yet.

Reference-based RNAseq assembly: the refTrans Protocol
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:doc:`refTrans/index`

This is a protocol for reference-based transcriptome assembly.
Materials are based on two workshops given at MSU by Titus Brown and Matt Scholz on Dec 4/5 and Dec 10/11.

Additional information
----------------------

Need help? Either post comments at the bottom of each page, OR
`sign up for the mailing list <http://lists.idyll.org/listinfo/protocols>`__.

Have you used these protocols in a scientific publication? We'll have
citation instructions up soon.

201 changes: 201 additions & 0 deletions refTrans/data-snapshot.txt
@@ -0,0 +1,201 @@
Create a data snapshot on Amazon
================================

.. shell start

Essentially you create a disk; attach it; format it;
and then copy things to and from it.


Getting started: launch an EC2 instance with an extra EBS volume
-----------------------------------------------------------------

We will walk through the main steps and highlight the critical points:

1. Log into your AWS account

2. In the AWS management console:
1. Click EC2 (Amazon Elastic Compute Cloud) to open the EC2 Dashboard
2. Change your region in the top right corner of the page to US East (N. Virginia)
3. Press "Launch Instance" (midway down the page)
4. Choose an Amazon Machine Image (AMI): Ubuntu Server 14.04 LTS (HVM), SSD Volume Type - ami-9a562df2
5. Choose an Instance Type: for now I am using the free tier t2.micro instance
6. Next: Configure Instance Details: keep the defaults
7. **Next: Add Storage**: keep the default 8 GiB root storage as it is, and add a new EBS volume of 1 GiB, General Purpose SSD type (we will create the data snapshot on it). Make sure "Delete on Termination" is unchecked so the volume survives termination
8. Next: Tag Instance: give a name to your instance (I called mine refTrans_instance)
9. Next: Configure Security Group: create a new security group and adjust your security rules to enable ports 22, 80, and 443 (SSH, HTTP, and HTTPS)
10. Click "Review and Launch", then click "Launch"
11. From the drop-down menu, choose an existing key pair or create a new one and download it to your disk
12. Click "Launch Instances"
13. Scroll down and click "View Instances"
14. Copy the Public DNS to your clipboard
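The console steps above can also be scripted. Below is a hedged sketch of a CLI equivalent, not the protocol's own method: the ``aws ec2 run-instances`` call is shown commented out, and the key pair and security group names are placeholders.

```shell
# Block-device mappings for the launch: the 8 GiB root volume plus the
# extra 1 GiB gp2 volume, which is kept after termination
# ("DeleteOnTermination": false).
MAPPINGS='[
  {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 8, "VolumeType": "gp2"}},
  {"DeviceName": "/dev/sdf",  "Ebs": {"VolumeSize": 1, "VolumeType": "gp2", "DeleteOnTermination": false}}
]'

# Hypothetical launch call (YOUR_KEY and YOUR_SECURITY_GROUP are placeholders):
# aws ec2 run-instances --image-id ami-9a562df2 --instance-type t2.micro \
#     --key-name YOUR_KEY --security-groups YOUR_SECURITY_GROUP \
#     --block-device-mappings "$MAPPINGS"

# Sanity-check that the mapping is valid JSON before using it:
echo "$MAPPINGS" | python3 -c 'import json, sys; json.load(sys.stdin)' && echo "mapping OK"
```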

Logging into your new instance "in the cloud" (Windows version)
----------------------------------------------------------------

1. Download PuTTY and PuTTYgen from: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
2. Run PuTTYgen, load your '.pem' file, and then "save private key"
3. Run PuTTY: paste the Public DNS into the Host Name field, expand SSH and click Auth (do not expand it), browse to your private key, and click Open
4. Click Yes if prompted, then log in as "ubuntu"

Getting the data
----------------

We'll be using a few RNAseq data sets from Fagerberg et al., `Analysis
of the Human Tissue-specific Expression by Genome-wide Integration of
Transcriptomics and Antibody-based Proteomics
<http://www.mcponline.org/content/13/2/397.full>`__.

You can get this data from the European Nucleotide Archive under
accession `ERP003613 <http://www.ebi.ac.uk/ena/data/view/ERP003613>`__.
All samples in this project are `paired end
<http://www.illumina.com/technology/next-generation-sequencing/paired-end-sequencing_assay.html>`__,
so each sample is represented by two files. These files are in
`FASTQ format <http://en.wikipedia.org/wiki/FASTQ_format>`__.
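The four-line FASTQ layout matters later when we subset the reads, so here is a minimal sketch of one record (the read itself is made up, not taken from ERP003613):

```shell
# A FASTQ record is exactly four lines:
#   1) @header  2) sequence  3) '+' separator  4) per-base qualities
printf '@read1/1\nGATTACA\n+\nIIIIIII\n' > example.fq
cat example.fq
# Record count = line count / 4  -> 1
echo $(( $(wc -l < example.fq) / 4 ))
```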


In this tutorial we will work with two tissues: `salivary gland
<http://www.ebi.ac.uk/ena/data/view/SAMEA2151887>`__ and `lung
<http://www.ebi.ac.uk/ena/data/view/SAMEA2155770>`__. Note that each
tissue has two replicates, and each replicate has two files for the
paired-end reads.
Make a directory on the Amazon instance to download the data into
::

mkdir refTransData
cd refTransData

Download the sequencing files of the salivary gland
::

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR315/ERR315325/*
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR315/ERR315382/*

Download the sequencing files of the lung tissue
::

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR315/ERR315326/*
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR315/ERR315424/*


Now return to your home directory and list the downloads::

cd ~
ls -la refTransData/

You'll see something like ::

-r--r--r-- 1 mscholz common-data 3714262571 Dec 4 08:44 ERR315325_1.fastq.gz
-r--r--r-- 1 mscholz common-data 3714262571 Dec 4 08:44 ERR315325_2.fastq.gz
...

which tells you that each of these files is about 3.7 GB. You can go ahead
and use these files for the rest of the protocol, but for the sake of time
(and memory) we will run our demo on a subset of this data.
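Raw byte counts like the ones above are easy to misread; ``ls -lh`` prints sizes in human-readable units instead. A self-contained sketch on a stand-in file (not one of the downloads):

```shell
# Make a ~3.5 MB stand-in file, then list it with human-readable sizes;
# the size column shows something like "3.4M" instead of a raw byte count.
head -c 3500000 /dev/zero > big.bin
ls -lh big.bin
rm big.bin
```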


Prepare the working data on the EBS volume
------------------------------------------

1. Use the lsblk command to view your available disk devices and their mount points
::

lsblk


You should see something like this ::

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
xvdf 202:16 0 1G 0 disk


| /dev/xvda1 is mounted as the root device (note the MOUNTPOINT is listed as /, the root of the Linux file system hierarchy),
| and /dev/xvdf is attached, but it has not been mounted yet.

2. The new volume does not yet have a file system. Use the ``sudo file -s <device>`` command to check
::

sudo file -s /dev/xvdf


If the output of the previous command shows simply data for the device, then there is no file system on the device and you need to create one.

3. Create an ext4 file system on the volume (this erases any data already on it)
::

sudo mkfs -t ext4 /dev/xvdf

4. Create a mount point directory for the volume. The mount point is where the volume appears in the file system tree, and where you read and write files after you mount the volume
::

sudo mkdir refTransData/dataSubset

5. Mount the volume at the location of the project folder
::

sudo mount /dev/xvdf refTransData/dataSubset
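To confirm the mount took effect, you can ask ``df`` which device backs a path; on the instance, point it at ``refTransData/dataSubset`` and the Filesystem column should read ``/dev/xvdf``. Shown here on the current directory so the sketch runs anywhere:

```shell
# df -h reports the backing device, size, and usage for any path.
df -h .
```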

6. Change into the data directory and copy in a subset of the data (100,000 reads per file)
::

cd ~/refTransData
sudo sh -c 'gunzip -c ERR315325_1.fastq.gz | head -400000 | gzip > dataSubset/salivary_repl1_R1.fq.gz'
sudo sh -c 'gunzip -c ERR315325_2.fastq.gz | head -400000 | gzip > dataSubset/salivary_repl1_R2.fq.gz'
sudo sh -c 'gunzip -c ERR315382_1.fastq.gz | head -400000 | gzip > dataSubset/salivary_repl2_R1.fq.gz'
sudo sh -c 'gunzip -c ERR315382_2.fastq.gz | head -400000 | gzip > dataSubset/salivary_repl2_R2.fq.gz'


and do the same for the lung samples
::

sudo sh -c 'gunzip -c ERR315326_1.fastq.gz | head -400000 | gzip > dataSubset/lung_repl1_R1.fq.gz'
sudo sh -c 'gunzip -c ERR315326_2.fastq.gz | head -400000 | gzip > dataSubset/lung_repl1_R2.fq.gz'
sudo sh -c 'gunzip -c ERR315424_1.fastq.gz | head -400000 | gzip > dataSubset/lung_repl2_R1.fq.gz'
sudo sh -c 'gunzip -c ERR315424_2.fastq.gz | head -400000 | gzip > dataSubset/lung_repl2_R2.fq.gz'
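Note the factor of four: each FASTQ record is four lines, so ``head -400000`` keeps exactly 100,000 reads. The same logic, verified on small synthetic data:

```shell
# 10 fake 4-line reads; keeping the first 8 lines keeps exactly 2 reads.
for i in $(seq 1 10); do printf '@r%d\nACGT\n+\nIIII\n' "$i"; done | gzip > fake.fq.gz
gunzip -c fake.fq.gz | head -8 | gzip > fake_subset.fq.gz
echo $(( $(gunzip -c fake_subset.fq.gz | wc -l) / 4 ))    # -> 2
```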


Getting set up with the AWS Command Line Interface and making the snapshot
---------------------------------------------------------------------------
The AWS Command Line Interface is a unified tool to manage your AWS services. It will help us make a snapshot of our data, and much more.
Follow the `Amazon documentation <http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html#cli-signup>`__. Briefly:

1. Get your access key ID and secret access key through the `IAM console <https://console.aws.amazon.com/iam/home?#home>`__. Do not forget to attach an administrative access policy

2. Install the AWS Command Line Interface
::

cd ~
sudo apt-get update
sudo apt-get install awscli

3. Configure your credentials by running
::

aws configure

The AWS CLI will prompt you for four pieces of information::

AWS Access Key ID [None]: from the IAM console; it looks like AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: from the IAM console; it looks like wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: the region of your bucket (in our case, us-east-1)
Default output format [None]: json
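These answers are stored in two plain-text files under ``~/.aws`` (file names per the AWS CLI documentation; the values below are the same placeholder examples as above):

```ini
; ~/.aws/credentials
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

; ~/.aws/config
[default]
region = us-east-1
output = json
```

You can edit these files directly instead of re-running ``aws configure``.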


4. Unmount the volume and create a snapshot (get the volume ID from the Volumes tab of the EC2 dashboard)
::

sudo umount -d /dev/xvdf
aws ec2 create-snapshot --volume-id vol-90ab2b8b --description "refTrans sample data"

The output will include the new snapshot's ID (here, snap-6a5ddae5).

5. Now we can share this snapshot, but first we need to modify its permissions
::

aws ec2 modify-snapshot-attribute --snapshot-id snap-6a5ddae5 --attribute createVolumePermission --operation-type add --group-names all

Now you can stop your instance and even delete the EBS volumes.

.. shell stop

----

Back: :doc:`m-dataNsoftware_amazon`
Binary file added refTrans/files/model-rnaseq-pipeline.png
70 changes: 70 additions & 0 deletions refTrans/igenome-backup.txt
@@ -0,0 +1,70 @@
Storage of my iGenome tarball in Amazon S3
==============================================

Connect to MSU HPC and get a copy of the Bowtie2Index, genome and GTF annotation
--------------------------------------------------------------------------------
After logging into my AWS instance, I opened an SFTP connection to the MSU HPC
and fetched the file I had already prepared there::

sftp (username)@hpc.msu.edu
get /path/to/Homo_sapiens_UCSC_hg19_small.tar.gz
exit

Create S3 bucket
----------------
In the AWS management console:

1. Click S3 (Amazon Simple Storage Service)
2. Click "Create Bucket"
3. Define the bucket name: reftransdata (choose the region: US Standard)

Getting set up with the AWS Command Line Interface and copying a backup to S3
------------------------------------------------------------------------------
The AWS Command Line Interface is a unified tool to manage your AWS services. It will help us copy our data to S3, and much more.
Follow the `Amazon documentation <http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html#cli-signup>`__. Briefly:

1. Get your access key ID and secret access key through the `IAM console <https://console.aws.amazon.com/iam/home?#home>`__.
Do not forget to attach an administrative access policy

2. Install the AWS Command Line Interface
::

cd ~
sudo apt-get update
sudo apt-get install awscli

3. Configure your credentials by running
::

aws configure

The AWS CLI will prompt you for four pieces of information::

AWS Access Key ID [None]: from the IAM console; it looks like AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: from the IAM console; it looks like wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: the region of your bucket (in our case, us-east-1)
Default output format [None]: json


4. Copy the tarball to your S3 space (here ``$workingPath`` is the directory that holds the tarball)
::

aws s3 cp $workingPath/Homo_sapiens_UCSC_hg19_small.tar.gz s3://reftransdata/Homo_sapiens_UCSC_hg19_small.tar.gz

.. Go to the S3 console, right-click the file, and choose "Make Public"


Get a copy back from your S3 to the current EC2
-----------------------------------------------
Install and configure the AWS Command Line Interface, then copy the file to the current working directory

::

aws s3 cp s3://reftransdata/Homo_sapiens_UCSC_hg19_small.tar.gz .


Extract the tarball and rename the directory to match the one obtained from iGenomes
-------------------------------------------------------------------------------------
::

tar -zxvf Homo_sapiens_UCSC_hg19_small.tar.gz
mv Homo_sapiens2 Homo_sapiens
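If you want to check the archive's top-level directory name before extracting (to know what ``mv`` will need to rename), ``tar -tzf`` lists contents without unpacking. A sketch with a tiny synthetic tarball standing in for the real one:

```shell
# Build a stand-in archive, then list it; the first entry is the
# top-level directory name.
mkdir -p Homo_sapiens2/Annotation
tar -czf demo.tar.gz Homo_sapiens2
tar -tzf demo.tar.gz | head -1    # -> Homo_sapiens2/
rm -r demo.tar.gz Homo_sapiens2
```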

31 changes: 31 additions & 0 deletions refTrans/index.txt
@@ -0,0 +1,31 @@
===================================================
Reference-based RNAseq assembly (refTrans) protocol
===================================================

:author: Tamer A. Mansour and Titus Brown

This is a protocol for reference-based transcriptome assembly.
Materials are based on, and extend, two workshops given at MSU by Titus Brown and Matt Scholz on Dec 4/5 and Dec 10/11.
Both workshops were sponsored by `iCER <http://icer.msu.edu/>`__
and made use of the `MSU High Performance Compute Center <http://hpcc.msu.edu/>`__ .

.. figure:: files/model-rnaseq-pipeline.png

Tutorials:

.. toctree::
:maxdepth: 2

m-resources
m-quality
m-tophat
m-count
m-data-analysis
m-func-analysis
m-advice

Reference material
------------------
:doc:`../docs/command-line`

.. :doc:`../amazon/index`