diff --git a/callvariants/0-download-and-save.txt b/callvariants/0-download-and-save.txt new file mode 100644 index 0000000..5987706 --- /dev/null +++ b/callvariants/0-download-and-save.txt @@ -0,0 +1,123 @@ +=========================================== +0. Downloading and Saving Your Initial Data +=========================================== + +We're going to do variant calling completely in the cloud, +because that way (a) you don't need to buy a big computer, and (b) +I don't have to figure out all the special details of your own +computer system. + +This does mean that the first thing you need to do is get your data +over to the cloud. I tend to just store it there in the first place, +because... + +The basics +---------- + +... Amazon is happy to rent disk space to you, in addition to compute time. +They'll rent you disk space in a few different ways, but the way that's +most useful for us is through what's called Elastic Block Store. This +is essentially a hard-disk rental service. + +There are two basic concepts -- "volume" and "snapshot". A "volume" can +be thought of as a pluggable-in hard drive: you create an empty volume of +a given size, attach it to a running instance, and voila! You have extra +hard disk space. Volume-based hard disks have two problems, however: +first, they cannot be used outside of the "availability zone" they've +been created in, which means that you need to be careful to put them +in the same zone that your instance is running in; and they can't be shared +amongst people. + +Snapshots, the second concept, are the solution to transporting and +sharing the data on volumes. A "snapshot" is essentially a frozen +copy of your volume; you can copy a volume into a snapshot, and a +snapshot into a volume. + +Getting started +--------------- + +Run through :doc:`../amazon/index` once, to get the hang of +the mechanics. Essentially you create a disk; attach it; format it; copy things +to and from it. 
+ +Downloading and saving your data to a volume +-------------------------------------------- + +There are *many* different ways of getting big sequence files to and +from Amazon. The two that I mostly use are 'curl', which downloads +files from a Web site URL; and 'ncftp', which is a robust FTP client +that lets you get files from an FTP site. Sequencing centers almost +always make their data available in one of these two ways. + +.. note:: + + To use ncftp on your Amazon instance, you may need to install it:: + + apt-get -y install ncftp + +For example, to retrieve a file from an FTP site, you would do something +like:: + + cd /mnt + ncftp -u ftp://path/to/FTP/site + +use 'cd' to find the right directory, and then:: + + >> mget * + +to download the files. Then type 'quit'. + +You can also use 'curl' to download files one at a time from Web or FTP sites. +For example, to save a file from a website, you could use:: + + cd /mnt + curl -O http://path/to/file/on/website + +Once you have the files, figure out their size using 'du -sk' (e.g. after the +above, 'du -sk /mnt' will tell you how much data you have saved under /mnt), +and go create and attach a volume (see :doc:`../amazon/index`). + +Any files in the '/mnt' directory will be lost when the instance is stopped or +rebooted. However, files stored in the root, '/', directory will remain +available. Thus, it's a good rule of thumb to do "savepoints" -- whenever you +complete a big chunk of work, think about saving the data at that point. I've +broken the mRNAseq tutorial down into chunks of work where you can do this -- +after each Web page, basically. To sync a folder to an attached volume, simply +type:: + + rsync -av folder_to_keep /path_to_volume + +Some test data +-------------- + +Several journals require that the Illumina sequencing data accompanying a publication be deposited in a publicly available repository such as the Sequence Read Archive (SRA). +Let's use one of the datasets from SRA as our test data. 
The data can be downloaded using the ftp link ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX225/SRX225038/SRR671724/SRR671724.lite.sra. To get fastq files from the sra file, you'd need to install SRAToolkit from http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software, and use the fastq-dump tool in the toolkit. + +Alternatively, the fastq files can be downloaded directly from the European Nucleotide Archive. The paired files are ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR671/SRR671851/SRR671851_1.fastq.gz and ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR671/SRR671851/SRR671851_2.fastq.gz. + + +Let's make a new directory to store the data: +:: + + mkdir data + cd data + wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR671/SRR671851/SRR671851_1.fastq.gz + wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR671/SRR671851/SRR671851_2.fastq.gz + +This dataset contains paired-end Illumina HiSeq data from a clinical isolate of Mycobacterium tuberculosis. The paper is: Zhang et al 2013. Genome sequencing of 161 Mycobacterium tuberculosis isolates from China identifies genes and intergenic regions associated with drug resistance. Nature Genetics, http://dx.doi.org/10.1038/ng.2735. + +Now let's save the reference genome for Mycobacterium tuberculosis from NCBI: +:: + + wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mycobacterium_tuberculosis_H37Rv_uid170532/NC_018143.fna + +Additional information +---------------------- + +Throughout this protocol we will be using command-line interfaces. There +is a short document explaining the notation used here (see :doc:`../docs/command-line`). + +---- + +Next: :doc:`1-quality-control` diff --git a/callvariants/1-quality-control.txt b/callvariants/1-quality-control.txt new file mode 100644 index 0000000..600c303 --- /dev/null +++ b/callvariants/1-quality-control.txt @@ -0,0 +1,426 @@ +================================================ +1. Quality Trimming and Filtering Your Sequences +================================================ + +.. 
shell start + +Be aware of your space requirements and obtain an +appropriately sized machine ("instance") and storage ("volume"). + + +On the new machine, run the following commands to update the base +software and reboot the machine:: + + apt-get update + apt-get -y install screen git curl gcc make g++ python-dev unzip default-jre \ + pkg-config libncurses5-dev r-base-core r-cran-gplots python-matplotlib\ + sysstat && shutdown -r now + + +Install software +---------------- + +.. clean up previous installs if we're re-running this... + +.. :: + + set -x + set -e + echo Removing previous installs, if any. + rm -fr /root/Trimmomatic-* + rm -f /root/libgtextutils-*.bz2 + rm -f /root/fastx_toolkit-*.bz2 + rm -fr /root/bowtie2* + rm -fr /root/samtools* + +.. :: + + echo Clearing times.out + mv -f /root/times.out /root/times.out.bak + echo 1-quality INSTALL `date` >> /root/times.out + +Install Bowtie2 +:: + + cd /root + curl -L -O 'http://downloads.sourceforge.net/project/bowtie-bio/bowtie2/2.1.0/bowtie2-2.1.0-source.zip?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fbowtie-bio%2Ffiles%2Fbowtie2%2F2.1.0%2F&ts=1365392377&use_mirror=superb-dca3' + mv bowtie2-2.1.0-source.zip* bowtie2-2.1.0-source.zip + unzip bowtie2-2.1.0-source + cd bowtie2-2.1.0 + make + cp bowtie2* /usr/local/bin + +Install Samtools +:: + + cd /root + curl -O -L http://sourceforge.net/projects/samtools/files/samtools/0.1.18/samtools-0.1.18.tar.bz2 + tar xvfj samtools-0.1.18.tar.bz2 + cd samtools-0.1.18 + make + +Install `FastQC `__:: + + cd /usr/local/share + curl -O http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.10.1.zip + unzip fastqc_v0.10.1.zip + chmod +x FastQC/fastqc + +Install `Trimmomatic `__ : +:: + + cd /root + curl -O http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.30.zip + unzip Trimmomatic-0.30.zip + cd Trimmomatic-0.30/ + cp trimmomatic-0.30.jar /usr/local/bin + cp -r adapters /usr/local/share/adapters + +Install `libgtextutils and fastx `__ : 
+:: + + cd /root + curl -O http://hannonlab.cshl.edu/fastx_toolkit/libgtextutils-0.6.1.tar.bz2 + tar xjf libgtextutils-0.6.1.tar.bz2 + cd libgtextutils-0.6.1/ + ./configure && make && make install + + cd /root + curl -O http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit-0.0.13.2.tar.bz2 + tar xjf fastx_toolkit-0.0.13.2.tar.bz2 + cd fastx_toolkit-0.0.13.2/ + ./configure && make && make install + +In each of these cases, we're downloading the software -- you can use +google to figure out what each package is and does if we don't discuss +it below. We're then unpacking it, sometimes compiling it (which we +can discuss later), and then installing it for general use. + +* ASK: what is libgtextutils for? + +Find your data +-------------- + +If you downloaded the fastq files using ftp as in previous step, they should be in 'data' directory. The files usually have the '.fastq.gz' suffix. + +If you see all the files you think you should, good! Otherwise, debug. + +Link your data into a working directory +--------------------------------------- + +Rather than *copying* the files into the working directory, let's just +*link* them in -- this creates a reference so that UNIX knows where to +find them but doesn't need to actually move them around. : +:: + + cd /mnt + mkdir -p work + cd work + + ln -fs ../data/* . + +(The 'ln' command does the linking, and should be provided the full path for the data directory.) + +Now, do an 'ls' to list the files. If you don't see all the files in data directory, +then the ln command above didn't work properly. One possibility is that +your files aren't in ../data. This will link all the files in ../data (*.fastq.gz and the reference genome). + +.. note:: + + This protocol takes many hours to run, so you might not want + to run it on all the data the first time. 
 If you're using the + example data, you can work with a subset of it by running this command + INSTEAD of the `ln -fs` command above (you should be in the 'work' directory):: + + for file in ../data/*.fastq.gz + do + gunzip -c "$file" | head -400000 | gzip > "$(basename "$file")" + done + + This will pull out the first 100,000 reads of each file (4 lines per record) + and put them in the current directory, which should be /mnt/work. + +If you follow the above option (copying only the first 100,000 reads to the 'work' directory), you should also link the reference genome that is in the 'data' directory: + +:: + + ln -fs ../data/NC_018143.fna + +For this example, let's stick to the option of using the first 100,000 reads. The 'ls' command in the current work directory should now list these three files: + +NC_018143.fna SRR671851_1.fastq.gz SRR671851_2.fastq.gz + +Evaluate the quality of your files with FastQC +---------------------------------------------- + +Let's use FastQC to look at the quality of your sequences: + +:: + + mkdir fastqc_output + for file in ./*.fastq.gz + do + fastqc "$file" --outdir=fastqc_output + done + +In /mnt/work/fastqc_output/, you should now have the following files: + +SRR671851_1_fastqc.html SRR671851_1_fastqc.zip SRR671851_2_fastqc.html SRR671851_2_fastqc.zip + +Double click on the *.html files to load them into your browser. For our example, the fastqc html reports look like this: + +.. image:: fastqc_screenshot_raw_1.png + :scale: 50 % + :alt: SRR671851_1_fastqc + :align: left + + +.. image:: fastqc_screenshot_raw_2.png + :scale: 50 % + :alt: SRR671851_2_fastqc + :align: left + +As you can see, the base quality (on the y-axis) decreases towards the end of the read, more so in the second read of the pair. To fix this, you can trim the reads so as to exclude the poor-quality ends. A base quality above 28 (the green portion in the plot) is a good threshold for deciding the length of reads to keep. 
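
The threshold rule above can be sketched in a few lines of shell. This is a rough illustration, not what FastQC does internally: it assumes Phred+64 quality encoding (FastQC reports "Illumina 1.5" for this dataset), uses a made-up two-read file called toy.fastq, and takes the plain mean quality at each position. It then reports the longest read prefix whose mean quality stays at or above 28.

```shell
# Work in a scratch directory so nothing in /mnt/work is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical toy data: in Phred+64, 'h' is Q40 and 'B' is Q2.
cat > toy.fastq <<'EOF'
@read1
ACGT
+
hhhB
@read2
ACGT
+
hhBB
EOF

# Mean quality per read position (every 4th line is a quality string).
awk '
BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
NR % 4 == 0 {
    for (p = 1; p <= length($0); p++) {
        sum[p] += ord[substr($0, p, 1)] - 64   # assumed Phred+64 offset
        n[p]++
    }
    if (length($0) > maxlen) maxlen = length($0)
}
END { for (p = 1; p <= maxlen; p++) printf "%d\t%.1f\n", p, sum[p] / n[p] }
' toy.fastq > mean_quality.tsv

# Longest prefix whose mean quality never drops below the threshold.
keep=$(awk -v t=28 '$2 < t { exit } { keep = $1 } END { print keep }' mean_quality.tsv)
echo "keep the first $keep bases"
```

On real data you would feed the awk script from 'gunzip -c' instead of a toy file, and still check the FastQC plot by eye before settling on a length.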
+ +Based on the quality control analysis, we will trim the reads to length 60. But first, we should remove the Illumina adapters (short nucleotide sequences added prior to sequencing). + +Find the right Illumina adapters +-------------------------------- + +You'll need to know which Illumina sequencing adapters were used for +your library in order to trim them off; do :: + + ls /usr/local/share/adapters/ + +to see which ones are available. Below, we will use the TruSeq3-PE.fa +adapters. + +.. note:: + + You'll need to make sure these are the right adapters for your + data. If they are the right adapters, you should see that some of + the reads are trimmed; if they're not, you won't see anything + get trimmed. + +Adapter trim each pair of files +------------------------------- + +(From this point on, you may want to be running things inside of +screen, so that you can detach and log out while it's running; see +:doc:`../amazon/using-screen` for more information.) + +The files containing paired-end reads are labeled with the suffixes _1.fastq.gz and _2.fastq.gz. The part of the name before the suffix briefly describes the file (or is an ID, for example "SRR671851" in our case) and is exactly the same for both files of a pair. + +For *each* of these pairs, do the following: + +* ASK: where would Illumina adapters be for a person trying this for the first time. + +:: + + # make a temp directory + mkdir trim + cd trim + + # run trimmomatic + java -jar /usr/local/bin/trimmomatic-0.30.jar PE ../SRR671851_1.fastq.gz ../SRR671851_2.fastq.gz s1_pe s1_se s2_pe s2_se ILLUMINACLIP:/usr/local/share/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:20:30 AVGQUAL:28 + + # compress the files and save them in the working directory + gzip -9c s1_pe > ../SRR671851_1.pe.fq.gz + gzip -9c s1_se > ../SRR671851_1.se.fq.gz + gzip -9c s2_pe > ../SRR671851_2.pe.fq.gz + gzip -9c s2_se > ../SRR671851_2.se.fq.gz + + # go back up to the working directory and remove the temp directory + cd .. 
+ rm -r trim + + # make it hard to delete the files you just created + chmod u-w *.pe.fq.gz *.se.fq.gz + +To get a basic idea of what's going on, please read the '#' comments +above, but, briefly, this set of commands: + +* creates a temporary directory, 'trim/' + +* runs 'Trimmomatic' in that directory to trim off the adapters, and then + puts remaining pairs (most of them!) in s1_pe and s2_pe, and any orphaned + singletons in s1_se and s2_se. 's1' and 's2' are short for 1st and 2nd read of a pair, 'pe' stands for paired reads, and 'se' stands for singleton reads. + +* 'Trimmomatic' also does 'sliding window' quality trimming (optional): it scans each read from the start, and if the average quality within a window of 20 bases drops below 30, the read is cut at that point. AVGQUAL is the average quality score below which the whole read is dropped (here, 28). LEADING and TRAILING give the quality threshold below which bases are removed from the beginning and end of each read, respectively (here, 3). Using 'Trimmomatic' to do quality control ensures that the number of paired reads in s1_pe and s2_pe is the same: if one read in a pair is dropped due to poor quality, its mate is treated as an orphan. This is important because several downstream tools (such as mappers) require that the number of reads in paired-end files be exactly the same. + +* Puts the trimmed reads back in the working directory + +Running trimmomatic gave the following (last two lines of output): + +.. note:: + Input Read Pairs: 100000 Both Surviving: 97259 (97.26%) Forward Only Surviving: 1908 (1.91%) Reverse Only Surviving: 628 (0.63%) Dropped: 205 (0.20%) + TrimmomaticPE: Completed successfully + +This means that of the 100,000 read pairs, 97.26% survived Trimmomatic quality control as pairs, 1.91% lost the 2nd read of the pair (only the forward read survived), 0.63% lost the 1st read of the pair (only the reverse read survived), and 0.20% of pairs were dropped entirely. 
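
As a quick sanity check on the summary line above, the four categories should add up to the number of input pairs. The sketch below just redoes that arithmetic with the counts copied from the Trimmomatic output; the variable names are made up for illustration.

```shell
# Counts copied from the Trimmomatic summary line above.
input_pairs=100000
both=97259        # both reads of the pair survived
fwd_only=1908     # only the forward (1st) read survived
rev_only=628      # only the reverse (2nd) read survived
dropped=205       # neither read survived

# Every input pair must land in exactly one category.
accounted=$((both + fwd_only + rev_only + dropped))
echo "pairs accounted for: $accounted of $input_pairs"

# Orphaned reads are the ones that end up in the two .se files.
orphans=$((fwd_only + rev_only))
echo "orphaned reads: $orphans"

# Percentage of pairs that survived intact, to two decimals.
pct_both=$(awk -v b="$both" -v t="$input_pairs" 'BEGIN { printf "%.2f", 100 * b / t }')
echo "pairs surviving: $pct_both%"
```

If the first number does not match the input pair count, the summary line was probably copied incorrectly or the run was interrupted.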
+ +At the end of this you will have new files ending in '.pe.fq.gz' and +'.se.fq.gz', representing the paired and orphaned adapter trimmed +reads, respectively. + +Quality trim your reads: dropping the low-quality ends +------------------------------------------------------ + +If you are following this example, you should have a bunch of '.pe.fq.gz' files and +a bunch of '.se.fq.gz' files, and html FastQC reports to determine the quality of data in these files. If your individual fastq files need trimming, use the following syntax: + + gunzip -c <'.fq.gz' file> | fastx_trimmer -l -z -o <'.qc.fq.gz' outfile> + +This first uncompresses the .fq.gz files, trims the reads to specified length, and -z compresses output with gzip. + +.. note:: + If you have several files to run FastQC on, you can unzip the '_fastqc.zip' files, and look for 'summary.txt' file in the unzipped folders. If you see 'FAIL' or 'WARN' next to 'Per base sequence quality' in this 'summary.txt', it is a good idea to look at the '_fastqc.html' files individually and trim the ends, or decide if the data quality isn't good enough for downstream analyses, and if not, you may not want to use that file. + + A simple script can be written to identify the .fq.gz files that FAIL the 'Per base sequence quality' test, and will narrow down the number of FastQC html reports to look at. + +Automating this step +~~~~~~~~~~~~~~~~~~~~ + +If all .fq.gz files need trimming (as we will in this example), do the following: + +.. :: + + echo 1-quality FILTER `date` >> /root/times.out + +This step can be automated with a 'for' loop at the shell prompt. 
Try : +:: + + for file in *.pe.fq.gz *.se.fq.gz + do + echo working with $file + newfile="$(basename $file .fq.gz)" + gunzip -c $file | fastx_trimmer -l 60 | gzip -9c > "${newfile}.qc.fq.gz" + done + +What this loop does is: + +* for every file ending in pe.fq.gz and se.fq.gz, + +* print out a message with the filename, + +* uncompresses the original file, passes it through fastx_trimmer, recompresses it, + and saves it as 'newfile'.qc.fq.gz + +Re-evaluate the quality of your files with FastQC +------------------------------------------------- + +Since we did quality-control using Trimmomatic and dropped low-quality ends of reads, lets re-run fastqc to confirm that the data is now good enough for downstream analysis. + +:: + + for file in *.qc.fq.gz + do + fastqc $file --outdir=fastqc_output + done + +This will generate *.qc.fq_fastqc.html files in the fastqc_output directory. + +SRR671851_1.pe.qc.fq_fastqc.html: + +.. image:: fastqc_qc_1_pe.png + :scale: 50 % + :alt: fastqc_qc_1_pe + :align: left + +SRR671851_1.se.qc.fq_fastqc.html + +.. image:: fastqc_qc_1_se.png + :scale: 50 % + :alt: fastqc_qc_1_se.png + :align: left + +SRR671851_2.pe.qc.fq_fastqc.html + +.. image:: fastqc_qc_2_pe.png + :scale: 50 % + :alt: fastqc_qc_2_pe + :align: left + +SRR671851_2.se.qc.fq_fastqc.html + +.. image:: fastqc_qc_2_se.png + :scale: 50 % + :alt: fastqc_qc_2_se.png + :align: left + +The data now looks good (base-quality scores for all four files lie in the green region)! + +Finishing up +------------ + +You should now have a bunch of files in the working directory: + + *_1.fastq.gz - the original data + *_2.fastq.gz + *_1.pe.fq.gz, *_2.pe.fq.gz - adapter trimmed and filtered pe + *_1.se.fq.gz, *_2.se.fq.gz - adapter trimmed and filtered se + *.qc.fq.gz - FASTX trimmed files + +Yikes! What to do? + +Well, first, you can get rid of the original data. You already have it on a +disk somewhere, right? 
: +:: + + rm *.fastq.gz + +Next, you can get rid of the 'pe.fq.gz' and 'se.fq.gz' files, since you +only want the QC files. So : +:: + + rm *.pe.fq.gz *.se.fq.gz + +Things to think about +~~~~~~~~~~~~~~~~~~~~~ + +Note that the filenames, while ugly, are conveniently structured with the +history of what you've done. This is a good idea. + +Also note that we've conveniently named the files so that we can remove +the unwanted ones en masse. This is a good idea, too. + +And finally, make the end product files read-only : +:: + + chmod u-w *.qc.fq.gz + +to make sure you don't accidentally delete something. + +Saving the files +---------------- + +At this point, you should save these files:: + + mkdir save + cp *.qc.fq.gz save + du -sk save + +If you are running with a data subset (as in this example), do +:: + + cp /mnt/work/*.qc.fq.gz ../data + +to save the QC files for later use. + +.. shell stop + +This puts the data you want to save into a subdirectory named 'save', and +calculates the size. + +Now, create a volume of the given size -- divide by a thousand to get +gigabytes, multiply by 1.1 to make sure you have enough room, and then +follow the instructions in :doc:`../amazon/index`. Once +you've mounted it properly (I would suggest mounting it on /save +instead of /data!), then do :: + + rsync -av save /save + +which will copy all of the files over from the ./save directory onto the +'/save' disk. Then 'umount /save' and voila, you've got a copy of the files! + +Next stop: :doc:`2-mapping`. + diff --git a/callvariants/2-mapping.txt b/callvariants/2-mapping.txt new file mode 100644 index 0000000..bb0bf2f --- /dev/null +++ b/callvariants/2-mapping.txt @@ -0,0 +1,97 @@ +======================================== +2. Mapping reads to the reference genome +======================================== + +Index the reference genome +-------------------------- +First step in mapping reads is to generate an index for the reference genome. 
Note that we are still in the 'work' directory. + +The reference genome for our example is the Mycobacterium tuberculosis genome H37Rv. The fasta file for this genome is NC_018143.fna, which we "linked" to our 'work' directory. + +.. shell start + +.. :: + + set -x + set -e + + +We will be using bowtie2 to map the reads to this reference genome. So let's index the reference genome using bowtie2-build. + +.. note:: + bowtie2-build -f + + +Create the index as follows: +:: + + bowtie2-build -f NC_018143.fna H37Rv + +where H37Rv is the basename that is now assigned to the index files. The reference index files created in the 'work' directory are: + +H37Rv.1.bt2 H37Rv.3.bt2 H37Rv.rev.2.bt2 +H37Rv.2.bt2 H37Rv.4.bt2 H37Rv.rev.1.bt2 + + +Link in your data +----------------- + +Make sure your data is in /mnt/work/. You can do:: + + cd /mnt + mkdir -p work + cd /mnt/work + ln -fs /data/*.qc.fq.gz . + +If you are following this example, then your files should already be in 'work'. + +Map reads to genome +------------------- + +Since we are using bowtie2, map the reads to the reference using this command: + +.. note:: + + bowtie2 -x -1 <'_1.pe.qc.fq' file> -2 <'_2.pe.qc.fq' file> -U <'.se.qc.fq' files> -S + +The files holding the first reads of each pair go after "-1" as a comma-separated list, the files holding the second reads of each pair go after "-2", and any unpaired/orphan reads go after "-U", again as a comma-separated list. The output SAM filename is provided after "-S". Notice that: + +1. The .fq.gz files have to be uncompressed to .fq. This can be done using 'gunzip -c '.fq.gz' > '.fq'. If your files are huge and you do not want to uncompress them, you can use zcat (equivalent to gunzip -c) as follows: + +bowtie2 -x -U <( zcat '.fq.gz' ) -S + +i.e., the .fq.gz filename should be replaced by '<( zcat .fq.gz )'. + +2. If you are supplying paired-end reads via flags "-1" and "-2", the number of reads in the paired files should be exactly the same. + +Let's run bowtie2 on our example dataset! 
+ +:: + + bowtie2 -x H37Rv -1 SRR671851_1.pe.qc.fq -2 SRR671851_2.pe.qc.fq -U SRR671851_1.se.qc.fq,SRR671851_2.se.qc.fq -S SRR671851_mapped.sam + +Notice that the orphan .se.qc.fq files are supplied under the "unpaired" -U flag. I see the following output: + +99795 reads; of these: + 97259 (97.46%) were paired; of these: + 1844 (1.90%) aligned concordantly 0 times + 93516 (96.15%) aligned concordantly exactly 1 time + 1899 (1.95%) aligned concordantly >1 times + ---- + 1844 pairs aligned concordantly 0 times; of these: + 481 (26.08%) aligned discordantly 1 time + ---- + 1363 pairs aligned 0 times concordantly or discordantly; of these: + 2726 mates make up the pairs; of these: + 1601 (58.73%) aligned 0 times + 746 (27.37%) aligned exactly 1 time + 379 (13.90%) aligned >1 times + 2536 (2.54%) were unpaired; of these: + 36 (1.42%) aligned 0 times + 2402 (94.72%) aligned exactly 1 time + 98 (3.86%) aligned >1 times +99.17% overall alignment rate + +It looks like the reads mapped! Success! Now let's find the SNPs in the mapped reads. + +Next stop: :doc:`3-variant-calling`. diff --git a/callvariants/3-variant-calling.txt b/callvariants/3-variant-calling.txt new file mode 100644 index 0000000..5d04b62 --- /dev/null +++ b/callvariants/3-variant-calling.txt @@ -0,0 +1,93 @@ +================== +3. Variant Calling +================== + +We will use SAMTools to call SNPs from the mapped reads. + +Index the reference genome using SAMTools +----------------------------------------- + +First, we need to generate an index for the reference (H37Rv) with samtools (this index is different from the one we generated using bowtie2-build): + +.. note:: + samtools faidx + +For our example, this will be: +:: + + samtools faidx NC_018143.fna + +This generates a file 'NC_018143.fna.fai' in the current directory. 
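
To get a feel for what ends up in the '.fai' file: it is a small tab-separated table with one row per sequence, whose first two columns are the sequence name (the header text up to the first whitespace) and the sequence length in bases. The sketch below recomputes just those two columns with awk on a made-up FASTA file (toy.fna); it is an illustration of the idea, not a replacement for 'samtools faidx', which also records byte offsets and line lengths.

```shell
# Work in a scratch directory so nothing in /mnt/work is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical toy reference: one sequence, 24 bases over three lines.
cat > toy.fna <<'EOF'
>toy_ref example sequence
ACGTACGTAC
GTACGTACGT
ACGT
EOF

# Name = header up to the first whitespace; length = total bases.
awk '
/^>/ {
    if (name != "") print name "\t" len   # flush the previous sequence
    name = substr($1, 2)                  # drop the leading ">"
    len = 0
    next
}
{ len += length($0) }
END { if (name != "") print name "\t" len }
' toy.fna > toy.name_len.tsv

cat toy.name_len.tsv
```

For the real reference, the first two columns of 'NC_018143.fna.fai' should agree with the reference name and length that 'samtools idxstats' reports later on this page.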
+ +Convert SAM to BAM format and sort & index the BAM file +------------------------------------------------------- + +Let's generate the binary version of the SAM format, which takes less space and is accepted by most software for downstream analyses. + +.. note:: + + samtools view -bt reference.fa.fai aln.sam > aln.bam + +For our example, this will be: +:: + + samtools view -bt NC_018143.fna.fai SRR671851_mapped.sam > SRR671851_mapped.bam + +Now let's sort and index the bam file: + +.. note:: + + samtools sort aln.bam aln.sorted + samtools index aln.sorted.bam + +The 'samtools sort' command creates a new file aln.sorted.bam, and 'samtools index' indexes this aln.sorted.bam file and creates an index file aln.sorted.bam.bai. + +Let's run these commands for our example data: +:: + + samtools sort SRR671851_mapped.bam SRR671851_mapped.sorted + samtools index SRR671851_mapped.sorted.bam + +'samtools idxstats' can be used to determine the number of reads that mapped to the reference genome. For example: +:: + + samtools idxstats SRR671851_mapped.sorted.bam + +outputs: +gi|561108321|ref|NC_018143.2| 4411709 195422 608 +* 0 0 1024 + +where 'gi|561108321|ref|NC_018143.2|' is the reference name, 4411709 is the reference genome length, 195422 is the number of reads mapped to the reference, and 608 is the number of unmapped reads whose mate mapped to the reference. Other unmapped reads (orphans, or mates where both reads didn't map) are output in the second line (here, 1024 such reads). + +Call SNPs +--------- + +SAMTools can be used to call variants in the mapped reads as follows: + +.. 
note:: + + samtools mpileup -uDgf ref.fa aln.bam | bcftools view -bvcg - > var.raw.bcf + bcftools view var.raw.bcf > var.raw.vcf + +For our example, this will be: +:: + + samtools mpileup -uDgf NC_018143.fna SRR671851_mapped.sorted.bam | bcftools view -bvcg - > var.raw.bcf + bcftools view var.raw.bcf > var.raw.vcf + +Option -u tells samtools to output in uncompressed bcf format, -D tells it to output the per-sample read depth (the DP value in the sample column of the vcf), option -g tells it to compute genotype likelihoods and write them to the bcf file, and -f tells it that the file following the flag is the reference fasta file. + +Mpileup looks at the read bases that map to each position in the reference genome and thus can identify mismatches (SNPs), indels, and so forth. Bcftools calls variants from the mpileup output by Bayesian inference, and outputs only the variant sites in bcf format. 'bcf' is short for binary call format, and 'bcftools view' converts that to human-readable variant call format (vcf): this file contains the variant calls (SNPs/indels identified in the mapped reads). For our example, the first three lines after the comments (lines beginning with '##') in var.raw.vcf look like: + +.. note:: + #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRR671851_mapped.sorted.bam + gi|561108321|ref|NC_018143.2| 1849 . C A 68 . DP=3;VDB=6.363195e-02;AF1=1;AC1=2;DP4=0,0,2,1;MQ=37;FQ=-36 GT:PL:DP:GQ 1/1:100,9,0:3:16 + gi|561108321|ref|NC_018143.2| 1977 . A G 12.3 . DP=1;AF1=1;AC1=2;DP4=0,0,0,1;MQ=42;FQ=-30 GT:PL:DP:GQ 1/1:42,3,0:1:5 + +The 1st column is the name of the reference genome (or chromosome), the 2nd column is the position in the reference, the 4th column is the base in the reference genome, and the 5th column is the alternate (SNP) base or bases, separated by commas (a single SNP base in our example here). + +The INFO string (8th column) gives you information about the variant call: + DP is the number of reads that mapped to this position. 
+ DP4 is a better measure than DP, because it gives the number of "high-quality" reference and alternate (SNP) bases at that position, split by strand. In the 2nd line above, DP4=0,0,2,1 means that there were 0 bases that matched the reference base (on forward and reverse reads), and 3 bases that matched the SNP (2 on forward reads, 1 on a reverse read). + MQ is the root-mean-square mapping quality of the reads that mapped. Here, it is 37 in the 2nd line, which is pretty good. + + + diff --git a/callvariants/SRR671851_1.pe.qc.fq_fastqc.html b/callvariants/SRR671851_1.pe.qc.fq_fastqc.html new file mode 100644 index 0000000..79def06 --- /dev/null +++ b/callvariants/SRR671851_1.pe.qc.fq_fastqc.html @@ -0,0 +1,187 @@ +SRR671851_1.pe.qc.fq.gz FastQC Report
FastQCFastQC Report
Wed 22 Apr 2015
SRR671851_1.pe.qc.fq.gz

[OK]Basic Statistics

Measure                            Value
Filename                           SRR671851_1.pe.qc.fq.gz
File type                          Conventional base calls
Encoding                           Illumina 1.5
Total Sequences                    97259
Sequences flagged as poor quality  0
Sequence length                    13-60
%GC                                64

[OK]Per base sequence quality

Per base quality graph

[OK]Per tile sequence quality

Per base quality graph

[OK]Per sequence quality scores

Per Sequence quality graph

[FAIL]Per base sequence content

Per base sequence content

[OK]Per sequence GC content

Per sequence GC content graph

[OK]Per base N content

N content graph

[WARN]Sequence Length Distribution

Sequence length distribution

[OK]Sequence Duplication Levels

Duplication level graph

[OK]Overrepresented sequences

No overrepresented sequences

[OK]Adapter Content

Adapter graph

[WARN]Kmer Content

Kmer graph

SequenceCountPValueObs/Exp MaxMax Obs/Exp Position
GACAACA700.00647405818.74691828
\ No newline at end of file diff --git a/callvariants/SRR671851_1.se.qc.fq_fastqc.html b/callvariants/SRR671851_1.se.qc.fq_fastqc.html new file mode 100644 index 0000000..c65f8ac --- /dev/null +++ b/callvariants/SRR671851_1.se.qc.fq_fastqc.html @@ -0,0 +1,187 @@ +SRR671851_1.se.qc.fq.gz FastQC Report
FastQCFastQC Report
Wed 22 Apr 2015
SRR671851_1.se.qc.fq.gz

[OK]Basic Statistics

Measure                            Value
Filename                           SRR671851_1.se.qc.fq.gz
File type                          Conventional base calls
Encoding                           Illumina 1.5
Total Sequences                    1908
Sequences flagged as poor quality  0
Sequence length                    14-60
%GC                                65

[OK]Per base sequence quality

Per base quality graph

[OK]Per tile sequence quality

Per base quality graph

[OK]Per sequence quality scores

Per Sequence quality graph

[FAIL]Per base sequence content

Per base sequence content

[OK]Per sequence GC content

Per sequence GC content graph

[OK]Per base N content

N content graph

[WARN]Sequence Length Distribution

Sequence length distribution

[OK]Sequence Duplication Levels

Duplication level graph

[OK]Overrepresented sequences

No overrepresented sequences

[OK]Adapter Content

Adapter graph

[OK]Kmer Content

No overrepresented Kmers

\ No newline at end of file diff --git a/callvariants/SRR671851_1_fastqc.html b/callvariants/SRR671851_1_fastqc.html new file mode 100644 index 0000000..7188329 --- /dev/null +++ b/callvariants/SRR671851_1_fastqc.html @@ -0,0 +1,187 @@ +SRR671851_1.fastq.gz FastQC Report
FastQCFastQC Report
Wed 22 Apr 2015
SRR671851_1.fastq.gz

[OK]Basic Statistics

Measure                            Value
Filename                           SRR671851_1.fastq.gz
File type                          Conventional base calls
Encoding                           Illumina 1.5
Total Sequences                    100000
Sequences flagged as poor quality  0
Sequence length                    75
%GC                                65

[OK]Per base sequence quality

Per base quality graph

[OK]Per tile sequence quality

Per base quality graph

[OK]Per sequence quality scores

Per Sequence quality graph

[FAIL]Per base sequence content

Per base sequence content

[OK]Per sequence GC content

Per sequence GC content graph

[OK]Per base N content

N content graph

[OK]Sequence Length Distribution

Sequence length distribution

[OK]Sequence Duplication Levels

Duplication level graph

[OK]Overrepresented sequences

No overrepresented sequences

[OK]Adapter Content

Adapter graph

[WARN]Kmer Content

Kmer graph

Sequence  Count  PValue  Obs/Exp Max  Max Obs/Exp Position
TTGGGTT150.002340611369.0640644
ACGCGAA200.00734161151.74624617
TAAGTGG200.00734161151.7462467
CCGTACG200.00734161151.74624630
TGGGTTG400.002573312734.5320325
GGCTCGC700.001325914924.64107149
TGGTCGA1400.00409785514.78464349
\ No newline at end of file
diff --git a/callvariants/SRR671851_1_per_base_quality.png b/callvariants/SRR671851_1_per_base_quality.png
new file mode 100755
index 0000000..3754b04
Binary files /dev/null and b/callvariants/SRR671851_1_per_base_quality.png differ
diff --git a/callvariants/SRR671851_2.pe.qc.fq_fastqc.html b/callvariants/SRR671851_2.pe.qc.fq_fastqc.html
new file mode 100644
index 0000000..44e5457
--- /dev/null
+++ b/callvariants/SRR671851_2.pe.qc.fq_fastqc.html
@@ -0,0 +1,187 @@
+SRR671851_2.pe.qc.fq.gz FastQC Report
FastQC Report
Wed 22 Apr 2015
SRR671851_2.pe.qc.fq.gz

[OK]Basic Statistics

Measure                            Value
Filename                           SRR671851_2.pe.qc.fq.gz
File type                          Conventional base calls
Encoding                           Illumina 1.5
Total Sequences                    97259
Sequences flagged as poor quality  0
Sequence length                    10-60
%GC                                64

[OK]Per base sequence quality

Per base quality graph

[OK]Per tile sequence quality

Per base quality graph

[OK]Per sequence quality scores

Per Sequence quality graph

[FAIL]Per base sequence content

Per base sequence content

[OK]Per sequence GC content

Per sequence GC content graph

[OK]Per base N content

N content graph

[WARN]Sequence Length Distribution

Sequence length distribution

[OK]Sequence Duplication Levels

Duplication level graph

[OK]Overrepresented sequences

No overrepresented sequences

[OK]Adapter Content

Adapter graph

[OK]Kmer Content

No overrepresented Kmers

\ No newline at end of file
diff --git a/callvariants/SRR671851_2.se.qc.fq_fastqc.html b/callvariants/SRR671851_2.se.qc.fq_fastqc.html
new file mode 100644
index 0000000..e497101
--- /dev/null
+++ b/callvariants/SRR671851_2.se.qc.fq_fastqc.html
@@ -0,0 +1,187 @@
+SRR671851_2.se.qc.fq.gz FastQC Report
FastQC Report
Wed 22 Apr 2015
SRR671851_2.se.qc.fq.gz

[OK]Basic Statistics

Measure                            Value
Filename                           SRR671851_2.se.qc.fq.gz
File type                          Conventional base calls
Encoding                           Illumina 1.5
Total Sequences                    628
Sequences flagged as poor quality  0
Sequence length                    14-60
%GC                                64

[OK]Per base sequence quality

Per base quality graph

[OK]Per tile sequence quality

Per base quality graph

[OK]Per sequence quality scores

Per Sequence quality graph

[FAIL]Per base sequence content

Per base sequence content

[WARN]Per sequence GC content

Per sequence GC content graph

[OK]Per base N content

N content graph

[WARN]Sequence Length Distribution

Sequence length distribution

[OK]Sequence Duplication Levels

Duplication level graph

[WARN]Overrepresented sequences

Sequence  Count  Percentage  Possible Source
CGGGCGCGTAAACCACGCGCATGCTTGGTTACTCG10.15923566878980894No Hit
CGGGCCGGGCGCACAGCTGGACGGCCCGGAGATCATGACCGACATGTTCGTCCGCGACAT10.15923566878980894No Hit
CGACGCTTTGGTAGCGCCCGAATTCCTGCAGCGCCTTACCTGTCACCGGCTCCAGCTCGA10.15923566878980894No Hit
TTGTCGGCATAGCTGGTGGCGGCCATCTCGAATT10.15923566878980894No Hit
CGGACGAGTCGATGTGCTGGCATATCTCGTCGTTGGTGACCACGCGTTCGGGCCGGTACG10.15923566878980894No Hit
CACTCGCCGCGGGCGATGAGAACATCACAAAGTAGGTCCGCGCGATCGGTGATGATGCCG10.15923566878980894No Hit
CGTCGCGCGACAACTTGGTGTCGTGAGCTGCTCACCGCGACAGGACGATCCAGCGACTTC10.15923566878980894No Hit
CTACCACTCCCGCAAGGTCGCACCCGGCGCGAACCATCTCAGCCGTCGCGTTCGGGTCGG10.15923566878980894No Hit
TACGCCGGTACTACGGCGGTCGAATTGCCCACCTGG10.15923566878980894No Hit
AGTCCACCCAGCACACTGTCGAACAGGATGCTGCCGCGCGAATACCCCAGTCGGGTTTGA10.15923566878980894No Hit
GGACGACGATACGCAACGGGCGC10.15923566878980894No Hit
ACAAACAACGGAATAAAGAGGCCAACCATTAAACCCATGTCATGGTAAAGCGGCAACCAC10.15923566878980894No Hit
CAACATTTTTCATTGGACTCCCTGCCCCCGCACGCACCGCGGGCCAAGCTATGCAGCTGC10.15923566878980894No Hit
TGTCTATCAGGATCCCGACCCCAGTGA10.15923566878980894No Hit
GACGAAACCCGAAATGTTGGCCGTACCGGTGATCGTGGGTACCGCGTTTCTGAAGCCCG10.15923566878980894No Hit
CACTCGGGCCGCCCCGTAGTTGGCGATGACAAA10.15923566878980894No Hit
GGATTCCCGACCCATCACGATCGG10.15923566878980894No Hit
TTGTTGTCCTGGTGAAGTGGGGTTCTGGTGACTGGAAACGCAAAGCTAATTGATCGAGTC10.15923566878980894No Hit
CGCGGGCGTTCTTGATCGCCTGG10.15923566878980894No Hit
CGATCCGAATCGATTGCTGCCGACGGTGTCTGGCTGGTTTCGACGGTCATCGCG10.15923566878980894No Hit
GCCAGGCTTGCAATGAAAATCCA10.15923566878980894No Hit
TTTGCCACATTGCAGCCGGACGCCCATTTCGGTCGGGTCTTGGACGCCTACGGTGGTGTG10.15923566878980894No Hit
CCCGTACAGCCGCGCGCACATACCGGTCAAGACCTCGATCATGTCACACACCAGATCATC10.15923566878980894No Hit
CTCTTGCTCAACACATTGTCGAATCACACACCGGTCCAATAGAATACGCCGCTAC10.15923566878980894No Hit
ATGTTCGTTCTCGAAACATCGCTGCGAATCCGGGTCACCGTGCACCTGGATCACCGGGAC10.15923566878980894No Hit
CGGGAACGCATCTTCTACGGATCCGACGAA10.15923566878980894No Hit
GAGCGCGCCCGAAAATGGGCCGATTACGTTGTTACCGGTC10.15923566878980894No Hit
CATCGGCGCCACCCTCCAGCAGATGCG10.15923566878980894No Hit
CATATTCGCCGGACAACGATCCGGCGAATATGTGGCAAGAGCGTGCCTACGCGGTAGGCG10.15923566878980894No Hit
GGGTTTGACGCCGACGGTCGCGGTCAGCGCGGCCGGCTGGGGG10.15923566878980894No Hit
GATCGAGGGTCCGGACTACCGACACCGTGATCTGCCGCCGGCACCGGCGCCGCTGTCCGA10.15923566878980894No Hit
TACGTACCCGGTTCGGCACCCTGACCGCGGTGGACCGGCCGGTGGGC10.15923566878980894No Hit
CGGACTACTTACCCATTCTGCCGACCGGGC10.15923566878980894No Hit
TGCAGATTCGTTACGCCCTGGGCCACGGAGACGTCATAGGTCAGGCTATTCAGCCCGGTG10.15923566878980894No Hit
AGTGTTGGCGATGCCGGCGTTGAAGAAGCCCGTGTTGGCTTGGCCCGTGTTGCCGATACC10.15923566878980894No Hit
AGTTGGTCACGCAGCACCGAATCGGACAGCAGTT10.15923566878980894No Hit
AGTGTCGAAGTCGTGTACTCCGACCCGCTGCGCGAGGCGATCGCAGAAGACGAGCAGCTG10.15923566878980894No Hit
AAGGACATGCGTCACAGCGATCCCTCCTGACACAGACGTTATGGGCAATCAGGCCCCAGC10.15923566878980894No Hit
GCTGCTGGGCGCGGTGAACCCAGAGCGTCTCGACGAACTTGCGATGGACGCAGAATTTTT10.15923566878980894No Hit
ATCGGCACCGGCTTGCAAGGTG10.15923566878980894No Hit
ACCGAATCCGGCGAAA10.15923566878980894No Hit
GGGCGGCGGCGACCATGTTGTGCCGCG10.15923566878980894No Hit
GGTTCCATTTCGCCCTCAGCGGTGTTCTCACTGCCCTTGACGACCTGTTCGGCCAAGTTG10.15923566878980894No Hit
TTACCGTGCCCAACTTCCTGGAGTTGGACTACTGGCA10.15923566878980894No Hit
ACCATGAAGGGGCCCAACCCGGGGTGTGGCCGTTAGCG10.15923566878980894No Hit
GGTACTCGCGCTGATGTTCGCGGCTATATGTGGTGCGCGTTACCTCTT10.15923566878980894No Hit
TCTGCAGCCCGTAGCGGCCACCGATGCGC10.15923566878980894No Hit
GGCCGCGGCTGTGGACACTAACTGTGGGGCCACGTTGACAAAAGACATCGAAATCCTCCT10.15923566878980894No Hit
GCAGATGTTGCGCTTGAACAACAGACGGTCGAG10.15923566878980894No Hit
CCGGCCACCAGCGGAACCAGCGGTGAAACCCACCGCAGCCGCCGTGGCAGCCAC10.15923566878980894No Hit
GTATGCGTCAATTGCTCGTTCGATTTTCAACCCATAATCATT10.15923566878980894No Hit
TAGCAGCCTCTTGAATGCGGTTTCGTGCGGCGCTGAGTAGTCGG10.15923566878980894No Hit
CGGTCAGCACGATTCGGAGTGGGCAGCGATCAGTGAGGTCGCCCGTCTACTTGGTGTTGG10.15923566878980894No Hit
CACCGTGCGCCATTCGTGGCGACAACTGCGAGCGGGAGCGGGACCAAGGATGATGGTCCC10.15923566878980894No Hit
ATGGCCGGCCGATACCATCCGCTGCGCCGCCGGCGTGGCCGCGCGTGCGC10.15923566878980894No Hit
CTACGACGTGTCGCACAACCTGGCCAAGATCGCGACGCATCCGATCGACGGTCAGCTGCG10.15923566878980894No Hit
GAAACCGCCGACGTCGCCCTGATGGGCCAAGACCTGCGGCATCTGCCCCAAGCCCTAGAC10.15923566878980894No Hit
GACAACGGTGGCGGCCGCGGGGGGG10.15923566878980894No Hit
CGACCGCAGCCGGGCGGCGATCCTCTGGCCCTCCGCGGTGGAGACGGC10.15923566878980894No Hit
ATGGGGTCACAACCGTGTTGTTGCCGGGAACCGGATCGGACAACGACTACGTCCG10.15923566878980894No Hit
ATCACCATGAGCGGCATGAGCCACGCCACCGAGTTCATCATGTTGATCGCCGAAAACCAT10.15923566878980894No Hit
CCCTGCAACGCGGCCAGATAGATCACGAAGGTATAGCCGAGGTTCTTCCAGACGTAGGTG10.15923566878980894No Hit
TCGCCGTCGGCGAGCAAGGCCGGGAACACCTCGCGTTCCACCGAAACCTCCCGGCCCTGC10.15923566878980894No Hit
GCAGGTCAGGCTGTGGGCC10.15923566878980894No Hit
GGTACCCCGTCTTCAGTTGGCAGGGCATCACACTGCAAG10.15923566878980894No Hit
CACACCGCAGTGCCCGTGGTCGGTGAACTCGTCCAGACAGGTGTCGGCCATCAACACCGT10.15923566878980894No Hit
GGCGCTACGCTGCACGCGCTTATCACCGGTACCCCGCTGAGCACCGATGAGCTCACCGAT10.15923566878980894No Hit
GTCGCGCAGCAGTGAAAGGAACTTACGGCGGTGCTCGTT10.15923566878980894No Hit
GCATTGGGCCCGACGCCGGAACTTCTGGCAGCCGGTGC10.15923566878980894No Hit
CGGAATTCGATATCCATTGCGGTGGAATGGATTTAGTCTTCCCGCATCATGAGAACGAAA10.15923566878980894No Hit
CGCTGCGGCGGCAAGGAGAAACACGACCCCGGCGATGAGCCACGGCTT10.15923566878980894No Hit
CGGGCAAAGCCGATCTCGTCGTCGC10.15923566878980894No Hit
TGGCACCGATCGGCATATTATACGACTGCGCATCTGATATGTGCTGCCAGTAACTATGAT10.15923566878980894No Hit
CTCGGGTGGGTCCGCTCGAGCGCCGTTAG10.15923566878980894No Hit
GCATTTGGGGGCTATTTCTATTGGGACGACTTGATCCTCGTCGGCAGGGCCGGCACT10.15923566878980894No Hit
ATGTTGCTAGATGTCCTTTCATAGA10.15923566878980894No Hit
TGGGACGACGCCACCTGACGTGCGGCGCGGCG10.15923566878980894No Hit
GTTTAGACATCGATGTCCAGGCGTTGCCCTCCCACGAGC10.15923566878980894No Hit
AACTCTTCGACCACGATCGCGCCACCGCCGATCACGTCGGCCCGGCC10.15923566878980894No Hit
GGGTCATATCCAAGAACCGGTGAGTCCGCCCACCCAACCGGAGCAGCAGCCGCAAACTGA10.15923566878980894No Hit
GGTGCGCCTGCTCGAGATCGAAACGGTCAGCGGGGGGCTGCTGATTTTGTCCCGCTGGGA10.15923566878980894No Hit
GTGAGGTTTCGCGCTGAGGTGCCCAACTCTGATGGCATCAACCCC10.15923566878980894No Hit
TCGACCCCTACCCCGGCAACGACGCGCCGACTACCTCACACCCT10.15923566878980894No Hit
CTCCACCGCGGCTTTCCAGGATGCTTTGACCTTCGGCCGGGATGCGGGCAGCGATGCCAA10.15923566878980894No Hit
CGGTGTGGACGCGCACCATCGACGGATGCGAACGCGCCCCGGCCGCGTGCCGCCCACCAC10.15923566878980894No Hit
CCGCATCGGTGACCCAACCATCCA10.15923566878980894No Hit
CCTGCTCCGCACCATAGGAGTCGAGGACGCCCACGACGTCGCCAACCAATTCCTTGATGC10.15923566878980894No Hit
GACAGCGCAGCGCATCCATGGCCTGCTCGAGCTGCGCGTCGAGGTCCCACGCGTCGGCGT10.15923566878980894No Hit
CCGACGACCTGCTGCACTCTGGAATCTGTGTCATCGACG10.15923566878980894No Hit
GATTATCTGGTGTCGGACTACGACGTCAACTTCCTATCCATCACCGACGACCTGTTTATC10.15923566878980894No Hit
GGCGGAGGTCAAGCGCCGCTCCTCCTCATCGCTGCGCTCTGCATCGTCGCCGGCGGAGGT10.15923566878980894No Hit
GTCGACTCCAGCACCAGCACGGCAACCAGCGAGCCCAGCTGCGCCGAACGCTCCAGGCCT10.15923566878980894No Hit
GCCTCGAACTGGACAAAGGGCTGGGGTGGAAAGCCGAAGTCGAGGCCGCTCTTGAACGCT10.15923566878980894No Hit
GGCTGTCGTGGCATCGGCATTGATGTGGCAGTGGCG10.15923566878980894No Hit
AATCGCATTCATCGCTTGGCGCAGCACGGCATTGAGCTGCTCGGCATTGCCCTCCGGCCG10.15923566878980894No Hit
TCCGACCGAGCTGACCTGGGAGATCTACCGCGACACCGTGATCGAGCAGTGTGAGCAAGG10.15923566878980894No Hit
CGGAATTCCCGCTGTGGCGGCCGTCGCCA10.15923566878980894No Hit
GGACCGCCCGCCGACGGCGTGCTGGCAGCCATCGGTGCCATGACGCGCGATCAGGCG10.15923566878980894No Hit
CATCGCAACTGTGGTGCAGGAAGGTG10.15923566878980894No Hit
TCAGCGATCTCGGCCCCGACGG10.15923566878980894No Hit
AGCGAGCGGCACTATCTGCGGTCTATGCGCTCGGCCGCCGGATCGACGACGTCGCCGACG10.15923566878980894No Hit
CGATTACCCAGCAGTACGTCAAGAACGCGCTGGTCGGTTCCGCACAG10.15923566878980894No Hit
CGACGCCCGAACCGTCACCGACTACACCGACATCCTCGAAGGGGCCGGATTGCGCA10.15923566878980894No Hit
CGACTACCCAATCCCAGGCAACGACGACGCGATCCGCTCGGCCGCGCTGCTGACTAGGG10.15923566878980894No Hit
CCTCCTCGTCGGCGGTCAGGTACACGATCTCGTCGCTAACCACGCCGTCGACCACCTTG10.15923566878980894No Hit
CGCTGGCATCCAGCTTGGCAAGGATGTCCCAGTAGGCCGCGCTGCG10.15923566878980894No Hit
TCTATAGCCCGCCTGAGCTATTCCGGTGCTGTCGGCTAAGCCTGTGACCGGTGTCACTGC10.15923566878980894No Hit
CAGCACCAGATCCTCGGCCGCCACCCGGGACCGCACGTAAGGAGTCACCTTGCGGGTATC10.15923566878980894No Hit
GCAAGGCTCGACCCGACATCGGGCCGCTGTTCTCCGAGC10.15923566878980894No Hit
TCATTGTCATCTTCGCAACCTCCGGTGCTCTGGAAGCACTGGTAACGGCCCGCAC10.15923566878980894No Hit
GATGTCAGACCCGCGCATGAGCAGTGACGAACTGATCGACTACCTGACGATGCTGAGCTG10.15923566878980894No Hit
ATCAGGTTGCGTACGTGCGGATCGCCGCCGTGGCCGG10.15923566878980894No Hit
AATACTCAATCCAGACACAACACCCACAGTTCAGTTGGAG10.15923566878980894No Hit
CCTTGCCATCGTCTTCTCGGCCGACGACGGCTGCGATGTCGGCATGGATTCGGGCTCGCC10.15923566878980894No Hit
AGTGCCGCCGGCCCCGCCGGTCCCAGTGGTAGC10.15923566878980894No Hit
GCCGAGCGGCTTGAGGTGCTTA10.15923566878980894No Hit
AAGTTGCCCCGCACCCGAACCGGGGGGAACTTCGAGTCCTTGGCA10.15923566878980894No Hit
GTCGCCGCGCACCGCAGCACCGGTGAGCTCACGCTGCTAGTGGAGGTG10.15923566878980894No Hit
GAATTCCAGCACCTCGGACACCAGTGAGCGCGCGTTGCCGGCCGACGCGCCGACAGCCTG10.15923566878980894No Hit
CGGGCCCAGCTCGGCGAACTCCTTGC10.15923566878980894No Hit
ATGCAGCATTGGCGCCTCCCATACCTGGTGCGGCGATCCAGGAAGCACCCAGCGTCAGGG10.15923566878980894No Hit
GCTGATTCGCTCGTAGCGCTCGGATGCAC10.15923566878980894No Hit
CGCGCCCGTCGCGGAGCCGACCACTGCGG10.15923566878980894No Hit
GGTGGACAGCCCCCGTCCCGCCGGCGAGGTGACCGGCACCAATCCA10.15923566878980894No Hit
GCGCCCGAGCGTGGCCCCGAGTGTGGGGAGGTGTGATGGCCCCCAAACAGCTGCCCGATG10.15923566878980894No Hit
TGACGACGTCTTTCCGCAGCCACGCGGCCCAGAGAACA10.15923566878980894No Hit
ACTATAACCACCCGATAAGAAGGTG10.15923566878980894No Hit
GTTGGTGTTGCGCAAAGCTTCACCAGCGTGCCGATGCTGTTCGCCACACCTACCTACTAT10.15923566878980894No Hit
GGTCGACCGGTCGGTCGCATTCGATGACGCCGATCGCGCCGCACACCCGCACATCGGTGA10.15923566878980894No Hit
GGGCCTGGACCCGGCGCATGCCCGGCTGGTCGTGCACGCGTGGCGG10.15923566878980894No Hit
GTAGGCGGTGACGGAAATCGCCCCGGCACCGACGCCCTGCAGTGCGCGAGAGATCGCCAG10.15923566878980894No Hit
ACGCTCCGCCGCCCACGGCCAACAGGATCGGCCAATCGACGAGTCCAGCGACACCGATCG10.15923566878980894No Hit
GGGGACGTTCTGTTCCTTGGCGGTGCTCAGACGCCAGGACTTGAGCTGCAAC10.15923566878980894No Hit
CGGATCAACCTAGCGACGCCAGCGAGCGAACC10.15923566878980894No Hit
AGTTCTCATGACCATTTTCCGCCGCCACCTTCCAATTAGCTCGCCACTCATGCGACCACG10.15923566878980894No Hit
TGATCCATTACCTTGCGCCACAACGGCGGGAAGT10.15923566878980894No Hit
CGTGCCCGACGGGGGCGTGCCCGTCTACATCGCCGCCGG10.15923566878980894No Hit
CCCACCCGCCGGTGTACGCCATCGAGGCGGGACAGCACCGCGGCG10.15923566878980894No Hit
GAAGGCGATGCCCGCGGGCTT10.15923566878980894No Hit
GTGCTTCTGCGATCGCCAAGGCGGCGGCGTGCGCGAACCGG10.15923566878980894No Hit
CGTCGCCCACGCTGCCGTTGCTGTTGCCACTGAAGCAGA10.15923566878980894No Hit
ATTGTCAATGCGCCGACAA10.15923566878980894No Hit
CGACACCGCGGGATTGCGGCGCAAGGTCGGC10.15923566878980894No Hit
GTCAAGTGTCTCAGCGGGGA10.15923566878980894No Hit
GAAGCCCCGCGCATCGGTGTAGACGACGTTCACCAGGTCGAACAGCTGCTTGG10.15923566878980894No Hit
CGGCTACGAACCTGGATAGGCAA10.15923566878980894No Hit
GGTCGACACTTCGGTATGGATTGAGCACCTGCGCGCCGCCGACGCGCGACTCGTCGAGCT10.15923566878980894No Hit
TCGGCAAAATCGGTAAGAGCCTGAAGAATTCGGTATCGCCGGACGAAATCTGCGACGCAT10.15923566878980894No Hit
CGCGTCGCGTTGTGCGGCAAACAACGTGTCGCAGGCGCC10.15923566878980894No Hit
ACCGCCGATGGCGCCCGGTAGCTGCGATCCTCGGCAGT10.15923566878980894No Hit
TGGGGCAGCGCCGATCAGCAGGCCACCTATCTCAAAGAGTTCGCCGGCGAGAACGTTCCG10.15923566878980894No Hit
TCGGCAACGCAAAGTTGCGATTCCGT10.15923566878980894No Hit
ACTCGTTCAGCCACAGATCGTCCCACCAGGTCATGGTGACCAGGTCGCCGAACCACATGT10.15923566878980894No Hit
CTGCGGGTAGGCCACTGCGACGACTCAAACACGGTGTCACACGGTGAATAGTGTCGAGAT10.15923566878980894No Hit
AGGGCGATCAACGAAGGACCTGCCCC10.15923566878980894No Hit
CAGATGGGTGAAGCGGTAGCGGTCCG10.15923566878980894No Hit
CCGGCGTCCAGTGGGCATTAGTCGCGAGCCCCGCAGTAGTCGAACAGCTAGGTGGCCGCT10.15923566878980894No Hit
ACGACGGTTTGTCCAGGCTGCGCTGGAGCAGTCGGGCCTGTTGGCCGCGCCACGG10.15923566878980894No Hit
TGGCCTGCGCGCTGGTCGGGCGTCCCCAACTGGTGTTCCTCGCCGAGCCCACCGCGGGCA10.15923566878980894No Hit
CAAGCTTCGATCGACAGTACTCCCGCCTTGGGTCTGGTCTTCGAGCTGGTCGGTCATGGT10.15923566878980894No Hit
TCGGCGGGCGGCACCACGCCGGCCTT10.15923566878980894No Hit
GGGGCCGGTGCAAACCGCGTTCGACGAAACCGGCTCCACCCGGATAGCCGTCATACACAA10.15923566878980894No Hit
AGATCGCCGTCGGCGAAGTTGCACCGCCGCTGCACGGCGGCCACCAGCTGCTCGTTGT10.15923566878980894No Hit
GTGCTCAACATCGGCGACGCGATCCTCTTTTACACCGACGGCCTGATCGAGCGGCCCGG10.15923566878980894No Hit
CGGGCATCTATGGCATGAACTTTCACTTCATGCCCGAGCTGGACTCCAGGTGGGGTTACC10.15923566878980894No Hit
GATGGTCACGTCAACGCCGTAGTTCTTCAGCACGTAGCCGAACTCCATGCCAATGGCACC10.15923566878980894No Hit
CCGAACAACCACCCGCCGTT10.15923566878980894No Hit
TGCTGTTCCCGTTCCTGATGACCG10.15923566878980894No Hit
GGCGCATTTTGGCGTGAGTGCCGAGGCGTTCCTGCGGCG10.15923566878980894No Hit
GGCGGTCGCTGTGGCGCTAC10.15923566878980894No Hit
CAGCGACCGCGCCGTCACGACCACCTTCGCGACCGAGCCGAGCGGTCCGCCCGCAAGTTG10.15923566878980894No Hit
ACCGCCACGCCGCCGTCGTCTTCGTCGACATCGTCGGCTCCACCC10.15923566878980894No Hit
AGATAAACCGCAGGTGAGTTGTTCCTACCGCTCTGGAACTGAACCTTGATGTCGCGGCCC10.15923566878980894No Hit
GCCGCCCTGACCGGTTGTTGTTGC10.15923566878980894No Hit
CCCGGTGTGCCTTGTATGACGGGATCAGGTCCACCCGGAACTGCGGGCGCCAATCCAAGT10.15923566878980894No Hit
TGCGCCGCACCGGATTCCACCACCACTCCACGACGGACGACGACGACAGCCCC10.15923566878980894No Hit
CCGCGGTCACAGGCGCGCACGCACCGCTGACCCGTACGTTCATTCGGTTCTGGACAGTGT10.15923566878980894No Hit
GGGGGTGCGAAGTTCGTGGCCCGCATCGCTGACGAATTGACGTTCTCGCTCGAGCGCGTC10.15923566878980894No Hit
TGTCGGCAACGCGATGTTCGGCTTGGCCAGCCGCTACA10.15923566878980894No Hit
CCACGCTACCGTGCCGTGCAACGGTTGTCGAAAACTATTTTCAACAGGCTGGTCCGGGAG10.15923566878980894No Hit
CTGGTCAACGACCAGTTCGCGGAACCGCTGGTCCGCGCGGTCGGCGTCGACTTCTTCGTA10.15923566878980894No Hit
GTCCAGCCGGGCTTGTGCTTCCTCGTCG10.15923566878980894No Hit
TGATGCAGATCAATGGGTTACAACTCAGTCCACCTGCGAAAGGACACAACTGCGGTTAAC10.15923566878980894No Hit
TCCCGAAGCCGATTGGTTGCCCGGTGGTCTCGCTGACCA10.15923566878980894No Hit
CTCCATGCTGGTGTCGCTGTTGACCCGGGCCACCCCGGAAGAGGTCAGGATGATCCTGAT10.15923566878980894No Hit
CCAGTGAAGCCAGCCCGGCGCGCACCGCGAATCGTGAGGTCGCCGGCAGTC10.15923566878980894No Hit
GTAACTCCCGGCTGGCCGCAC10.15923566878980894No Hit
CGTATTCGCCGCCGACGGCGACGGTGAC10.15923566878980894No Hit
CACCCGGCGTCTCGCAGCAACGGCGGCGTCGCGGTTGG10.15923566878980894No Hit
AGCGTCGACAGCGCGTAGGCACCGTACTCGGCCCCACGGTGGTAGCGGTCGGAGACCAAG10.15923566878980894No Hit
AACGGAACAATTGTCTGGCTCACCGACAGATTCCATCCATTGAATGCGATGCTGCGCCGA10.15923566878980894No Hit
GGCGATGGCGCCGCGGATCACCGCGATAACCGTGCGCGACGGAAAGGTGTGCGCGCTGTG10.15923566878980894No Hit
ATCGGACCGGACGGCTCGAC10.15923566878980894No Hit
CCCGGGTCTGGTCGTCCCACAAGTGCGGATCAGATGCCTCATGCTCGAGCTTCTCGATGC10.15923566878980894No Hit
CCGGCGAATGATGGCTCTGATGAATGAGATTGATCTGTATGAGCACAAGACACCGCTGCC10.15923566878980894No Hit
TCCTGATCGGTGACGGCGTCAGCCCCG10.15923566878980894No Hit
GCGCAGGCGGTGGCGATCCAGAAGTGCAGCCACGAATACATCCCGTGCGATCCGTCGGGT10.15923566878980894No Hit
CGGGACTACTTCGATGTCTACGACATGTTCAAGGGCCTCTTGCGAGGCCTGGTGGCGCTG10.15923566878980894No Hit
AAGACTGGGCGCGCACCCATCGCGACCTCATTGCCGGAGAA10.15923566878980894No Hit
GACAGCGATGTTGGGAACGCACGGGAAAGGGCT10.15923566878980894No Hit
CAGGCCGTCGTCGCCGACATGC10.15923566878980894No Hit
GGGCTTCGGTGGCCGACGGACTGTCCCGGATAGGCCATGAAGTCGCGATCATCGACCGTG10.15923566878980894No Hit
TGGTCGCGGTCCCCGTG10.15923566878980894No Hit
ACCATACGGATAGGGGATCTCAGTACACATCGATCCGGTTCAGCGAGAGGCT10.15923566878980894No Hit
GGATGTCGGCGAAGAAAGCCAGGCGCTGTTGGGATCGAAGCCGGGCCAGCATGCGTTGGG10.15923566878980894No Hit
CCTAGCCATGGTCGGCCAAGTCTTCACCGACGGCGACA10.15923566878980894No Hit
CATCTGGTTCACGACTGGAACCGGGAATTCGTCCGGACCTCGCCGGCCGGCGCGCGCTAT10.15923566878980894No Hit
GAAGGTGGCGGCGAATCTCCGCATGCCC10.15923566878980894No Hit
TGGCTCGGTCACTGCCCCTCTGGAG10.15923566878980894No Hit
GGGGTGAACGCGGACACCGCGGCGATCTGCGACAAGACCGGCGGGCAG10.15923566878980894No Hit
GCCTGACAAGACAACGCCACCCCGCCGACCTGCTCGTGG10.15923566878980894No Hit
GTCGGCGACGCTGGTGTCGGGGGGGGCCAGCTTCTCG10.15923566878980894No Hit
ACCGGTGGCACCGCGGGCTCCGGCGGTGCCGGAGGGTTCGGCGGCAACGG10.15923566878980894No Hit
TAGCACCTTTTCGGACCTGTTCGGCCTCCACGATGTGCACGATGCGCGCCAGCGAGTTGT10.15923566878980894No Hit
CGCGGACTGACCTTCTACTACTTGGACCTCGCGCTGATGTCGGTCACCCAGCCCGACCGG10.15923566878980894No Hit
AACGAGGGATTCCGTCATTATCAGCCAAAATAACTGCTCTCGGGTTACACCCAAACAGCG10.15923566878980894No Hit
TGTCGTTTCCCTACGAGCCGTGGACCCGCGAAGAAGGCATC10.15923566878980894No Hit
CATCGGTGCTATGGTTGCGGTCAAGGACCGGCTCAACGGCAGCCGCAA10.15923566878980894No Hit
CGACGGCGGTCGAAATACGCACAGGCACACGAGGAAATACAGGTACCCCAGCGCCCAGCC10.15923566878980894No Hit
AGCACCTTCACGCCCAGACAGCCAACCGGGGCTGCGCCGAACCACGTTCGCGCGCTGGAT10.15923566878980894No Hit
GATATGACGAGGCGGCCGGGATCCCGTTGCGCGGACTGGCCC10.15923566878980894No Hit
CGACGCCGCCGGCAGCACGTAGTGGGCGAGCCTGGCC10.15923566878980894No Hit
GCACCTCGATGATGTCGCCGGGACGCAGCTCCTCGGCGCGG10.15923566878980894No Hit
GTAGAACGGCTCGCTCCGC10.15923566878980894No Hit
CGTCGCGTTGCGCCGGATCCCGGATCAACCACACTCCGGG10.15923566878980894No Hit
CATGATCCCGGCAGTCAGCCTGGG10.15923566878980894No Hit
TGTGTCGACGCCGC10.15923566878980894No Hit
GTTACTCCGTTTACCCGATATGCAGCGGGTCAATCT10.15923566878980894No Hit
TTCTTCATCGGCGTGCTCGCTGGGGAGGAGCCCATCGACCACAC10.15923566878980894No Hit
GCACCGGTTTCGTGCATTGGAGTCG10.15923566878980894No Hit
CGACCGGGGATGGACGTTGTATGCACACCGGCCCGCTCGCGGAATCGGCATCAGCGAATG10.15923566878980894No Hit
CGCCAGACCCTGACTGATCCGCAACGCCGCCGCCACCTCGGCGGCCACCGCCGCCATGGT10.15923566878980894No Hit
ATCGGGGATAAAAACCGTAGACTCCGCCGAGTTTGAGCCCCGTGTATTTGTGCTGACGAT10.15923566878980894No Hit
GCTAAACCGTTCGAGATGGACCCACTCGGCGGACCCACACACGTGTTGGCCACCGCCGAC10.15923566878980894No Hit
CGCTGCCAACACGAGCAGGGATCCGGCCACCCACAGTGCGAAGCGTTCGTCCTCGCCAAT10.15923566878980894No Hit
CTGCCCCCCGAACCGGTGGTCCTCCCGGG10.15923566878980894No Hit
AGGCGTCCTCGGGGTCGTCGTCGCGCATCGCGATCTCCTCGAGGACGACGTCACGTTCCA10.15923566878980894No Hit
CATCGAGGGTGTGCTTGGCCGCGAC10.15923566878980894No Hit
GACGGTCCTGCTCATACTCGACGGTCTCTTCCTGCACGAACACCGGCCACATGCCTAGTT10.15923566878980894No Hit
CGCCCCCGGGTTTGACATCGACCTGATTCACGAGGTGCTCCAT10.15923566878980894No Hit
CGTGGCTATTGGTACCGCTCATCCCGAACGCGGACACCGCCGCGGTGCGCCATCCGTCA10.15923566878980894No Hit
GAAGAACGTCAGCTCGCCGATGCCGGCCGC10.15923566878980894No Hit
CGGGAAGACCCGCGGGTTTCCGCTGGGCCCCGGG10.15923566878980894No Hit
GACGCCGCGGCTTGGCGATAGTGTGGGGTCACCGGCGATGAAACTGGCGGTG10.15923566878980894No Hit
TACGTGAATTGCGCGGGCCATGGCAAC10.15923566878980894No Hit
AGCCCGACACGAAGCGAGAACTCAATT10.15923566878980894No Hit
CGTCGACCAGGTACCGGATTCGCCGGCGATGACCGGTCCCTACATCGGACAAAGGCCGTG10.15923566878980894No Hit
TCCCAGTTGGGTGGCCGCCATGTTGCGCAGCACCTTGGCGCGGCCGCGCGCG10.15923566878980894No Hit
GTGCACTCGAGTAGGATGGTCGACAGCACTTCGCCACACCTCAGTGCGTCGGCTCGCGTG10.15923566878980894No Hit
AGGCCAGCAGTCAGGACGTACTGGACGGCGCCATCAATGCCGACGAGCCAGGTTGTTCGG10.15923566878980894No Hit
TCACCACCCCAGCTATCACCACCCCGGAGTTCGCGATCCCT10.15923566878980894No Hit
CTTGGTAGAGCACCCGCTTGTTGGCGAAGTAGTGGTTGATCGCCGG10.15923566878980894No Hit
TAGCGTTCGCGCCGGTGGCGCC10.15923566878980894No Hit
CGGTGTACTTCCTAGTCGTGCTGCCCTACAACACACTACGCAAGAAGGGGGAGGTCGAGC10.15923566878980894No Hit
GGGGCCCAGCCGTTCGGGCGACCTG10.15923566878980894No Hit
CCGTTGGAGGGCGGCGACATTGGCGACGACGCCGTGGTGTGGGAG10.15923566878980894No Hit
GCAGCATCGCCCGCAATTCCCCCACC10.15923566878980894No Hit
CCCAGCAGCCGCTCATGTTCGTGCT10.15923566878980894No Hit
GATCGCGGTCGCCCCGTCGACACTGTGGTCTTTGACCGTGGATCCGGACCGGCCGGGTTC10.15923566878980894No Hit
GACCTGCTGCGCTTCGTGCTCGAAACGGGTACGCCCAAATCCGACCGCACC10.15923566878980894No Hit
CGGCGTCGGATGTCACCCGCAGCAGGCTTTGCGCTCTTGCCAGCCGCTGCTGTTCCTCGC10.15923566878980894No Hit
GGTTTGTGCAGGCCTTGAGTACCGGCGCGGGCGC10.15923566878980894No Hit
GTGGGGGAAGCCGACTTGTCCGCGTCCGTCGACGACGTGGTTTACGGAAAGCACGAATGA10.15923566878980894No Hit
GCCGAGTCCGATCCGAATTCCGGTTCGGTGATGGCCATCGCTGCCCACACTTTGCCCAGC10.15923566878980894No Hit
AAGATAATCGTGAAGGAATTCGCGCTGTCCGGTCAGGAGCAGAAACCTGACTTGACGGG10.15923566878980894No Hit
CAGCGTCTGCAACGCCGCCGAGACGCTGCTGGTC10.15923566878980894No Hit
GTCCCCGATATGCCCTGCGAGGTTGCCTCGTGGCTGATGACTCAAACGACACCGCGACCG10.15923566878980894No Hit
GGGGTGCTGCCAGAAAGCGATCCCGCGGTGCTGGCCGACCTGGTCTCGCTCGTACACTCG10.15923566878980894No Hit
TACGCCACCGCCGAATGCGACAGCCTG10.15923566878980894No Hit
CGGCGGTGGCAGCGCTGTTTGCCAGGTTCGGTCAGGAATATCAAGCGG10.15923566878980894No Hit
GGCTTTTCCACGGCAACGCGTGCG10.15923566878980894No Hit
GGTGGTCACAGCACGGCAATTGTTGTGGG10.15923566878980894No Hit
ACCCGCTTCCACACCGCCCCGTTACGGCGCGTGACGTTGCCGTCTCCATTCACATAGGCG10.15923566878980894No Hit
GAATAACGCCGATGATCTTCTTGAACTCATCAACCGCCTCTGTGGGGATCACGAACTGAT10.15923566878980894No Hit
GGTCACGACGACGCTGGCAATCCGGTGGATAGTTGGGGAACCGGCCTCGTCGTCGACCAA10.15923566878980894No Hit
CGCACCTCAAGTGTTTCGGACGCCAGCCGATCGCCGGCGACCACCTGATGGGTGGTCGGG10.15923566878980894No Hit
CCGCTGCCGCGGTCAACCACCCCGCGTAGGGAG10.15923566878980894No Hit
GAACGGTCTACGAAACGCGCCCAG10.15923566878980894No Hit
GGGCGCCGATCTGCACCGCGGCGAG10.15923566878980894No Hit
TCGGCGAGGCGAAGCTGGCGGCAAACCAGATCGCCGACCCCGAT10.15923566878980894No Hit
ATGTCGGCGATCACCTCGTCGATGCCGACGCGA10.15923566878980894No Hit
TGGCGGCAAGGGCGGCGCCGGTGTCGGCCCTGGCTCCACCGGCG10.15923566878980894No Hit
TTCGAGGAACGCCGGGCGTGGAATAAGCGCCTG10.15923566878980894No Hit
TGCCCGAACAACGCGGTAGTGCCATTTTGCACGCAATAGCCGATGAGTTGTTTGGCGAGG10.15923566878980894No Hit
ACATACCGATCGACCAGCTCCCGTCGCTTCCGCCGCTGCCGACTGACCTGCGAACACGAC10.15923566878980894No Hit
ACCAAGATCGCGTGGGGTAGCGCCATCCGGGCAGCATAGACGGCAAGCCGGATTGCTATG10.15923566878980894No Hit
TTCCGGCGGCCAGCAGCACCGCAGCCGCCGAGGCGGCCTGGCCCAG10.15923566878980894No Hit
TGATAGCGACTTATCAGAAAGACGGCGTAATCGGTTCCCGCCCCGATCATGACCGCGCTC10.15923566878980894No Hit
CGGCTGGTCTCTGGCGTTGAGCGTAGTAGGCAGCCTCGAGTTCGACCGGCGGG10.15923566878980894No Hit
AGTCCTCGGGCGTGAACGGCTCGGGGAAGTCGAA10.15923566878980894No Hit
CGGTGTACCAAATGCCATTGCGCCAAGCGGTTTCGTCCTCCTGCGACCCGCTCTGTAGTT10.15923566878980894No Hit
GGTCCAGTTGCGCCCCCCGAACTTTGCGGTGGCAAATGTGTAAGGGAAGCGCAGGCATGC10.15923566878980894No Hit
CGCCGAGTACATTACGGTCTCGGCTGGAATCTGTTT10.15923566878980894No Hit
CCGGACTCGCCCCAGATGAACAGGGGGTTGTAAGCGCGGGCGGGTGCTTCTGCGATCGCC10.15923566878980894No Hit
CAAGCGGAACCGAGCTTTACGCTGCGGATGAAAACCGAGGTGACCGGGTTGCTGCGGGAG10.15923566878980894No Hit
GGCATGACCATCGCCGTGGCGGCCACGTTGGTCATGATCCGACA10.15923566878980894No Hit
CGCAGACGGCGCCGACAAGTCTGGACTGACGGCATTGACCTGGCCGGTAATGGCGAACTT10.15923566878980894No Hit
CAGGTCGGTGCCGGCGCCGGGCTCGGAGCAGCCG10.15923566878980894No Hit
GTTTGCTGCGCCGGCAAGATGCCGTCGAAACGTTCACCGCTGCGATCGCGCT10.15923566878980894No Hit
GACAGCGCGACCTCGACACCGTCGAAGGCCTGTATCGACACCGCGGCCCCCCCAGTCTGT10.15923566878980894No Hit
AGGCCACGGCCGCAGCAGCGGCTGGGCAATCTGGCTGGG10.15923566878980894No Hit
GGGGCGGCTCAAGGCAAATGGGACCCGGGAAACCTGTCGATGGACTGGTTTCGCGCCGAG10.15923566878980894No Hit
CGCCGCCATACCCGCGCGGAGACACCGTTTTTCACCGGTCCG10.15923566878980894No Hit
GACCGCTACCAGCTGGGCGACCACTTTTTCTTTCGGCACCTTGTCAATCTTGGTGACGAT10.15923566878980894No Hit
AAACAGCACGCCCGAAGAATGGGATGGCAC10.15923566878980894No Hit
GGAGCACCTGGCCACGCTGGGGCTCGACCTCGGCAACAAGAGCGTGCTGGAGGTTGGTGC10.15923566878980894No Hit
GCCATGTGACGGTAGCGGCAGCTCA10.15923566878980894No Hit
GATAACTTGACCAACTCCGCGGTCTGCAGATCGGTCACCAGAAACGGCACGCC10.15923566878980894No Hit
CGTCCGCGCGGCGCTGCTCACGGCTGCGCGGATCGTGCTCACACACG10.15923566878980894No Hit
ATCTGGGCAACCGTGGCGGGTTTGGTTGGC10.15923566878980894No Hit
GGGAATCACCCGTTCGCGG10.15923566878980894No Hit
AAAAGATTTCGTTTCTGCCCGCCATGCGCTACTACGCCGGCAACTGGGCCACCAGCATCT10.15923566878980894No Hit
CGGCTATCTCGACGACGACGGCTACTTGTTCCTGACCGGCCGGCGCCC10.15923566878980894No Hit
GATGACGGTGCCGAAGATGATGTTGGCATCGGGGTGAGCGGCGTCTTGTA10.15923566878980894No Hit
CCTCGGATCCGCCACCCAGCGCCCACCCGGTCAACGCGACGGATCCGACCGCAATACCCT10.15923566878980894No Hit
GGATGGGCCGCTACGGCGAGCGCGCGGAGAACA10.15923566878980894No Hit
TGTCGGACGGCGCGGTGGTACGGGCATTGGTATTGGAGGCGCCGCGCAGG10.15923566878980894No Hit
GCCCACCTCGTGGTCGCACGCAATGGCGGTGGCCGCCCAGGG10.15923566878980894No Hit
GGTGAGCAGTGCGTCGGTGGCTCCGGG10.15923566878980894No Hit
CGTGGTTGCGCCCATACACAATCAAGAGGTCCTTGATGCGGCCGATGATGAACAGCTCGC10.15923566878980894No Hit
CCGGGCGCAACGCCTGGCTCCCG10.15923566878980894No Hit
ACGACCCACTTCCCCGCCGAAT10.15923566878980894No Hit
CGCGGCGGCCATCAGGTCGACGGGT10.15923566878980894No Hit
CGGCGCGGTGAAGCGCGCAAAGGTCCAGCAGATCACTCCGCACGATTTGCGGCACACCGC10.15923566878980894No Hit
ACCATAGGCTCGATACTGCC10.15923566878980894No Hit
GACCCGACCAAACACTTGTCCGTGGTCGTCTCGATCGGGTCCGCCGCCTGGG10.15923566878980894No Hit
TCGGCACCCAGCTTCTTCAGCAAACCATCTTAGATGCGCAGGCAGGTGAGC10.15923566878980894No Hit
GCCGACGTGACGGTCGGCCACGTCACGAAAATCGAGCGCCAAGGCTGGCACGCGTTGGTG10.15923566878980894No Hit
GATATCACCAACAACGCCCG10.15923566878980894No Hit
AGTTAACTAGCACTCGACCGCTGAGGTAGCGATGGATCAACAGAGTACCCGCACCGACAT10.15923566878980894No Hit
GCCGGCGACACCGCAGGTGCCGTGGCTACC10.15923566878980894No Hit
ATGTTGGTCGTGGCGAGGGCGAGGCGCGTCAG10.15923566878980894No Hit
AACATCCTCAAATCGCGGAT10.15923566878980894No Hit
GTATTCGTGCTCCGACGGCAGCCGCAACGCCTCTTCGTAGTCGAGGG10.15923566878980894No Hit
TGGTTCGCGTCGGTCTGGCCCCGGAGATGGAAGAGCCGCCACCG10.15923566878980894No Hit
GATAGCGTAGATGTGGCCCACGTTGGTACGCATGTAGTCGTCGACACCGAT10.15923566878980894No Hit
CACTACGCGTCGAAGACGTCCTCGCCGCC10.15923566878980894No Hit
CCTCACACTTGCCGACTTCGGCAATACCCGCAG10.15923566878980894No Hit
AGGCGACTGGGCGCATACCTATTCGGGTGGCGGCAACCATGTCGGAGCCGGATGGATGGC10.15923566878980894No Hit
CGCGCCGAGTCCAATAAACACCATGGGAAGAATCAGCAGCGTGGTGGCGATCGAGCGATA10.15923566878980894No Hit
CGATACGGCGACGCGGGTCCACGATCATC10.15923566878980894No Hit
GAGGTGCTGATCGTCGCCGACGCTGTCGCCGCGGCGGCCATCGGTC10.15923566878980894No Hit
TGAACATCGACACCGTCGTCTACTTCCAGGTGACCGTTCCGCCGGCGGCGGTGTACGAGA10.15923566878980894No Hit
TCCACGACGCCGCAGCCGCCGTGTCGGGCC10.15923566878980894No Hit
AACCGTTCGGCCAGCAGCGGAACGACACCG10.15923566878980894No Hit
GATTGGCTTGACCCGATGATCGCGTTGAGTTT10.15923566878980894No Hit
CGACCTTCTTTGACCAGGCCGGGTCGTCAACGGTCTTGCCGGCTGGGGCTTGGAAGATCG10.15923566878980894No Hit
TTTGGGTGGTCGTTGGTTGTCGGTGG10.15923566878980894No Hit
CCCGGGTTTCCCACGGATCTGCAGCCGATGGCTATCGCTTTGGCGTCGATCGCCGACGGC10.15923566878980894No Hit
CGTTTCGTAGACGTTCCACAGCGGGCAGGTGCCGCCGCTGGAGG10.15923566878980894No Hit
TTCCGGCGGGCTCGAACACTTCACCGGAAAAGCCGGCAATAAACGCTGGAATTCGATTGC10.15923566878980894No Hit
TGCCATTCCAGGTGCGGCCGGCTGGGCACGTAGGCGGTGAAC10.15923566878980894No Hit
AAGTAGACTCGAGGCACGCAAGTTT10.15923566878980894No Hit
GACCTCACCGGCGAGGTAGTGGACGAGGTGATCAGTCGGGGCAAGCACCTGTTCATCCGA10.15923566878980894No Hit
CCAGCGCTCAACCGGATGCTCCTTGGTGTAGCCGTGGCCGCCCAGCAGTTGCACCCCGTC10.15923566878980894No Hit
TGGCGGCAGTGTCGAAGTTCTTCAACGGCAATCCCATCCTGGGCACGGTTTCCGGCGGG10.15923566878980894No Hit
CGGTGACGGCGTACTTCGCGTCGCGACCCGCCGCCGCCC10.15923566878980894No Hit
GGATCCACGTCGACGACCACAA10.15923566878980894No Hit
GTGCGCCGAAGCTCGTCGAAGTGAACCGCCCCGGTGAGTCCGGAGACTCTCTGATCTGAG10.15923566878980894No Hit
CGACACCACAGTGCGACCATCGACGG10.15923566878980894No Hit
CGCCAATGGTTGCCAGCTGCCGAGGTGATCGGCGAACTGGTGCGCTTGGGC10.15923566878980894No Hit
GGACAGAACCCTGGGAAAGTCTTGGTGGAACGCATTCTGAAAAACGTCCG10.15923566878980894No Hit
GCAGCCGACGAGCTGGCCGATCGGTATTCACTCGTCTCCGGGCAGCCGCTAGGCCACTGG10.15923566878980894No Hit
CTGGCGCGCTACCCCGACGCCGAGACCGTGCAGTTGCATCCCGCGGATGTGCCCTTTTTC10.15923566878980894No Hit
GCTGATCATGATGTTCGCCGGCTTGACGTCACGGTGGATG10.15923566878980894No Hit
AAGATGATCGCGTCGATCGTCATCAAGGCC10.15923566878980894No Hit
TTGCTGATTACTCGCTGTGACCCATGAGCGCCGCGAACCGCGGCTTGATCACTTCGTCGA10.15923566878980894No Hit
CTGATGCGCTCGAAGATCTGGTGCCGTTCGCCGAGCTCGAGAACCGCATCGTGATCGCCT10.15923566878980894No Hit
TTGCCGGCTCCCGGGCGGCATCGAT10.15923566878980894No Hit
ATCAAGGACCTAGCCGCCCTCGCGACCGCTCTGCAGGAATTGACGACTCG10.15923566878980894No Hit
CTTTGACCGGCGCCGATAGTGCTCTGCAGGAACGCCTTCACAGCCGTGCCAACCTAC10.15923566878980894No Hit
ACCATCCCAAGCCACCGATGCCGCAAGCATCGGCGCAGACCCCGCACCGGTAAACATCCG10.15923566878980894No Hit
CCGGATCGATGAGGTGTCCGGCGAGGCCGAACTGGAGGCCGGTGTCACCGGGCCGGAAG10.15923566878980894No Hit
GATTTCGTTCATCCCCGGGTTGTAGTAGGC10.15923566878980894No Hit
CTGGCCAACACGATGCTCGAT10.15923566878980894No Hit
GGACGGCTGGGCCTGCTCCCCGAAATCGACACGCGC10.15923566878980894No Hit
GGCTTGCTCGGGCGTGAAGTACTTGCCGAAGAA10.15923566878980894No Hit
GCCCGATTAGCTTCCCGCAGCGTACAATTCGGCTGACTGTTGAATGGGCTGGCGGCCGC10.15923566878980894No Hit
GACGGACAACCCGAGAAGGCGTTCCTGGCGGTGCGCACGCCCGACGGGTC10.15923566878980894No Hit
CGACCAGCAGTACGCCGGGCAGACCTGCGGCAACCGGAACGGTCGCCGCGGGCACA10.15923566878980894No Hit
TCCCGGGACTGGCCGGCCCCGAAAAACCGGTCCGCTCGGAAGGGAAACATGTTTTCAACG10.15923566878980894No Hit
TATATTGTCGGCGACAAGTCATGGTCGATCGACATGCAGCGCTACAACTTTCAGTTCACC10.15923566878980894No Hit
GATCTGGCAGCCCGCTGCAGTTTCGAGCAGGTCGCCTTCCTGCTGTGGCGTGGTGAGTTG10.15923566878980894No Hit
GGGTTCCCGGGCGCAGTTCGGCCACCACCGAATAGCCGTACAACGCTTACATCGGGTGCT10.15923566878980894No Hit
CGGACGCGCCGACGATTTGTACTGCTTGGT10.15923566878980894No Hit
ACTCGAACGACAACTAGAGCGCCGCGCCAAGCAAGCCAAACGCCGTCGCATCTTGACTAT10.15923566878980894No Hit
GGGCAACCTCCCGGGACGACCGCAGGTCGGCAACGTCGGTGATCCCCAGCCGGCGCAGCG10.15923566878980894No Hit
CACCGCGACCACCGCCGCGACCAACGCGACGGTGCGCCAGCCGCGTTTGGCGGCCGCC10.15923566878980894No Hit
CGAAAATGCCCGAAACGCGCCCAGCAGGCCGGTCAGCCTCTTGCGCAACTCGTAGCCATG10.15923566878980894No Hit
CGCCCCCATTCTCCGCGCGGGTG10.15923566878980894No Hit
AACCAAGCACACCACGATGCGGGTAGCCAGCATGGTGGTGAACACTGAGCGGTAGCCAAG10.15923566878980894No Hit
AGCGCTCCGCTTAGCCGAACTATACGGAACGCTCGCCCCGCTTTGTCAAGGCTTTCGGAG10.15923566878980894No Hit
GTTCGACCCTGATCGGATGGTTGGCGCCACTTTGAGCCCGACCACCTGGTTGCCGTTCGG10.15923566878980894No Hit
GCAGAATCGCAGCAGCCCGTCGCGGCTATCGCCTATCCCTTCCAGCGCTTCTAGGAGTTT10.15923566878980894No Hit
CCTTGGGATGTGGATACTCCTCCTGTACGCGGTAGAGAATCGCATTCATCGCTT10.15923566878980894No Hit
CTGACCGTGACGTGTTCCCGCGACCTGGTGCTGTCCGCGCCGGTGAAGCTGGCCGAAGGC10.15923566878980894No Hit
GGCCAACGCGGCCGCGGCGGCCAACACCACAGCACTGCTGGCCGCGGGTGCCGATGAA10.15923566878980894No Hit
CGACCTCGAGCTGGCATCCGCGGCCGACGAAGCCACGCTGCGCGCGGCGATCAGCCGAAC10.15923566878980894No Hit
CATCCGGGTATTTCGAGCAGACGATCTCATAGGTCGCCAGCACGATCGGGTAAGAGCCAG10.15923566878980894No Hit
AACCCGGTCGGGTGACGGTGCTGACCGGCCGAAACGGCGCCGGCAAGAGCACTACGCTGC10.15923566878980894No Hit
GGCGACACGCCAACCTGGCTGAGCGTGCAAGAGGACGACGGGCGTCTCGAGCT10.15923566878980894No Hit
CGGACCCGATCGACGGCCAGCCCAGCGCAACCCGCGGTCAACACCGCCCAC10.15923566878980894No Hit
GGAGGTTCTCCCCTTCATCGTCAATCATGTTGTGTTGATGAAGAGTTCACGTATTCCAGG10.15923566878980894No Hit
GGCTCGTTCGCAGGGCTGG10.15923566878980894No Hit
TGACCACCTCGCCGTAGCGCCAGCTCTTGAAGACGTTCCACACGAACGGGAACATCGACG10.15923566878980894No Hit
TTGAGCATCAGCCGGTAGTAGGAGGTGTTCAG10.15923566878980894No Hit
AGAGCACGCTGATCCCCCCCGGTCGCG10.15923566878980894No Hit
CAGATCGCCGCCAACCAAAAGGGCGTCGACGAGGTCGGAAAAATGATCGACCACTTCGGC10.15923566878980894No Hit
CGACCCAGCGACCTGGCACGCGCTATCGCGTTCGTCGCAGAAAAGCCGCGCGGGTGTGTC10.15923566878980894No Hit
CGCACCGGAGACACGGTCGGATGCCTGTCCACCAGGTGGTCCACCGAACCGCAAGCTTCG10.15923566878980894No Hit
GAGCGTTGGTCTGCCAGCAGCGCCAGATGACGTGGAGCCCGGCGG10.15923566878980894No Hit
TCGAGGATACCTACGCCGCGCAGTACTACACCGACCGCTCCCG10.15923566878980894No Hit
TAGCCCGCTGCTAGCGCGTAGTTCGGCGAG10.15923566878980894No Hit
CGCGCCCGTGCTATCAGCGCCATCATCAGATGCCATCCGGTGCTCGCAAGCACAGCTTCG10.15923566878980894No Hit
CCAACTTCGGCGCCCCGGCATCGTGGCCGCCGTACTCCACCGCATCGGTGACCCAACCAT10.15923566878980894No Hit
GTCGCGGGACCTGGACGAGGC10.15923566878980894No Hit
CAAATCGACCCCAACACGGGGTGAACGCTACATTAACCAATCGTGAAGTCCTATACCTCG10.15923566878980894No Hit
GGCCCCAAACCAGCGGTCGTTCGCCGCCGTAGGTGTTCGGCTGACCCCCGGCAC10.15923566878980894No Hit
GGCGGTCACCGTGGCGCTGTCGATGTCGGCGACTGCGCTGTTCCCGATGTACTTTCTGAA10.15923566878980894No Hit
CGAGGCCATGTTGCGCCGCCAGATCGCTGACCTGGAGGAACAGCAGGTTAAGCTCGC10.15923566878980894No Hit
ATGCTGCGTTTTGGTCCGCGGAAACCGCAGGCTGGCATATGCAC10.15923566878980894No Hit
AACGCGTTGCGCCATCAAGA10.15923566878980894No Hit
TGCCCGAGGCACTAGGTCGCAAGGGTAACCGAGGGTGCACGTTGACGGGGTGAGGCCAAG10.15923566878980894No Hit
CAGCGCCTGACCGGTCTCAATCCCTCCCGGCCCTACCGACAACACCCCCTCGGGCAGGCC10.15923566878980894No Hit
TCGGTGCGCAAAGCCGCGGCCGTGGGCAAGGTCGGCGGGCTGGCTTTGG10.15923566878980894No Hit
CCGAGCTGTCGGCCAAGATCGGCGAGAAGCTCGAGCTGCGTCGTGTGGCGATTTTCGACG10.15923566878980894No Hit
TGCGTCAGTCACCTGCGCAACGTCCTGGACACCCTGGCCTACCTGCGGCACCAGACGAAT10.15923566878980894No Hit
GCCTGGAATGCTTCATCGGACTCCCAGTGTGTCACCACGAAGTAGCGTTCTTCACCCTTG10.15923566878980894No Hit
CGTCACCGGTGAGGGCCGATTCGACG10.15923566878980894No Hit
CGGGGGAGACATGTGTTGCGGTGCACCGGTGTCGACGAGTTGTCGATGGC10.15923566878980894No Hit
GGCCACCAACTTTGGCCCCTTGCAGCCGCTATTCGACGTCATCAACGCGCCCACCCTGGC10.15923566878980894No Hit
AGGTAGCCAAGACACTGTGTGTGAGCTGCCCGATCAGGCGGCAGTGCTTGGCCGCGGCGC10.15923566878980894No Hit
CCGAGCGTAACGCCACTGCGAAATTTCGGGCAGAAAATCGCAGTGGCGTTACGCTCGCGG10.15923566878980894No Hit
CCCACTGCGCGAAGTGATGTGGGGTCACGACGAAAACCTACTGAACAGCACGGAATTCGC10.15923566878980894No Hit
CTCGCGCCTGGCGGCGCTACCGAAACCCAAACGCGACTATGGCCGCCTTA10.15923566878980894No Hit
CATCCCTCAGGATCCCG10.15923566878980894No Hit
TGGCACGGCGCTCTTGACGACACTCTACGGCCTGATCAACAACGCGTTCATCCCGCGATT10.15923566878980894No Hit
ATTCGGACATACCCTGCCTCGGTGATCGGCGTGGTGGCCCACCCATTCGAGGAGAACTGC10.15923566878980894No Hit
AACCTGCGCCGTGAGGATAACGACA10.15923566878980894No Hit
TCCCGGCAGGCTCACTGGTGCTGCTGGCATGGGGTGCAGCCAACCGTGACCCGCGCCAGT10.15923566878980894No Hit
GGGCAGGCCGCCTCTCCCTGCCCCCACTACTACGG10.15923566878980894No Hit
CTTTTGGGCGACATGGTGTGGACCTTGCAAGATGGTA10.15923566878980894No Hit
GGGGGAGTTCGTCGGTCGGCTGGTCGGCTCCCGAGCGCATGCG10.15923566878980894No Hit
CGGCCCACCACCGGATCCCAACGACGACCCACCGCCG10.15923566878980894No Hit
GTAGTTCCGCTGCACCGCTCCGACGGGTCCGGTGACACCTTCTTGTTCACCCAGTACCTG10.15923566878980894No Hit
ACATATGCGGCGACAACCTGATCGATCGTGATTGGCCCCGCCAGCCTGACGACCGCGTCA10.15923566878980894No Hit
GTACTCGGCTAGAACCGCGTCGGAAATCGCGGGCCACCAGTCCAACGCCCAGTC10.15923566878980894No Hit
TCGCCCACATCAGCGGCAAGATGCGTCAGCACTACATTCGCATCCTGCCCGAGGACCGGG10.15923566878980894No Hit
GACGCCAACCACCTTCATCACAGATATCGGCGAGCCGAGAAACAGTACGG10.15923566878980894No Hit
ACGTCGCGTCGGCACCGATAAGTCGGTTTCCTTGGTACGCGCTTTGATTTCGTGGAAGAT10.15923566878980894No Hit
GACCATCACGTCAAGGGGGCCGATACGGGCATCGGCTCC10.15923566878980894No Hit
TATCACCGACATTGAAGCTGCCCGTGTTGTACGCCCCGGGGCTGG10.15923566878980894No Hit
TGCCACCGGTCAACCTACTGATCTTCGTCAGCACGGTAATCATCTTGTTCCTCGCCGGGG10.15923566878980894No Hit
AGCACCCTCATCGCTTCATCCCTTCTTGTCGTCGTCGTGGTTACGACGGCG10.15923566878980894No Hit
CGCGACGCTGGCCGGTTGCCTGCCGGCGGTGATCTGGTGGGCCTGTCACCGGCCGAATCG10.15923566878980894No Hit
CTGTCCGGTCGCATCCAGGTATGTCGGCGCACCCTGTCTGATGATGCCGGGCAGG10.15923566878980894No Hit
AGCGCTTGATACCCGCGGGAACCGAGCACCGGCC10.15923566878980894No Hit
TGTTGCTGGTTGTACCGGCGAAGGCACCGACCACCAGCACAACCAGCAACACCGCCCAAG10.15923566878980894No Hit
TGGGCACGTAGCATCGCGAAGATTT10.15923566878980894No Hit
GGCTTCCAAGTCGGACAGGCGTCTGCGGGTGCGCCGCAGATCTGCCGTGAGACGCTTCAG10.15923566878980894No Hit
CGGCGACGCCGCGGTGGACATGAACGAGGCTCTCGGCAACATCG10.15923566878980894No Hit
AGAATCCGTTCACGGCGGCGATCACCGGCACCGCGCACTCGTAGACGGGGCGGAAT10.15923566878980894No Hit
GACCGCGCTGATCTCGCAAGCGTGCTCCAGGATCCCGG10.15923566878980894No Hit
CGGTGCCGCGGCCATCCGGATTACCGATTACCTCAGCCGTGACCTCACCATCACCCAGGG10.15923566878980894No Hit
AAGACCAATCACCTTGGTGACACCGCCGCCGATTACGGCTT10.15923566878980894No Hit
TGGCGGCCGGGATGTTCAACGACGGCAACGTCAACCCGGGCAGGCTGAAGGCGCCGACGG10.15923566878980894No Hit
TTCATGAACCTGTGCACATCGGACGGGTCGAT10.15923566878980894No Hit
TGTCCGTAAACGGCCACCACGACGGTGCGATCCGGGCTGTTTTCTGACCACACCCCGATC10.15923566878980894No Hit
CCCGAGGCGCGGCCCACGACTGTCGCCTCATGGACAGGAGATGTAGTGGCAGCAGG10.15923566878980894No Hit
CCACCATCGTCGAGGGCGCCGGTGACACCGGCGCCATCGCCGGACGAGTGGCCCAGATCC10.15923566878980894No Hit
CGAGACCGACGGCGCCGAAAAAG10.15923566878980894No Hit
CCTCGATGATGCCCGCGGCCTCGTGTCCGAGCAGAAAAGGGTATTCGTCGTTGATGCCGC10.15923566878980894No Hit
ATGCGGGCTTTGCCAACAAAGGCCGGGTGGCCACGCCCAGGCAAGTTGTGAGGGAGGCCC10.15923566878980894No Hit
GGATCCGGGCCCCGGTGGCGCGGACGTCGTGGATGGGTTGC10.15923566878980894No Hit
TCGTCAGGCATGTCCGTCAGTGTTGCTCGGCGGGCACGTCGTCGGAAACTTCTCTCGATT10.15923566878980894No Hit
CCCAAGCCGCGGGACGACGCCGCCGCGCTGAAAGCCGCGACCCTGCCCCTCTACGTGCAT10.15923566878980894No Hit
CGTGCGGGGCGTCCTGCTGG10.15923566878980894No Hit
GCCCACCACCGCCTATGCGT10.15923566878980894No Hit
TCAACCTACACAAGACATTCTGCATTCCGCACGGCGGCGGGGGCCCAGG10.15923566878980894No Hit
ATGTGCAGGGCACCGACGTGGTCGGGTCACTGGTGGCGTTCGAAGACGGCCTGCCGCGCA10.15923566878980894No Hit
TGTTTCTCGACGGCAACCCGTGCGCGATT10.15923566878980894No Hit
CAGCTCCACCCTTGCCGATGCGGGGTCCGGCGGCGCCGGCGCGGCCGGCGGCA10.15923566878980894No Hit
TGGGGGTACCTCCCACTTGTGGGGGCGTACCCCCACCCCATCGCTTCGTCCCCCGCAAGC10.15923566878980894No Hit
GTGGCCTATATCGGCATCAGCTTCCTCGACCAGGCCA10.15923566878980894No Hit
CGGACCCGCCCCATCCGCTC10.15923566878980894No Hit
CGGATGCGCGGAGCGCTTCCGCAGAGGCCTAATCAACGTCGCCCAGTTCACCATGACCAA10.15923566878980894No Hit
CTTGCCACCTGGTGGGCTACCAACATCCCGACCTGACCGCCTACCCTGCCCAGATCCACG10.15923566878980894No Hit
TCCTCGCGCACGTTAGACGAATCGTCCTGGTTCATCT10.15923566878980894No Hit
AGCCTCTGACCAGGCGACATAGACAACAGTACCCCCGGAAGCGTGCGGAAGCAAGCGGCA10.15923566878980894No Hit
CAGGAGGTCCGCGAGACCCGCGGGCTGGCCTACTCGGTCTACTCCGCGCTGGATCTCTTC10.15923566878980894No Hit
CTAAGCGACAACGACGTGCGCCTACTCAAACCAGAAGTCCACCCACGGAAGTGTCAGAAG10.15923566878980894No Hit
AGAAATTCATGTATCCAGATCAGCTGGGTCCGGGCCGCTTCGAGCGGATCGGCCTCCC10.15923566878980894No Hit
ACCTCAACGGCCTCGGCTCCGTCGGCCCCATCACCATCCCGTCCATCACCATTCCCGAAA10.15923566878980894No Hit
TTTCGTGCCAGAGGAATTCGGCG10.15923566878980894No Hit
CCAGCGACGAAGTCACCGTGGGTGACGTGACGCTCGACGATGTCGGTGAC10.15923566878980894No Hit
GGCGGACTGCTACCGTCCACGCAGGCCATCATCGACATCACCGGGCAACACACCGGGCTG10.15923566878980894No Hit
CGCAGCCGCGCCAACAGTGTCTC10.15923566878980894No Hit
AGCGCGACATGACCGGGGCCGAAGTGTTTTGGGAACCTCCGGCCCAGCCGGTCGAATAGC10.15923566878980894No Hit
TGGTCTCGGGGCTCAAGTGACCCGGCCCCGCCAGTCCACGCTGGTCGCCACC10.15923566878980894No Hit
GCGGTCCCGGCTGACCTTGGCGAGCACGCTAGCCGCGGAGATGAC10.15923566878980894No Hit
GGGCGCCCGGCACCGGGCAGGCCGGCGGGGCCGGCGGCC10.15923566878980894No Hit
CGGCGGCGTTCGCTCGATAG10.15923566878980894No Hit
CGACAACGTGGTCGACAAGGCCACGCGAGTGCACGCCGCGGCATGGACGAAGTTCTTGGA10.15923566878980894No Hit
CGGGATGCCAACCACCACCA10.15923566878980894No Hit
CCGTGGTCAACACCGCGGTTTACACCGTCGGCACCGTGGTACCGACCGTTATCGTCAGCC10.15923566878980894No Hit
CAACGCCGCCTGGACACCGCGCGGATCGAATTGGCAGCGC10.15923566878980894No Hit
CAGCGCGCCTCGATGGCCGCGGTCGCCGGGTATTCGTCCTTATCGATCATGTTCTTGTCG10.15923566878980894No Hit
AACCGGGAGAGACGGCCGGTTTCTCGGTGGAGCGTCATCT10.15923566878980894No Hit
CCAGCGTCTCATTATGTGGACGCTATTTCGGATCTGGGGTGGGCGGGTTGATCCATGCCG10.15923566878980894No Hit
GGGGTCGACGATCCGCACGGTTGACGCCCGGGGAGCTCCATAACTCACCAGGTTGGCTCC10.15923566878980894No Hit
ATGGACGGTTGTTGCAAGTTCGCTGCGATCTCCGCGCATGTCTATTGCCCCCACCAGATC10.15923566878980894No Hit
ATGCCCGTGAGGTGCACGCGCTGGC10.15923566878980894No Hit
GCAAACGTTGACAATCCTTCCGGGCAGCCGGGCAGGCGGTCCCACAAACACAACCCACCT10.15923566878980894No Hit
TGCCGGCCAAGACCGCGGCATCATTGATCAGGTAGCGCGTCGTGAAGTTGTCTGTGCCGT10.15923566878980894No Hit
CATGACGGCCTGCGTTGGCGATCGCCGAAAGCAGCGGCGG10.15923566878980894No Hit
ATCCTGGTAGGGCCTTCCG10.15923566878980894No Hit
TTGACCCACGGCACGTGTTTGCGTGCGTATTTGCCCGCACTGCACACCGTGGAGCCGACC10.15923566878980894No Hit
AAAGCGAGGCCACCGAATTCACCGAGGCTTGGACGACGTTTGCGACCG10.15923566878980894No Hit
GTGATCGAGGTTGACGTCAGGCTGAAGGACGGCAGCTT10.15923566878980894No Hit
TGGTGGTACACCGGGCGGCCGCCC10.15923566878980894No Hit
TGGTTGATAGCAGCGGCTGCAATAGCG10.15923566878980894No Hit
TGTTGTCGGAATTTGGGTTGGCCG10.15923566878980894No Hit
AAGACGCGGTGGAACGCGAACTCGATCGACGCGATGGTTTTGGCATCACCGGACCCGATA10.15923566878980894No Hit
CGCTATGGCGCCCCCGTG10.15923566878980894No Hit
TACACTGGCGCGGTCCGCTCGCGCCGGCAGTGACTCA10.15923566878980894No Hit
GGCGGGCACGGCGATCGAGCCGGCCGAACTCGACGATCCGGACGCGGTGGTCGGTGCGCT10.15923566878980894No Hit
CCCGATGGGCCGTGGCCCG10.15923566878980894No Hit
CGACTACGAACCGCGGGATCCCGTC10.15923566878980894No Hit
CTGATCAAGAGCTGGCTCACCACCGATCCCGAGGCACC10.15923566878980894No Hit
TGGCTGAACGTTGCCCCCTGAAGGCGAGC10.15923566878980894No Hit
TCGTCATATTGGAGAAAGCCGACGACGTCGGCGGCACCTGGCGCGACA10.15923566878980894No Hit
CATCTTTGTCGAACGGGTGCTGCCCGGGGCGATCCTGCGACAGCTCAGCGACGAGGAAAT10.15923566878980894No Hit
ACGGTCGAGCCTGACCATGTTTTTGGCCGATAACCGACCAACCTGCCGCGTCAGGCTGGT10.15923566878980894No Hit
CGCCCCGGCTGAGCCGGATACCGCCACCGAGCAGTGTGCCGGCCTTGTTGGCCGCGCTGC10.15923566878980894No Hit
TGGTTCGCGTCGGTCTGGCCCCGGAGATGGAAGAGCCGCCACCGGCAACGCCTCGGCGAC10.15923566878980894No Hit
TGGGGTCGAAGCCCAGCGTAAGTATCGCCAACGCGGGGCCCAGCGTGATTAGGCGGCGCC10.15923566878980894No Hit
AGGTCATCGCCATGCGCGGCCTCGGTGACCCGGACGCCTTTCCGGCCAGTGATCTCGGCC10.15923566878980894No Hit
GTCCGGTGCCTGGATTGGGTTGCTCACGAAACCGGCTCCTGTCAGTT10.15923566878980894No Hit
CACATCGTTGACGAGGTACACCGCCGAGGCGGCCAGGCTGAACACCACGAAG10.15923566878980894No Hit
AATTGTCACCGAGCTAGCCCCGGTGTTCAAGGCGGCGGGTTTTCGGCCCAACAACTCCGT10.15923566878980894No Hit
TGACGAGTATTACCCGGTGGTGTA10.15923566878980894No Hit
GTAGCCCGCAGGCTGCGGG10.15923566878980894No Hit
AACTCCGTACCTTTCCGACTATCGCGATCGGCGGGGGCTT10.15923566878980894No Hit
GGGTTCAACCCGCATTTGCTGTCCGACCTGTCGCCGGTGAACGCCGCAATCAGTGCGTTG10.15923566878980894No Hit
CTCGACGAACGCCCCGCCGCGCTGCGGCGATGTAACCCGTGC10.15923566878980894No Hit
TCAGTCACCACGCTGACG10.15923566878980894No Hit
GACGGTTTGGAGCTGGTCCGTGTCGTTCGTGTTGATCT10.15923566878980894No Hit
TGGTTGGTGGCCCAGCTGGGCGGTCGCGTACGTCATAC10.15923566878980894No Hit
ACACATGGGCCATGCTGCGTGACG10.15923566878980894No Hit
TGCCGCAAATAGCGCAGGCGTCGGCGATGCTGAAGAATCTCAGCGCCGATTTCGCCGATA10.15923566878980894No Hit
TCCACCCAGACGCCGAATTGCTCGCCCCGGGTGACCAGGTCCCGCAGCGGATAGTCGGCC10.15923566878980894No Hit
ATTGACGTTCCCGGTACCGTCCGACGACCTGCCCTACATCCACCCGGTGACCGTCATCAA10.15923566878980894No Hit
GGGCTCCAAGGGTGGAGCGGGCAGCGGCCCCGCCGGCAA10.15923566878980894No Hit
GGAAAGCCGGTAGGTGGCGTGGATGGGCCCTGGGGCGATTG10.15923566878980894No Hit
TGTGAAGTCACGTCTTCGGTCAGGCTCATCATCATCTAATTTTCAGGTCTCTTTCAGAGC10.15923566878980894No Hit
CCCGCGAACCAGTGGTACGGCGCAGATTGACCTCGTATCATCTGAGTTAGTTGCCCGCGC10.15923566878980894No Hit
GGTCACCAGCACCAGCAGGATCATGATG10.15923566878980894No Hit
GGGTGACAAGCGATTTGGCGTCGCCGCGGTTGTTGGAGATGTCGATC10.15923566878980894No Hit
ATCGCCCGTATTGGGATCCGGGTCGCCGATGTGCTGGACCCGGAACAATTGACCCACAAG10.15923566878980894No Hit
GCCGACTGGCCGGCCGTCGA10.15923566878980894No Hit
AGCTGGTCGGCTTGCGAGCGGGAATTACATTGCGACAAAGTGGTTCTCATCGAACTCGTT10.15923566878980894No Hit
GAAGGTCATAAACCCCGGAAACCACCGTTTGATGGTCGTTCGCCAGGCCGGAGTGAGCT10.15923566878980894No Hit
CGGCTGGCTTTCTGGTCGGGTTGTTTGACGTGGGCAAACGTTAGGCGCCGAATCCGTCGG10.15923566878980894No Hit
GGGTGGCGAGCGACGGATCAGTCGCAGCATCCGGCGCTCGCCACCCGCCAGGTAGGCAAC10.15923566878980894No Hit
GTGACCTCGATGGTGCTGTATATCCTCTGC10.15923566878980894No Hit
TGGATCTCATCCATGTTGACCTCGCG10.15923566878980894No Hit
TTCCGGATTGCAGCCGCGACCCGGCCAGCAGCAGCGCCA10.15923566878980894No Hit
CCCTGGCGGTTCGACCGCGATCAGG10.15923566878980894No Hit
AGGCGTACGGATTTGTCCGCAGCCGACCCGAACTGAAGCTGCCCGATTTGGAGTTGATTT10.15923566878980894No Hit
CCGGATCGCGGTGACATCCGCTCGCCGCGCCGAAGAGCTGTGCGCATTGCTTCGCCGCCA10.15923566878980894No Hit
AAGCCGACCGACCGTGCCCACCAGGTGCGGGTGATCTGCGGTAACGGGCTGCGGCCGGAG10.15923566878980894No Hit
AGAGGGGACGGAAACGCGAGGAACCGTCCCACCTGGGCCTGCCCCCAGCGGGTCGTCAGA10.15923566878980894No Hit
CTTCTCACTGCACATTACCGGCGGTATCGTCGTAGCGCTGTTG10.15923566878980894No Hit
CCGGATCGCTCACAGCTGAAGGCGTTGTTGGCGTGATGTCATCTCGATCACTGTCCTGGC10.15923566878980894No Hit
TCGGCGCCGTTGGGCATGCGGATCCGCACACTGGTGCGCTCGTTGTCGTCGC10.15923566878980894No Hit
ACGGGTTCCCGATCGAGATCACGC10.15923566878980894No Hit
ATCTGTCTGGGTTTGGTTCTCGTGGCACTGAACTTCCTGATCTGCCTC10.15923566878980894No Hit
CGCCCGTTGATCGGCAATGGCGCTAACGGGGCCGACGGGACG10.15923566878980894No Hit
GGTCCGGGGCCCGACATTCGTGATCC10.15923566878980894No Hit
CAGCCAGGACGCCGACCCTCGTTACGCCGACATCGGC10.15923566878980894No Hit
GGCTCTCACCGGGTTCGGCGCCCGACGGGCCGGGTATGGCGG10.15923566878980894No Hit
TTGGCCAGGGTGTAGAC10.15923566878980894No Hit
TCGTCGGCGGTCGGGCGGCGCATGCCCAGCACG10.15923566878980894No Hit
GGGTGCCCGGAATGGCGAACTGGCGGGCAAGGGTTTCC10.15923566878980894No Hit
GCCCACCCGCTGGAACAGTTTGTCGCTGATGATGTTGGCAAAC10.15923566878980894No Hit
CATGGCGGCTAGAAACGAGATCTTGCGGGG10.15923566878980894No Hit
GGGTGACCCGTACGCCCAGAGCGGAAAACACATGTGCGAATTCCGCTGCAATGAAGCCGC10.15923566878980894No Hit
ATGCCATCCTCGCGGACTTGTA10.15923566878980894No Hit
GGTCCCGGCAAAACCCAACACGCCGG10.15923566878980894No Hit
CAGGACGCGCACGACACGTCAGAATAGCTCGATTTACC10.15923566878980894No Hit
CCGACGAGCAGAAAGCCCGCTATCTACGACCGCTGCTCG10.15923566878980894No Hit
CGGTGGCCATGTTGTCACTGGTGGCGATCATCTTCTCGGCGG10.15923566878980894No Hit
TCGGCTTGCCGAATTGCTCCCGCACCTTGGCGTAGGCAACCGCGG10.15923566878980894No Hit
CGGACCTGCTGGCTGAATTACGCGCTGACGCTGAGGATC10.15923566878980894No Hit
CAGATCAACGTGGTGCCTACTGCCGCACTG10.15923566878980894No Hit
GGTGAGCGCGTGCAAGACCACCACCACGTTGTCCCGTGCGGGCGACAATTTGCCCCAGCG10.15923566878980894No Hit
ACATGCCAACCCCAACGCGGCCGGAGCCATC10.15923566878980894No Hit
GTGCACCACGTTGTTCGTCCGGGCGCATCAACCGCAGC10.15923566878980894No Hit
GTGCGGTGCTGGCGCCCGACGGCC10.15923566878980894No Hit
CGGGCTCGGCAAGCGACTGACCGGTAAGTCAAAA10.15923566878980894No Hit
GCCAACATCGTCAGCGGTGGAGACGGTGGCCTCGGCGGTGCC10.15923566878980894No Hit
CGGCGGCCACGGCGGATCGGCCGCCAA10.15923566878980894No Hit
GGCGGCGGCTCGGGATACAGGTCGCGGTAGTTGACACCCAACCCGTCGGGGGCGTCCTCC10.15923566878980894No Hit
CTATTACTGCACGTCGATCCGGTGTCCGTAGCGCACGACGATGGCTTCGCCCATGCGCGG10.15923566878980894No Hit
TTGCGGTTTTGCTCAGCGCCCATGCCGAG10.15923566878980894No Hit
TTGTCAGTAAACGGCGACACGACGATGAGGTCGTCACC10.15923566878980894No Hit
GGCCAGCACATCGGACCCGCCAGGAAGCACGGTGGCCGCCGCCGCGTGCAC10.15923566878980894No Hit
CGATTGGGCCGTAGGAGCTGCCCCACCCGCTCAACACC10.15923566878980894No Hit
TGGCGGATAAGGATATTGCCCTCGACCGCAGCATGATTCCGCTAGG10.15923566878980894No Hit
CCGAAATAACGCGTTGACCTGTGCGCCGTCAGGGTTTCGAACCCCGGACCCGCTGATTAA10.15923566878980894No Hit
CTCTGCGGCCGCATAGCAAACCGCACCCATGTTGAGCGCCTGCACGAACTGGTCGTGGAA10.15923566878980894No Hit
TTTCCGGCGGCCGTT10.15923566878980894No Hit
GTAGTCGCAGGCAACCCGATGCACGCGTCGGCCGAGCTGATCAACGTCG10.15923566878980894No Hit
CGACCCCAACGTGCCCGACGCGTTCGTGGTGATCGCCGACCGGTTGGGCAACAGCGTCTA10.15923566878980894No Hit
CTCGGCGGCCTGTTCGTCGAAGGTGCGGTCAATGCGCTC10.15923566878980894No Hit
CGCGGTGTGATCGCACGACAA10.15923566878980894No Hit
GTCGGACACAGTGGTCGTCGTAGCGGCCCGCCGGCGGG10.15923566878980894No Hit
GGTGGGCGGAGGGTTCATCGTCACGGAACAGACCAGCGGCAACAGCGGCCAACATCC10.15923566878980894No Hit
GAGCCGCCACGTCCGCTGACCCAT10.15923566878980894No Hit
CCACCTGTACCGCGTTAACCAGCCGCGCCGACGCGGCGATCGCGGTGCCGCGGGCGGTGA10.15923566878980894No Hit
CGACATCGACGCCGTGCGCGACGCCGAGCTGCGCTACGCGCAGCGCTATCGCACCCCGCG10.15923566878980894No Hit
TCCGCGCACGCCCGCGTCGGCCAACGCCAGCAGCGCCTCGGCCAGGTCGGCGACGTAG10.15923566878980894No Hit
CATCCGCGAGGACAG10.15923566878980894No Hit
GTCATCAGCCACAACGTGGACCTGGTGGCCGATGTCGTCAATAAAGTGTGGTTCCTGGAT10.15923566878980894No Hit
GTTGGTCAGGATATCGACGAGAAACTGCGGTGGTATGCCTTGGGCCGCAGCCAGATCGTC10.15923566878980894No Hit
CGCGATCACCTCGGCGTCGCGCGCCCCGGGATCACCGGCCCGACCGCCCAGCGCCGCCCG10.15923566878980894No Hit
TGGGCATTCTCGAGCCACCGC10.15923566878980894No Hit
ACCTCATCGACGTGCGCTCTCCCGACGAGTTCTCCGGCAAGATCCTGGCCCCCGCGCACC10.15923566878980894No Hit

[OK]Adapter Content

Adapter graph

[OK]Kmer Content

No overrepresented Kmers

\ No newline at end of file diff --git a/callvariants/SRR671851_2_fastqc.html b/callvariants/SRR671851_2_fastqc.html new file mode 100644 index 0000000..270ffc0 --- /dev/null +++ b/callvariants/SRR671851_2_fastqc.html @@ -0,0 +1,187 @@ +SRR671851_2.fastq.gz FastQC Report
FastQC Report
Wed 22 Apr 2015
SRR671851_2.fastq.gz

[OK]Basic Statistics

Measure                              Value
Filename                             SRR671851_2.fastq.gz
File type                            Conventional base calls
Encoding                             Illumina 1.5
Total Sequences                      100000
Sequences flagged as poor quality    0
Sequence length                      75
%GC                                  64

[FAIL]Per base sequence quality

Per base quality graph

[OK]Per tile sequence quality

Per base quality graph

[OK]Per sequence quality scores

Per Sequence quality graph

[FAIL]Per base sequence content

Per base sequence content

[OK]Per sequence GC content

Per sequence GC content graph

[OK]Per base N content

N content graph

[OK]Sequence Length Distribution

Sequence length distribution

[OK]Sequence Duplication Levels

Duplication level graph

[OK]Overrepresented sequences

No overrepresented sequences

[OK]Adapter Content

Adapter graph

[WARN]Kmer Content

Kmer graph

SequenceCountPValueObs/Exp MaxMax Obs/Exp Position
TGACGAA200.00733950851.7552
TGCAAGA200.00733950851.7556
\ No newline at end of file diff --git a/callvariants/SRR671851_2_per_base_quality.png b/callvariants/SRR671851_2_per_base_quality.png new file mode 100755 index 0000000..4ff080d Binary files /dev/null and b/callvariants/SRR671851_2_per_base_quality.png differ diff --git a/callvariants/fastqc_qc_1_pe.png b/callvariants/fastqc_qc_1_pe.png new file mode 100644 index 0000000..842763c Binary files /dev/null and b/callvariants/fastqc_qc_1_pe.png differ diff --git a/callvariants/fastqc_qc_1_se.png b/callvariants/fastqc_qc_1_se.png new file mode 100644 index 0000000..856d27c Binary files /dev/null and b/callvariants/fastqc_qc_1_se.png differ diff --git a/callvariants/fastqc_qc_2_pe.png b/callvariants/fastqc_qc_2_pe.png new file mode 100644 index 0000000..b1181ba Binary files /dev/null and b/callvariants/fastqc_qc_2_pe.png differ diff --git a/callvariants/fastqc_qc_2_se.png b/callvariants/fastqc_qc_2_se.png new file mode 100644 index 0000000..15cb674 Binary files /dev/null and b/callvariants/fastqc_qc_2_se.png differ diff --git a/callvariants/fastqc_screenshot_raw_1.png b/callvariants/fastqc_screenshot_raw_1.png new file mode 100644 index 0000000..d8e32b9 Binary files /dev/null and b/callvariants/fastqc_screenshot_raw_1.png differ diff --git a/callvariants/fastqc_screenshot_raw_2.png b/callvariants/fastqc_screenshot_raw_2.png new file mode 100644 index 0000000..e917624 Binary files /dev/null and b/callvariants/fastqc_screenshot_raw_2.png differ diff --git a/callvariants/index.txt b/callvariants/index.txt new file mode 100644 index 0000000..128d7fc --- /dev/null +++ b/callvariants/index.txt @@ -0,0 +1,29 @@ +============================= +The Variant Calling Protocol +============================= + +:authors: Aditi Gupta, C. Titus Brown. + +This is a protocol for identifying Single Nucleotide Polymorphisms (SNPs) in next-generation sequencing datasets. 
It is part of khmer-protocols; see +`the main page for this version <../index.html>`__ for citation +information, and `the khmer-protocols site +`__ for the latest released +version. + + +The tutorial: + +.. toctree:: + :maxdepth: 1 + + 0-download-and-save + 1-quality-control + 2-mapping + 3-variant-calling + 4-annotation + 5-visualization + +Reference material +------------------ +:doc:`../docs/command-line` +:doc:`../amazon/index` diff --git a/callvariants/installing-blastkit.txt b/callvariants/installing-blastkit.txt new file mode 100644 index 0000000..e7330fe --- /dev/null +++ b/callvariants/installing-blastkit.txt @@ -0,0 +1,120 @@ +=============================== +4. BLASTing your assembled data +=============================== +.. shell:: start +One thing everyone wants to do is BLAST sequence data, right? Here's a +simple way to set up a stylish little BLAST server that lets you search +your newly assembled sequences. + +Installing blastkit +------------------- + +Installing some prerequisites: +:: + + pip install pygr + pip install whoosh + pip install Pillow + pip install Jinja2 + pip install git+https://github.com/ctb/pygr-draw.git + pip install git+https://github.com/ged-lab/screed.git + apt-get -y install lighttpd + +and configure them: +:: + + cd /etc/lighttpd/conf-enabled + ln -fs ../conf-available/10-cgi.conf ./ + echo 'cgi.assign = ( ".cgi" => "" )' >> 10-cgi.conf + echo 'index-file.names += ( "index.cgi" ) ' >> 10-cgi.conf + + /etc/init.d/lighttpd restart + +Next, install BLAST: +:: + + cd /root + + curl -O ftp://ftp.ncbi.nih.gov/blast/executables/release/2.2.24/blast-2.2.24-x64-linux.tar.gz + tar xzf blast-2.2.24-x64-linux.tar.gz + cp blast-2.2.24/bin/* /usr/local/bin + cp -r blast-2.2.24/data /usr/local/blast-data + +And put in blastkit: +:: + + cd /root + git clone https://github.com/ctb/blastkit.git -b ec2 + cd blastkit/www + ln -fs $PWD /var/www/blastkit + + mkdir files + chmod a+rxwt files + chmod +x /root + +and run check.py: +:: + + cd 
/root/blastkit + python ./check.py + +It should say everything is OK. + +Adding the data +--------------- + +If you've just finished a transcriptome assembly (:doc:`3-big-assembly`) then +you can do this to copy your newly generated assembly into the right place: +:: + + cp trinity_out_dir/Trinity.fasta /root/blastkit/db/db.fa + +Alternatively, you can grab my version of the assembly (from running this +tutorial): +:: + + cd /root/blastkit + curl -O https://s3.amazonaws.com/public.ged.msu.edu/trinity-nematostella-raw.fa.gz + gunzip trinity-nematostella-raw.fa.gz + mv trinity-nematostella-raw.fa db/db.fa + +Formatting the database +~~~~~~~~~~~~~~~~~~~~~~~ + +After you've done either of the above, format and install the database +for blastkit: +:: + + cd /root/blastkit + formatdb -i db/db.fa -o T -p F + python index-db.py db/db.fa + +Done! + +.. note:: + + You can install any file of DNA sequences you want this way; just copy + it into /root/blastkit/db/db.fa and run the indexing commands, above. + +Running blastkit +---------------- + +Figure out what your machine name is +(ec2-???-???-???-???.compute-1.amazonaws.com) and go to:: + + http://machine-name/blastkit/ + +Make sure you have enabled port 80 in your security settings on Amazon. + +(If you're using the Nematostella data set, try this sequence:: + + CAGCCTTTAGAAGGAAACAGTGGCAATATATAATTCTAGATGAAGCTCAGAATATCAAAA + ATTTTAAAAGTCAAAGGTGGCAGTTGCTGTTGAATTTTTCAAGTCAGAGGAGACTTTTGT + TGACTGGAACACCTTTGCAGAACAATTTGATGGAGCTGTGGTCGCTTATGCATTTCCTCA + TGCCATCAATGTTTGCTTCTCATAAAGATTTTAGGGAGTGGTTTTCTAACCCTGTTACAG + GGATGATTGAAGGGAATTCAG + +It should match something in your assembly.) + +Next: :doc:`5-building-transcript-families` + diff --git a/callvariants/notes.txt b/callvariants/notes.txt new file mode 100644 index 0000000..611a144 --- /dev/null +++ b/callvariants/notes.txt @@ -0,0 +1,7 @@ +1. Do it on Amazon. +2. Use khmer to run trimmomatic script from sandbox +3. Explain each option in a command line statement. 
Example: gunzip -c | fastq_quality_filter -Q33 -q 30 -p 50 | gzip -9c > .qc.fq.gz +Explain what the -Q33, -q, and -p flags are. +4. "Make sure you're running from screen". What does this mean? Terminal or HTML screen? Give an example. +5. khmer, partitioning not needed for variant calling. Confirm. +6. Try different mappers: bowtie, bowtie2, bwa (talk about which version 0.7 is buggy with samtools index file)