Update README.md
jlanga authored Jul 18, 2019
1 parent 33e300a commit c1d0eef
Showing 1 changed file with 60 additions and 70 deletions.
130 changes: 60 additions & 70 deletions README.md
@@ -5,63 +5,38 @@
Get exons from a transcriptome and raw genomic reads using abyss-bloom and bedtools

## Requirements
```
abyss==2.0.1 (something is happening with 2.0.2 and abyss-bloom kmers)
bedtools (tested on 2.0)
python3
biopython
networkx
pandas
biobloomtools
```
We recommend installing these packages with `conda` and `bioconda`.

## How to install

Copy this repo and install it with `pip`:
Docker, or a set of `apt` and `conda` packages (see the installation guide).

```sh
git clone https://github.com/jlanga/exfi.git
pip install --user exfi
```
## How to install

To install other dependencies, follow the instructions from the travis files:

1. Install packages with `apt`:

```sh
sudo apt install build-essential git curl libboost-dev gcc autoconf bzip2 zlib1g libsparsehash-dev
sudo apt install \
autoconf build-essential bzip2 cmake curl gcc git libboost-dev libsdsl3 \
libz-dev zlib1g
```

2. Install conda, then
2. Install conda, then configure channels and install

```sh
conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
conda install --yes abyss biopython bedtools networkx pandas pip
conda install --yes abyss=2.0.1 bedtools biopython pandas pip
```

3. Install `biobloomtools`

There is an easy way, with `brew` (not the latest release):

```sh
brew install biobloomtools
```

Or the manual way:
You may need to use `sudo`:

```sh
# Install SDSL-Lite
git clone https://github.com/simongog/sdsl-lite.git
pushd sdsl-lite/ && \
sudo ./install.sh /usr/local/ && \
popd

# Install biobloomtools
git clone https://github.com/bcgsc/biobloom.git
git clone --recursive https://github.com/bcgsc/biobloom.git
pushd biobloom/ && \
git submodule update --init && \
./autogen.sh && \
@@ -71,6 +46,21 @@ sudo make install && \
popd
```

4. Copy this repo and install it with `pip`:

```sh
git clone --recursive https://github.com/jlanga/exfi.git
pip install --user exfi
```
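
After installing, it can help to confirm that the executables are reachable from your shell. The following is a minimal sketch, not part of exfi: `check_tools` is a hypothetical helper, and the tool names passed to it are assumptions (`biobloommaker` is assumed to be the executable that `biobloomtools` installs).

```sh
# Hypothetical helper: report whether each named executable is on PATH.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Tool names are assumptions; adjust them to what your installation provides.
check_tools abyss-bloom bedtools biobloommaker \
    build_baited_bloom_filter build_splicegraph
```

If `pip install --user` placed the scripts outside your `PATH`, the exfi commands would show up as missing here.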

If you have access to Docker, you can build a Debian-based image with the following command:

```sh
docker build --rm --tag exfi:v1.5.6 github.com/jlanga/exfi-docker
```

[More info](https://github.com/jlanga/exfi-docker)



## Required data
@@ -83,30 +73,31 @@ popd

1. Make a baited Bloom filter of the genomic reads with `build_baited_bloom_filter`:
- `genome.fa.gz` is the set of genomic reads and
- `genome_k27_m500M_l1.bloom` is the resulting Bloom filter, built from kmers of length 27, with a size of 500 Mb and the number of times a kmer must appear in the reads set to 1 (levels).
- `genome_k25_m100M_l1.bf` is the resulting Bloom filter, built from kmers of length 25, with a size of 100 MB and the number of times a kmer must appear in the reads set to 1 (levels).

```sh
# Assuming that you are in the exfi folder:
build_baited_bloom_filter \
--input-fasta data/transcript.fa \
--kmer 27 \
--bloom-size 500M \
--kmer 25 \
--bloom-size 100M \
--levels 1 \
--threads 4 \
--output-bloom results/genome_k27_m500M_l1.bloom \
--output-bloom results/genome_k25_m100M_l1.bf \
genome.fa.gz
```

2. Run `build_splicegraph` to get putative exons in the transcriptome.
- `data/transcript.fa` is the input transcriptome,
- `genome_k27_m500M_l1.bloom` is the Bloom filter generated above
- `genome_k25_m100M_l1.bf` is the Bloom filter generated above
- kmer length has to be the same
- `test.gfa` is the resulting splice graph in [GFA1 format](https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md).

```sh
build_splicegraph \
--input-fasta data/transcript.fa \
--input-bloom results/genome_k27_m500M_l1.bloom \
--kmer 27 \
--input-bloom results/genome_k25_m100M_l1.bf \
--kmer 25 \
--max-fp-bases 5 \
--output-gfa test.gfa
```
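
Because both commands must agree on the kmer length and on the Bloom filter path, one option is to define these values once and reuse them. This is only a sketch: `bloom_filter_path` is a hypothetical helper that mirrors the file naming used in this README, not something exfi provides.

```sh
# Hypothetical helper: compose the Bloom filter path from its parameters.
# $1 = kmer length, $2 = Bloom filter size, $3 = levels
bloom_filter_path() {
    printf 'results/genome_k%s_m%s_l%s.bf\n' "$1" "$2" "$3"
}

kmer=25
bloom_size=100M
levels=1
bloom=$(bloom_filter_path "$kmer" "$bloom_size" "$levels")

# Show the path that both steps would share.
echo "$bloom"
```

The variables can then be passed as `--kmer "$kmer"` and `--output-bloom "$bloom"` to `build_baited_bloom_filter`, and as `--kmer "$kmer"` and `--input-bloom "$bloom"` to `build_splicegraph`, so the two steps cannot drift apart.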
@@ -116,35 +107,34 @@ This splice graph can be visualized with [Bandage](https://rrwick.github.io/Band
Example:

```
H VN:Z:1.0
S EXON00000000001 GTAAGCCGCGGCGGTGTGTGTGTGTGTGTGTGTTCTCCGTCATCTGTGTTCTGCTGAATGATGAGGACAGACGTGTTTCTCCAGCGGAGGAAGCGTAGAGATGTTCTGCTCTCCATCATCGCTCTTCTTCTGCTCATCTTCGCCATCGTTCATCTCGTCTTCTGCGCTGGACTGAGTTTCCAGGGTTCGAGTTCTGCTCGCGTCCGCCGAGACCTC LN:i:216
S EXON00000000002 GAGAATGCGAGTGAGTGTGTGCAGCCACAGTCGTCTGAGTTTCCTGAAGGATTCTTCACGGTGCAGGAGAGGAAAGATGGAGGAATCCTGATTTACTTCATGATCATCTTCTACATGCTGCTGTCCGTCTCCATCGTGTGTGATGAATATTTTCTGCCATCTCTGGAGGTCATCAGCGAGCG LN:i:182
S EXON00000000003 GTCTTGGTCTCTCGCAGGATGTTGCTGGAGCCACGTTTATGGCTGCGGGGAGTTCGGCTCCAGAGCTCGTCACTGCATTTCTGGG LN:i:85
S EXON00000000004 GGTGTGTTTGTGACGAAGGGCGACATCGGCGTCAGCACCATCATGGGTTCTGCTGTCTATAACCTGCTGTGCATCTGTGCAGCGTGCGGCCTGCTGTCCTCTGCAG LN:i:106
S EXON00000000005 GTTGGTCGTCTGAGCTGCTGGCCGTTGTTCAGAGATTGTGTTGCGTACTCCATCAGTGTCGCCGCCGTCATCGCCATCATCTCAGATAACAGAGTTTACTGG LN:i:102
S EXON00000000006 GGTATGATGGCGCGTGTCTCCTGCTGGTGTACGGTGTGTATGTAGCTGTACTGTGTTTCGATCTGAAGATCAGCGAGTACGTGATGCAGCGCTTCAGTCCATGCTGCTGGTGTCTGAAACCTCGCGATCGTGACTCAGGCGAGCAGCAGCCTCTAGTGGGCTGGAGTGACGACAGCAGCCTGCGGGTCCAGCGCCGTTCCAGAAATGACAGCGGAATATTCCAGGATGATTCTGGATATTCACATCTATCGCTCAGCCTGCACGGACTCAACGAAATCAGCGAC LN:i:284
S EXON00000000007 GAGCACAAGAGTGTGTTCTCCATGCCGGATCACGATCTGAAGCGAATCCTGTGGGTTTTGTCTCTTCCGGTCAGCACTCTGCTGTTTGTGAGCGTTCCCGACTGCAGGAGACCCTTCTGGAAGAACTTCTACATGCTGACCTTCCTGATGTCCGCCGTCTGGATTTCTGCATTCACTTATGTGCTGGTCTGGATGGTCACAATCGTGG LN:i:208
S EXON00000000008 GGGGAGACTCTGGGAATCCCGGACACAGTGATGGGAATGACTCTTCTGGCTGCAGGAACCAGTATCCCCGACACCGTGGCCAGTGTGATGGTGGCACGAGAAGGTAA LN:i:107
S EXON00000000009 AGGTAAATCTGATATGGCCATGTCCAACATCGTGGGCTCTAACGTGTTCGATATGCTGTGTCTGGGCCTGCCGTGGTTCATCCAGACGGTGTTTGTTGACGTGGGCTCCCCGGTGGATGTCAACAGCTCGGGGCTGGTCTTCATGTCCTGCACGCTGCTGCTCTCCATCATCTTCCTCTTCCTCGCCGTGCACATCAACGGCTGGAAGCTGGACTGGAAGCTGGGTCTGGTGTGTTTGGCGTGTTACATTCTGTTCGCAACACTCTCCATCCTGTACGAGCTCGGCATCATCGGGAACAATCCCATACGCTCCTGCAGCGACTGAACACTGCTCTACAGCGCCCCCTTATGGACAACACAAGGACGTGACTCTTTATAACCCTCTAAAGTGCACAGGTTCATTACTGAATACAAGAAAATAGAACTGCGAGACGTCAACTCAAAATACAAGAGAAGTCAAAGTGCGAGATGTAAAAAATATATGCACATAAATGAGGATAAACTTTTTATTTAATAAGACAAAACTGCATAAAGTCTGATGTGAACACTGCTCAACAGCGCCCTCTCATGGACAACACATGGATCTGACTCTTATTAACCCTCCAGAGTGCAAATACACTAACACAACGTAATATAACCAAGTTAAAATGGCAAGATGTGAACTCAAAATACAAGAAAGCAGTCAAGATGCCCGACATAACAAATGTGCATTAAAATGTAAGCCC LN:i:725
L EXON00000000001 + EXON00000000002 + 0M
L EXON00000000002 + EXON00000000003 + 1M
L EXON00000000003 + EXON00000000004 + 2M
L EXON00000000004 + EXON00000000005 + 1M
L EXON00000000005 + EXON00000000006 + 2M
L EXON00000000006 + EXON00000000007 + 0M
L EXON00000000007 + EXON00000000008 + 1M
L EXON00000000008 + EXON00000000009 + 6M
C ENSDART00000033574.5 + EXON00000000001 + 0 216M
C ENSDART00000033574.5 + EXON00000000002 + 216 182M
C ENSDART00000033574.5 + EXON00000000003 + 397 85M
C ENSDART00000033574.5 + EXON00000000004 + 480 106M
C ENSDART00000033574.5 + EXON00000000005 + 585 102M
C ENSDART00000033574.5 + EXON00000000006 + 685 284M
C ENSDART00000033574.5 + EXON00000000007 + 969 208M
C ENSDART00000033574.5 + EXON00000000008 + 1176 107M
C ENSDART00000033574.5 + EXON00000000009 + 1277 725M
P ENSDART00000033574.5 EXON00000000001+,EXON00000000002+,EXON00000000003+,EXON00000000004+,EXON00000000005+,EXON00000000006+,EXON00000000007+,EXON00000000008+,EXON00000000009+
H VN:Z:1.0
S ENSDART00000033574.5:0-216 GTAAGCCGCGGCGGTGTGTGTGTGTGTGTGTGTTCTCCGTCATCTGTGTTCTGCTGAATGATGAGGACAGACGTGTTTCTCCAGCGGAGGAAGCGTAGAGATGTTCTGCTCTCCATCATCGCTCTTCTTCTGCTCATCTTCGCCATCGTTCATCTCGTCTTCTGCGCTGGACTGAGTTTCCAGGGTTCGAGTTCTGCTCGCGTCCGCCGAGACCTC
S ENSDART00000033574.5:216-398 GAGAATGCGAGTGAGTGTGTGCAGCCACAGTCGTCTGAGTTTCCTGAAGGATTCTTCACGGTGCAGGAGAGGAAAGATGGAGGAATCCTGATTTACTTCATGATCATCTTCTACATGCTGCTGTCCGTCTCCATCGTGTGTGATGAATATTTTCTGCCATCTCTGGAGGTCATCAGCGAGCG
S ENSDART00000033574.5:397-482 GTCTTGGTCTCTCGCAGGATGTTGCTGGAGCCACGTTTATGGCTGCGGGGAGTTCGGCTCCAGAGCTCGTCACTGCATTTCTGGG
S ENSDART00000033574.5:480-586 GGTGTGTTTGTGACGAAGGGCGACATCGGCGTCAGCACCATCATGGGTTCTGCTGTCTATAACCTGCTGTGCATCTGTGCAGCGTGCGGCCTGCTGTCCTCTGCAG
S ENSDART00000033574.5:585-687 GTTGGTCGTCTGAGCTGCTGGCCGTTGTTCAGAGATTGTGTTGCGTACTCCATCAGTGTCGCCGCCGTCATCGCCATCATCTCAGATAACAGAGTTTACTGG
S ENSDART00000033574.5:685-969 GGTATGATGGCGCGTGTCTCCTGCTGGTGTACGGTGTGTATGTAGCTGTACTGTGTTTCGATCTGAAGATCAGCGAGTACGTGATGCAGCGCTTCAGTCCATGCTGCTGGTGTCTGAAACCTCGCGATCGTGACTCAGGCGAGCAGCAGCCTCTAGTGGGCTGGAGTGACGACAGCAGCCTGCGGGTCCAGCGCCGTTCCAGAAATGACAGCGGAATATTCCAGGATGATTCTGGATATTCACATCTATCGCTCAGCCTGCACGGACTCAACGAAATCAGCGAC
S ENSDART00000033574.5:969-1177 GAGCACAAGAGTGTGTTCTCCATGCCGGATCACGATCTGAAGCGAATCCTGTGGGTTTTGTCTCTTCCGGTCAGCACTCTGCTGTTTGTGAGCGTTCCCGACTGCAGGAGACCCTTCTGGAAGAACTTCTACATGCTGACCTTCCTGATGTCCGCCGTCTGGATTTCTGCATTCACTTATGTGCTGGTCTGGATGGTCACAATCGTGG
S ENSDART00000033574.5:1176-1283 GGGGAGACTCTGGGAATCCCGGACACAGTGATGGGAATGACTCTTCTGGCTGCAGGAACCAGTATCCCCGACACCGTGGCCAGTGTGATGGTGGCACGAGAAGGTAA
S ENSDART00000033574.5:1277-2002 AGGTAAATCTGATATGGCCATGTCCAACATCGTGGGCTCTAACGTGTTCGATATGCTGTGTCTGGGCCTGCCGTGGTTCATCCAGACGGTGTTTGTTGACGTGGGCTCCCCGGTGGATGTCAACAGCTCGGGGCTGGTCTTCATGTCCTGCACGCTGCTGCTCTCCATCATCTTCCTCTTCCTCGCCGTGCACATCAACGGCTGGAAGCTGGACTGGAAGCTGGGTCTGGTGTGTTTGGCGTGTTACATTCTGTTCGCAACACTCTCCATCCTGTACGAGCTCGGCATCATCGGGAACAATCCCATACGCTCCTGCAGCGACTGAACACTGCTCTACAGCGCCCCCTTATGGACAACACAAGGACGTGACTCTTTATAACCCTCTAAAGTGCACAGGTTCATTACTGAATACAAGAAAATAGAACTGCGAGACGTCAACTCAAAATACAAGAGAAGTCAAAGTGCGAGATGTAAAAAATATATGCACATAAATGAGGATAAACTTTTTATTTAATAAGACAAAACTGCATAAAGTCTGATGTGAACACTGCTCAACAGCGCCCTCTCATGGACAACACATGGATCTGACTCTTATTAACCCTCCAGAGTGCAAATACACTAACACAACGTAATATAACCAAGTTAAAATGGCAAGATGTGAACTCAAAATACAAGAAAGCAGTCAAGATGCCCGACATAACAAATGTGCATTAAAATGTAAGCCC
L ENSDART00000033574.5:0-216 + ENSDART00000033574.5:216-398 + 0M
L ENSDART00000033574.5:216-398 + ENSDART00000033574.5:397-482 + 1M
L ENSDART00000033574.5:397-482 + ENSDART00000033574.5:480-586 + 2M
L ENSDART00000033574.5:480-586 + ENSDART00000033574.5:585-687 + 1M
L ENSDART00000033574.5:585-687 + ENSDART00000033574.5:685-969 + 2M
L ENSDART00000033574.5:685-969 + ENSDART00000033574.5:969-1177 + 0M
L ENSDART00000033574.5:969-1177 + ENSDART00000033574.5:1176-1283 + 1M
L ENSDART00000033574.5:1176-1283 + ENSDART00000033574.5:1277-2002 + 6M
C ENSDART00000033574.5 + ENSDART00000033574.5:0-216 + 0 216M
C ENSDART00000033574.5 + ENSDART00000033574.5:216-398 + 216 182M
C ENSDART00000033574.5 + ENSDART00000033574.5:397-482 + 397 85M
C ENSDART00000033574.5 + ENSDART00000033574.5:480-586 + 480 106M
C ENSDART00000033574.5 + ENSDART00000033574.5:585-687 + 585 102M
C ENSDART00000033574.5 + ENSDART00000033574.5:685-969 + 685 284M
C ENSDART00000033574.5 + ENSDART00000033574.5:969-1177 + 969 208M
C ENSDART00000033574.5 + ENSDART00000033574.5:1176-1283 + 1176 107M
C ENSDART00000033574.5 + ENSDART00000033574.5:1277-2002 + 1277 725M
P ENSDART00000033574.5 ENSDART00000033574.5:0-216+,ENSDART00000033574.5:216-398+,ENSDART00000033574.5:397-482+,ENSDART00000033574.5:480-586+,ENSDART00000033574.5:585-687+,ENSDART00000033574.5:685-969+,ENSDART00000033574.5:969-1177+,ENSDART00000033574.5:1176-1283+,ENSDART00000033574.5:1277-2002+ *
```
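
A quick way to inspect a GFA1 file like the one above is to tally its records by type and list the segment lengths. The sketch below uses plain `awk` and assumes the fields are tab-separated, as the GFA1 specification requires; the function names are hypothetical and not part of exfi.

```sh
# Count GFA1 records by type (first column: H, S, L, C or P).
gfa_record_counts() {
    awk -F'\t' '{ count[$1]++ } END { for (t in count) print t, count[t] }' "$1" | sort
}

# Print each segment name followed by the length of its sequence.
gfa_segment_lengths() {
    awk -F'\t' '$1 == "S" { print $2 "\t" length($3) }' "$1"
}

# Only run on the example output if it exists in the current folder.
if [ -f test.gfa ]; then
    gfa_record_counts test.gfa
    gfa_segment_lengths test.gfa
fi
```

For the example shown above, the tally should come to 1 header, 9 segments, 8 links, 9 containments and 1 path.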

3. Get exonic sequences