Commit 2a029a2

Merge pull request #3699 from vgteam/giraffe-readme
Add Giraffe to the README
2 parents cdbe857 + c7e491f commit 2a029a2

1 file changed: +41 -13 lines changed

README.md

Lines changed: 41 additions & 13 deletions
@@ -71,7 +71,7 @@ At present, you will need GCC version 4.9 or greater, with support for C++14, to
7171

7272
Other libraries may be required. Please report any build difficulties.
7373

74-
Note that a 64-bit OS is required. Ubuntu 18.04 should work.
74+
Note that a 64-bit OS is required. Ubuntu 20.04 should work.
7575

7676
When you are ready, build with `. ./source_me.sh && make`, and run with `./bin/vg`.
7777

@@ -189,29 +189,39 @@ Note that `vg` tools can generally read all supported graph formats (VG, uncompr
189189

190190
The format of a given graph file can be retrieved with `vg stats -F`.
191191

192-
### Alignment
192+
### Mapping
193193

194-
As this is a small graph, you could align to it using a full-length partial order alignment:
194+
If you have more than one sequence, or you are working on a large graph, you will want to map rather than merely aligning.
195195

196-
<!-- !test check Align a string to a graph -->
197-
```sh
198-
vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG x.vg
199-
```
196+
There are multiple read mappers in `vg`:
200197

201-
Note that you don't have to store the graph on disk at all; you can simply pipe it into the local aligner:
198+
* `vg giraffe` is designed to be fast for highly accurate short reads, against graphs with haplotype information.
199+
* `vg map` is a general-purpose read mapper.
200+
* `vg mpmap` does "multi-path" mapping, to allow describing local alignment uncertainty. [This is useful for transcriptomics.](#Transcriptomic-analysis)
202201

203-
<!-- !test check Align a string to a piped graph -->
202+
#### Mapping with `vg giraffe`
203+
204+
To use `vg giraffe` to map reads, you will first need to prepare indexes. This is best done using `vg autoindex`. In order to get `vg autoindex` to use haplotype information from a VCF file, you can give it the VCF and the associated linear reference directly.
205+
206+
<!-- !test check Simulate and map back with surjection with Giraffe -->
204207
```sh
205-
vg construct -r small/x.fa -v small/x.vcf.gz | vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -
208+
# construct the graph and indexes (paths below assume running from `vg/test` directory)
209+
vg autoindex --workflow giraffe -r small/x.fa -v small/x.vcf.gz -p x
210+
211+
# simulate a bunch of 150bp reads from the graph, into a GAM file of reads aligned to a graph
212+
vg sim -n 1000 -l 150 -x x.giraffe.gbz -a > x.sim.gam
213+
# now re-map these reads against the graph, and get BAM output in linear space
214+
# FASTQ input uses -f instead of -G.
215+
vg giraffe -Z x.giraffe.gbz -G x.sim.gam -o BAM > aln.bam
206216
```
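
The comment in the block above notes that FASTQ input uses `-f` instead of `-G`. As a minimal sketch of how a wrapper script might choose between them, the helper below picks the flag from the reads file's extension. The function name `giraffe_input_flag` and the extension list are assumptions made for illustration and are not part of `vg`. The command is printed rather than executed, so the sketch runs without the indexes being present.

```sh
# Hypothetical helper (not part of vg): choose the `vg giraffe` input flag
# from the reads file extension. The extension list is an assumption.
giraffe_input_flag() {
    case "$1" in
        *.gam) printf '%s\n' "-G" ;;                                # GAM reads
        *.fq|*.fastq|*.fq.gz|*.fastq.gz) printf '%s\n' "-f" ;;      # FASTQ reads
        *) printf 'unrecognized reads file: %s\n' "$1" >&2; return 1 ;;
    esac
}

reads=x.sim.gam
flag=$(giraffe_input_flag "$reads")

# Print the command instead of running it (dry run).
printf '%s\n' "vg giraffe -Z x.giraffe.gbz $flag $reads -o BAM"
# prints: vg giraffe -Z x.giraffe.gbz -G x.sim.gam -o BAM
```

To actually run the mapping, replace the final `printf` with the command itself and redirect its output to a file, as in the block above.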
207217

208-
Most commands allow the streaming of graphs into and out of `vg`.
218+
[More information on using `vg giraffe` can be found on the `vg` wiki.](https://github.com/vgteam/vg/wiki/Mapping-short-reads-with-Giraffe)
209219

210-
### Mapping
220+
#### Mapping with `vg map`
211221

212222
If your graph is large, you want to use `vg index` to store the graph and `vg map` to align reads. `vg map` implements a kmer based seed and extend alignment model that is similar to that used in aligners like novoalign or MOSAIK. First an on-disk index is built with `vg index` which includes the graph itself and kmers of a particular size. When mapping, any kmer size shorter than that used in the index can be employed, and by default the mapper will decrease the kmer size to increase sensitivity when alignment at a particular _k_ fails.
213223

214-
<!-- !test check Simulate and map back with surjection -->
224+
<!-- !test check Simulate and map back with surjection with map -->
215225
```sh
216226
# construct the graph (paths below assume running from `vg/test` directory)
217227
vg construct -r small/x.fa -v small/x.vcf.gz > x.vg
@@ -381,6 +391,24 @@ vg mpmap -n rna -t 4 -x vg_rna.spliced.xg -g vg_rna.spliced.gcsa -d vg_rna.splic
381391

382392
This will produce alignments in the multipath format. For more information on the multipath alignment format and `vg mpmap` see [wiki page on mpmap](https://github.com/vgteam/vg/wiki/Multipath-alignments-and-vg-mpmap). Running the two commands on the small example data using 4 threads should on most machines take less than a minute.
383393

394+
### Alignment
395+
396+
If you have a small graph, you can align a sequence to the whole graph, using a full-length partial order alignment:
397+
398+
<!-- !test check Align a string to a graph -->
399+
```sh
400+
vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG x.vg
401+
```
402+
403+
Note that you don't have to store the graph on disk at all; you can simply pipe it into the local aligner:
404+
405+
<!-- !test check Align a string to a piped graph -->
406+
```sh
407+
vg construct -r small/x.fa -v small/x.vcf.gz | vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -
408+
```
409+
410+
Most commands allow the streaming of graphs into and out of `vg`.
411+
384412
### Command line interface
385413

386414
A variety of commands are available:
