Merge pull request #3699 from vgteam/giraffe-readme

adamnovak · web-flow · commit 2a029a28753e · 2022-07-08T07:34:33.000-07:00
Add Giraffe to the README
diff --git a/README.md b/README.md
@@ -71,7 +71,7 @@ At present, you will need GCC version 4.9 or greater, with support for C++14, to
 
 Other libraries may be required. Please report any build difficulties.
 
-Note that a 64-bit OS is required. Ubuntu 18.04 should work.
+Note that a 64-bit OS is required. Ubuntu 20.04 should work.
 
 When you are ready, build with `. ./source_me.sh && make`, and run with `./bin/vg`.
 
@@ -189,29 +189,39 @@ Note that `vg` tools can generally read all supported graph formats (VG, uncompr
 
 The format of a given graph file can be retrieved with `vg stats -F`. 
 
-### Alignment
+### Mapping
 
-As this is a small graph, you could align to it using a full-length partial order alignment:
+If you have more than one sequence, or you are working on a large graph, you will want to map rather than merely align.
 
-<!-- !test check Align a string to a graph -->
-```sh
-vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG x.vg
-```
+There are multiple read mappers in `vg`:
 
-Note that you don't have to store the graph on disk at all, you can simply pipe it into the local aligner:
+* `vg giraffe` is designed to be fast for highly accurate short reads, against graphs with haplotype information.
+* `vg map` is a general-purpose read mapper.
+* `vg mpmap` does "multi-path" mapping, to allow describing local alignment uncertainty. [This is useful for transcriptomics.](#Transcriptomic-analysis)
 
-<!-- !test check Align a string to a piped graph -->
+#### Mapping with `vg giraffe`
+
+To use `vg giraffe` to map reads, you will first need to prepare indexes. This is best done using `vg autoindex`. In order to get `vg autoindex` to use haplotype information from a VCF file, you can give it the VCF and the associated linear reference directly.
+
+<!-- !test check Simulate and map back with surjection with Giraffe -->
 ```sh
-vg construct -r small/x.fa -v small/x.vcf.gz | vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -
+# construct the graph and indexes (paths below assume running from `vg/test` directory)
+vg autoindex --workflow giraffe -r small/x.fa -v small/x.vcf.gz -p x
+
+# simulate a bunch of 150bp reads from the graph, into a GAM file of reads aligned to a graph
+vg sim -n 1000 -l 150 -x x.giraffe.gbz -a > x.sim.gam
+# now re-map these reads against the graph, and get BAM output in linear space
+# FASTQ input uses -f instead of -G.
+vg giraffe -Z x.giraffe.gbz -G x.sim.gam -o BAM > aln.bam
 ```
 
-Most commands allow the streaming of graphs into and out of `vg`.
+[More information on using `vg giraffe` can be found on the `vg` wiki.](https://github.com/vgteam/vg/wiki/Mapping-short-reads-with-Giraffe)
 
-### Mapping
+#### Mapping with `vg map`
 
 If your graph is large, you want to use `vg index` to store the graph and `vg map` to align reads. `vg map` implements a kmer based seed and extend alignment model that is similar to that used in aligners like novoalign or MOSAIK. First an on-disk index is built with `vg index` which includes the graph itself and kmers of a particular size. When mapping, any kmer size shorter than that used in the index can be employed, and by default the mapper will decrease the kmer size to increase sensitivity when alignment at a particular _k_ fails.
 
-<!-- !test check Simulate and map back with surjection -->
+<!-- !test check Simulate and map back with surjection with map -->
 ```sh
 # construct the graph (paths below assume running from `vg/test` directory)
 vg construct -r small/x.fa -v small/x.vcf.gz > x.vg
@@ -381,6 +391,24 @@ vg mpmap -n rna -t 4 -x vg_rna.spliced.xg -g vg_rna.spliced.gcsa -d vg_rna.splic
 
 This will produce alignments in the multipath format. For more information on the multipath alignment format and `vg mpmap` see [wiki page on mpmap](https://github.com/vgteam/vg/wiki/Multipath-alignments-and-vg-mpmap). Running the two commands on the small example data using 4 threads should on most machines take less than a minute.  
 
+### Alignment
+
+If you have a small graph, you can align a sequence to the whole graph, using a full-length partial order alignment:
+
+<!-- !test check Align a string to a graph -->
+```sh
+vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG x.vg
+```
+
+Note that you don't have to store the graph on disk at all; you can simply pipe it into the local aligner:
+
+<!-- !test check Align a string to a piped graph -->
+```sh
+vg construct -r small/x.fa -v small/x.vcf.gz | vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -
+```
+
+Most commands allow the streaming of graphs into and out of `vg`.
+
 ### Command line interface
 
 A variety of commands are available: