Constructs repeat-free/semi-repeat-free non-elastic/elastic founder graphs from multiple sequence alignments.
Clone this repository with dependencies:
$ git clone --recurse-submodules https://github.com/algbio/founderblockgraphs.git
$ cd founderblockgraphs
Build sdsl-lite-v3:
$ cd sdsl-lite-v3
$ ./install.sh .
$ cd ..
Build this project (founderblockgraph
, locate_multiple
, locate_patterns
):
$ make
Usage: founderblockgraph --input=MSA.fasta --output={MSA.index|efg.xgfa} [--gfa]
[--elastic] [--gap-limit=GAPLIMIT] [--threads=THREADNUM]
[--graphviz-output=efg.dot] [--output-paths] [--ignore-chars="ALPHABET"]
Constructs a semi-repeat-free (Elastic) Founder Graph
Input is MSA given in fasta format. In standard mode (without --elastic), rows
with runs of gaps ‘-’ or N’s ≥ GAPLIMIT will be filtered out.
-h, --help Print help and exit
--full-help Print help, including hidden options, and exit
-V, --version Print version and exit
--input=filename MSA input path
--output=filename Index/EFG output path
--gap-limit=GAPLIMIT Gap limit (suppressed by --elastic)
(default=`1')
--graphviz-output=filename
Graphviz output path
--memory-chart-output=filename
Memory chart output path
-e, --elastic Min-max-length semi-repeat-free segmentation
(default=off)
--gfa Saves output in xGFA format (default=off)
-p, --output-paths Print the original sequences as paths of the
xGFA graph (requires --gfa) (default=off)
--ignore-chars=STRING Ignore these characters for the indexability
property/pattern matching
-t, --threads=THREADNUM Max # threads (default=`-1')
- document EFG tricks related to option
--ignore-chars
, to the start and end of sequences, and to initial and ending runs of gaps - implement validation of .gfa files
- implement pattern matching (
locate_multiple
,locate_patterns
) on EFGs - implement min max height segmentation