Analysis of the set of founder sequences under a homologous recombination model.
Please refer to the full paper: 10.4230/LIPIcs.WABI.2022.6.
rust version >= 1.60
gurobi version >= 9.5 or cplex
python version >= 3.7
snakemake
cargo build --manifest-path Cargo.toml --release
Example with 4 available CPU cores.
cd examples/experiments/sim
snakemake -j 4
cd examples/experiments/examples
snakemake -k -j 4
Use the clean snakemake target:
snakemake -k -j 1 clean
The following uses the examples experiment as reference. It demonstrates the software's typical usage with the provided snakemake workflows.
Each experiment resides in its own directory and is configured via a config.yaml file, which sets the simulation and analysis parameters.
Paths should be left as-is unless the directory structure is changed.
Remaining recognized parameters:
debug (boolean): toggles verbose debugging output
xhap_regex (string): regular expression used to select haplotype paths in the input GFA files
solve_time_limit (integer, minutes): time limit for the gurobi optimization steps
nnodes (integer list): number of nodes in the graph
dup_ratio (list of floats ∈ [0;1]): duplications ratio
inv_ratio (list of floats ∈ [0;1]): inversions among duplications ratio
nhaplotypes (integer): number of haplotypes to generate
nsamples (integer): number of replicates per parameter set
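As a rough illustration of the parameter types listed above, the following Python sketch checks a configuration dictionary before a run. It is not part of the software: the example values, the helper names (is_int, is_ratio_list, check_config), the requirement that integers be positive, and the use of Python's re dialect for xhap_regex are all assumptions made for the example.

```python
import re

# Hypothetical example values: only the parameter names and types come from
# the list above; none of these values are the project's defaults.
config = {
    "debug": False,
    "xhap_regex": r"^sample_[0-9]+$",
    "solve_time_limit": 30,  # minutes
    "nnodes": [100, 1000],
    "dup_ratio": [0.0, 0.1],
    "inv_ratio": [0.0, 0.5],
    "nhaplotypes": 10,
    "nsamples": 5,
}


def is_int(value):
    """True for integers, excluding booleans (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_ratio_list(value):
    """True for a non-empty list of numbers that all lie in [0, 1]."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(x, (int, float)) and not isinstance(x, bool) and 0 <= x <= 1
            for x in value
        )
    )


def check_config(cfg):
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if not isinstance(cfg.get("debug"), bool):
        problems.append("debug: expected a boolean")
    pattern = cfg.get("xhap_regex")
    if not isinstance(pattern, str):
        problems.append("xhap_regex: expected a string")
    else:
        try:
            # Assumption: the pattern is valid in Python's regex dialect.
            re.compile(pattern)
        except re.error:
            problems.append("xhap_regex: not a valid regular expression")
    # Assumption: these integers must be strictly positive.
    for key in ("solve_time_limit", "nhaplotypes", "nsamples"):
        if not (is_int(cfg.get(key)) and cfg[key] > 0):
            problems.append(f"{key}: expected a positive integer")
    nnodes = cfg.get("nnodes")
    if not (isinstance(nnodes, list) and nnodes
            and all(is_int(n) and n > 0 for n in nnodes)):
        problems.append("nnodes: expected a non-empty list of positive integers")
    for key in ("dup_ratio", "inv_ratio"):
        if not is_ratio_list(cfg.get(key)):
            problems.append(f"{key}: expected a list of floats in [0, 1]")
    return problems


print(check_config(config))
```

With the example values above, check_config returns an empty list; a ratio outside [0;1] or a non-integer count would be reported by name.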
Data used by the experiment should reside in a subdirectory under examples/data.
hapsim: generate simulated founder set, haplotypes, and their variation graph
subgr: select subset of haplotypes and resulting subgraph from a GFA file
mkflow: write to file flow linear program to solve
flow2seq: reconstruct founder set sequences from flow solution
min_random: estimate number of recombinations in flow solution by random assignment trials
mkmin: write to file minimization program to solve
min2seq: reconstruct founder set sequences from minimization solution
Most relevant output, by file extension:
.gfa: user-provided GFA, or one generated by the simulator
.lp, .sol: linear program and solution of flow program and recombination minimization
.nrecomb.txt: number of recombinations in flow solution after random assignment trials
.flow.founders.txt: minimal founder sequences set reconstructed from flow solution
.min.founders.txt: minimal founder sequences set after minimizing their number of recombinations
In the results, founder sequences are represented horizontally as GFA walk lines. Minimization output adds two lines per founder sequence, indicating the positions of recombinations and the haplotype to which each segment above belongs.
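To inspect founder sequences programmatically, a walk line can be split into its oriented segments. The sketch below assumes the founder lines follow the tab-separated W-line layout of the GFA 1.1 specification (record type, sample, haplotype index, sequence name, start, end, walk); whether the founders.txt files match this layout exactly is an assumption, and the sample line and the function name parse_walk_line are made up for illustration. The two additional lines of the minimization output are not handled.

```python
import re

# One walk step: an orientation ('>' forward, '<' reverse) and a segment name.
STEP = re.compile(r"([><])([^><\s]+)")


def parse_walk_line(line):
    """Parse a GFA 1.1 walk (W) line into its sample fields and oriented steps."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7 or fields[0] != "W":
        raise ValueError("not a GFA walk (W) line")
    # Each step becomes (segment_name, is_forward).
    steps = [(name, orient == ">") for orient, name in STEP.findall(fields[6])]
    return {
        "sample": fields[1],
        "hap_index": fields[2],
        "seq_id": fields[3],
        "steps": steps,
    }


# Hypothetical founder line; names and coordinates are invented.
example = "W\tfounder\t0\tseq\t0\t*\t>s1>s2<s3"
parsed = parse_walk_line(example)
print(parsed["sample"], parsed["steps"])
```

Here the third step is reported with is_forward set to False, meaning segment s3 is traversed in reverse orientation.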
This software is distributed under the MIT license. For more details, see the LICENSE file.