fix assembly

pdimens · Nov 6, 2024 · cdcc0f0
1 parent dc21da4
commit cdcc0f0
Showing 9 changed files with 106 additions and 95 deletions.
diff --git a/Workflows/SV/leviathan.md b/Workflows/SV/leviathan.md
@@ -69,9 +69,9 @@ In addition to the [!badge variant="info" corners="pill" text="common runtime op
 | `--contigs`        |            | file path or list     |         |     | [Contigs to plot](/commonoptions.md#--contigs) in the report                         |
 | `--extra-params` |    `-x`    | string        |         |                | Additional leviathan arguments, in quotes          |
 | `--genome`       |    `-g`    | file path     |         |    ‼️ | Genome assembly that was used to create alignments    |
-| `--iterations`   |    `-i`    | integer       |   50    |                 | Number of iterations to perform through index (reduces memory) |
-| `--min-barcodes` |    `-b`    | integer       |    2    |                 | Minimum number of barcode overlaps supporting candidate SV |
-| `--min-sv`       |    `-m`    | integer       |  1000   |                 | Minimum size of SV to detect              |
+| `--iterations`   |    `-i`    | integer       |   `50`    |                 | Number of iterations to perform through index (reduces memory) |
+| `--min-barcodes` |    `-b`    | integer       |    `2`    |                 | Minimum number of barcode overlaps supporting candidate SV |
+| `--min-sv`       |    `-m`    | integer       |  `1000`   |                 | Minimum size of SV to detect              |
 | `--populations`  |    `-p`    | file path     |         |                 | Tab-delimited file of sample\<*tab*\>group         |
 
 ### Single-sample variant calling

diff --git a/Workflows/SV/naibr.md b/Workflows/SV/naibr.md
@@ -69,10 +69,10 @@ In addition to the [!badge variant="info" corners="pill" text="common runtime op
 | `--contigs`        |            | file path or list     |         |     | [Contigs to plot](/commonoptions.md#--contigs) in the report                         |
 | `--extra-params` |    `-x`    | string        |         |                 | Additional naibr arguments, in quotes              |
 | `--genome`       |    `-g`    | file path     |         | ‼️ | Genome assembly for phasing bam files     |
-| `--min-barcodes` |    `-b`    | integer       |    2    |                 | Minimum number of barcode overlaps supporting candidate SV |
-| `--min-quality` |    `-q`    | integer (0-40)        |   30    |        | Minimum `MQ` (SAM mapping quality) to pass filtering  |
-| `--min-sv`       |    `-n`    | integer       |  1000   |                | Minimum size of SV to detect              |
-| `--molecule-distance` |  `-m` | integer       |  100000 |                | Base-pair distance threshold to separate molecules |
+| `--min-barcodes` |    `-b`    | integer       |    `2`    |                 | Minimum number of barcode overlaps supporting candidate SV |
+| `--min-quality` |    `-q`    | integer (0-40)        |   `30`    |        | Minimum `MQ` (SAM mapping quality) to pass filtering  |
+| `--min-sv`       |    `-n`    | integer       |  `1000`   |                | Minimum size of SV to detect              |
+| `--molecule-distance` |  `-m` | integer       |  `100000` |                | Base-pair distance threshold to separate molecules |
 | `--populations`  |    `-p`    | file path     |         |                | Tab-delimited file of sample\<*tab*\>group         |
 | `--vcf`          |    `-v`    | file path     |         | ❗ | Phased vcf file for phasing bam files ([see below](#optional-vcf-file))    |
 

diff --git a/Workflows/Simulate/simulate-linkedreads.md b/Workflows/Simulate/simulate-linkedreads.md
@@ -45,18 +45,18 @@ harpy simulate linkedreads -t 4 -n 2  -l 100 -p 50  data/genome.hap1.fasta data/
 In addition to the [!badge variant="info" corners="pill" text="common runtime options"](/commonoptions.md), the  [!badge corners="pill" text="simulate linkedreads"] module is configured using these command-line arguments:
 
 {.compact}
-| argument       | short name | type        |    default    | required | description                                                                                     |
-|:---------------|:----------:|:------------|:-------------:|:--------:|:------------------------------------------------------------------------------------------------|
-| `HAP1_GENOME`       |            | file path |       | ‼️  | Haplotype 1 of the diploid genome to simulate reads   |
-| `HAP2_GENOME`       |            | file path |       | ‼️  | Haplotype 2 of the diploid genome to simulate reads   |
-| `--barcodes`        |    `-b`    | file path |  [10X barcodes](https://github.com/aquaskyline/LRSIM/blob/master/4M-with-alts-february-2016.txt)   |        | File of linked-read barcodes to add to reads   |
-| `--distance-sd`     |    `-s`    | integer   |  15   |   | Standard deviation of read-pair distance                     |
-| `--molecule-length` |    `-l`    | integer   |  100  |   | Mean molecule length (kbp)                                   |
-| `--molecules-per`   |    `-m`    | integer   |   10  |   | Average number of molecules per partition                    |
-| `--mutation-rate`   |    `-r`    | number    | 0.001 |   | Random mutation rate for simulating reads (0 - 1.0)          |
-| `--outer-distance`  |    `-d`    | integer   | 350   |   | Outer distance between paired-end reads (bp)                 |
-| `--partitions`      |    `-p`    | integer   |  1500 |   | Number (in thousands) of partitions/beads to generate        |
-| `--read-pairs`      |    `-n`    | number    |  600  |   | Number (in millions) of read pairs to simulate               |
+| argument            | short name |                                             default                                             | required | description                                           |
+| :------------------ | :--------: | :---------------------------------------------------------------------------------------------: | :------: | :---------------------------------------------------- |
+| `HAP1_GENOME`       |            |                                                                                                 |   ‼️    | Haplotype 1 of the diploid genome to simulate reads   |
+| `HAP2_GENOME`       |            |                                                                                                 |   ‼️    | Haplotype 2 of the diploid genome to simulate reads   |
+| `--barcodes`        |    `-b`    | [10X barcodes](https://github.com/aquaskyline/LRSIM/blob/master/4M-with-alts-february-2016.txt) |          | File of linked-read barcodes to add to reads          |
+| `--distance-sd`     |    `-s`    |                                              `15`                                               |          | Standard deviation of read-pair distance              |
+| `--molecule-length` |    `-l`    |                                              `100`                                              |          | Mean molecule length (kbp)                            |
+| `--molecules-per`   |    `-m`    |                                              `10`                                               |          | Average number of molecules per partition             |
+| `--mutation-rate`   |    `-r`    |                                             `0.001`                                             |          | Random mutation rate for simulating reads (0 - 1.0)   |
+| `--outer-distance`  |    `-d`    |                                              `350`                                              |          | Outer distance between paired-end reads (bp)          |
+| `--partitions`      |    `-p`    |                                             `1500`                                              |          | Number (in thousands) of partitions/beads to generate |
+| `--read-pairs`      |    `-n`    |                                              `600`                                              |          | Number (in millions) of read pairs to simulate        |
 
 ## Mutation Rate
 The read simulation is two-part: first `dwgsim` generates forward and reverse FASTQ files from the provided genome haplotypes

diff --git a/Workflows/Simulate/simulate-variants.md b/Workflows/Simulate/simulate-variants.md
@@ -32,12 +32,12 @@ harpy simulate inversion -n 10 --min-size 1000 --max-size 50000  path/to/genome.
 There are 4 submodules with very obvious names:
 
 {.compact}
-| submodule | what it does |
-|:----------|:-------------|
-| [!badge corners="pill" text="snpindel"](#snpindel) | simulates single nucleotide polymorphisms (snps) and insertion-deletions (indels) |
-| [!badge corners="pill" text="inversion"](#inversion) | simulates inversions |
-| [!badge corners="pill" text="cnv"](#cnv) | simulates copy number variants |
-| [!badge corners="pill" text="translocation"](#translocation) | simulates translocations |
+| submodule                                                    | what it does                                                                      |
+| :----------------------------------------------------------- | :-------------------------------------------------------------------------------- |
+| [!badge corners="pill" text="snpindel"](#snpindel)           | simulates single nucleotide polymorphisms (snps) and insertion-deletions (indels) |
+| [!badge corners="pill" text="inversion"](#inversion)         | simulates inversions                                                              |
+| [!badge corners="pill" text="cnv"](#cnv)                     | simulates copy number variants                                                    |
+| [!badge corners="pill" text="translocation"](#translocation) | simulates translocations                                                          |
 
 ## :icon-terminal: Running Options
 While there are several differences between individual workflow options, each has available all the
@@ -46,16 +46,16 @@ Each requires an input genome at the end of the command line, and each requires
 to randomly simulate, or a `--vcf` of specific variants to simulate. There are also these unifying options among the different variant types:
 
 {.compact}
-| argument | short name | type |  required | description |
-| :-----|:-----|:-----|:---:|:-----|
-| `INPUT_GENOME`           |    | file path  | ‼️ |  The haploid genome to simulate variants onto|
-| `--centromeres` | `-c` | file path |  | GFF3 file of centromeres to avoid |
-| `--exclude-chr` | `-e` | file path |  | Text file of chromosomes to avoid, one per line |
-| `--genes` | `-g` | file path |   | GFF3 file of genes to avoid simulating over (see `snpindel` for caveat) |
-| `--heterozygosity` | `-z` | float between [0,1] |  | [proportion of simulated variants to make heterozygous   ](#heterozygosity) (default: `0`) |
-| `--only-vcf` | | toggle | | When used with `--heterozygosity`, will create the diploid VCFs but will not simulate a diploid genome |
-| `--prefix` | | string |   | Naming prefix for output files (default: `sim.{module_name}`)|
-| `--randomseed` |  | integer |   | Random seed for simulation |
+| argument           | short name | required | description                                                                                            |
+| :----------------- | :--------- | :------: | :----------------------------------------------------------------------------------------------------- |
+| `INPUT_GENOME`     |            |   ‼️      | The haploid genome to simulate variants onto                                                           |
+| `--centromeres`    | `-c`       |          | GFF3 file of centromeres to avoid                                                                      |
+| `--exclude-chr`    | `-e`       |          | Text file of chromosomes to avoid, one per line                                                        |
+| `--genes`          | `-g`       |          | GFF3 file of genes to avoid simulating over (see `snpindel` for caveat)                                |
+| `--heterozygosity` | `-z`       |          | [proportion of simulated variants to make heterozygous   ](#heterozygosity) (default: `0`)             |
+| `--only-vcf`       |            |          | When used with `--heterozygosity`, will create the diploid VCFs but will not simulate a diploid genome |
+| `--prefix`         |            |          | Naming prefix for output files (default: `sim.{module_name}`)                                          |
+| `--randomseed`     |            |          | Random seed for simulation                                                                             |
 
 !!!warning simulations can be slow
 Given software limitations, simulating many variants **relative to the size of the input genome** will be noticeably slow.
@@ -69,72 +69,72 @@ An indel is a type of mutation that involves the addition/deletion of one or mo
 The snp and indel variants are combined in this module because `simuG` allows simulating them together. 
 
 {.compact}
-| argument          | short name | type       | default |  description                                                 |
-|:------------------|:----------:|:-----------|:-------:|:-------------------------------------------------------------|
-| `--indel-count` |  `-m` | integer | 0 | Number of random indels to simulate |
-| `--indel-vcf` | `-i` | file path | | VCF file of known indels to simulate |
-| `--indel-ratio` | `-d` | float  |  1 | Insertion/Deletion ratio for indels |
-| `--indel-size-alpha` | `-a` | float |  2.0 | Exponent Alpha for power-law-fitted indel size distribution|
-| `--indel-size-constant` | `-l` | float | 0.5 | Exponent constant for power-law-fitted indel size distribution |
-| `--snp-count` | `-n` | integer | 0 | Number of random snps to simulate |
-| `--snp-gene-constraints` | `-y` | string | | How to constrain randomly simulated SNPs {`noncoding`,`coding`,`2d`,`4d`} when using `--genes`|
-| `--snp-vcf`| `-s` | file path | | VCF file of known snps to simulate |
-| `--titv-ratio` | `-r` | float  | 0.5 | Transition/Transversion ratio for snps |
+| argument                 | short name | default | description                                                                                    |
+| :----------------------- | :--------: | :-----: | :--------------------------------------------------------------------------------------------- |
+| `--indel-count`          |    `-m`    |   `0`   | Number of random indels to simulate                                                            |
+| `--indel-vcf`            |    `-i`    |         | VCF file of known indels to simulate                                                           |
+| `--indel-ratio`          |    `-d`    |   `1`   | Insertion/Deletion ratio for indels                                                            |
+| `--indel-size-alpha`     |    `-a`    |  `2.0`  | Exponent Alpha for power-law-fitted indel size distribution                                    |
+| `--indel-size-constant`  |    `-l`    |  `0.5`  | Exponent constant for power-law-fitted indel size distribution                                 |
+| `--snp-count`            |    `-n`    |   `0`   | Number of random snps to simulate                                                              |
+| `--snp-gene-constraints` |    `-y`    |         | How to constrain randomly simulated SNPs {`noncoding`,`coding`,`2d`,`4d`} when using `--genes` |
+| `--snp-vcf`              |    `-s`    |         | VCF file of known snps to simulate                                                             |
+| `--titv-ratio`           |    `-r`    |  `0.5`  | Transition/Transversion ratio for snps                                                         |
 
 The ratio parameters for snp and indel variants have special meanings when the value
 is set to either `0` or `9999`:
 
 {.compact}
-| ratio | `0` meaning | `9999` meaning   |
-|:---- |:---|:---|
-| `--indel-ratio` | deletions only | insertions only |
-| `--titv-ratio` | transversions only | transitions only |
+| ratio           | `0` meaning        | `9999` meaning    |
+| :-------------- | :----------------- | :---------------- |
+| `--indel-ratio` | deletions only     | insertions only   |
+| `--titv-ratio`  | transversions only | transitions only  |
 
 +++ 🔵 inversions
 ### inversion
 Inversions are when a section of a chromosome appears in the reverse orientation ([source](https://www.genome.gov/genetics-glossary/Inversion)).
 
 {.compact}
-| argument          | short name | type       | default |  description     |
-|:------------------|:----------:|:-----------|:-------:|:----------------|
-| `--count`| `-n` | integer | 0 |  Number of random inversions to simulate |
-| `--max-size` | `-x` | integer | 100000 | Maximum inversion size (bp) |
-| `--min-size` | `-m` | integer | 1000 | Minimum inversion size (bp) |
-| `--vcf` | `-v` | file path |  |  VCF file of known inversions to simulate |
+| argument     | short name | default  | description                              |
+| :----------- | :--------: | :------: | :--------------------------------------- |
+| `--count`    |    `-n`    |   `0`    | Number of random inversions to simulate  |
+| `--max-size` |    `-x`    | `100000` | Maximum inversion size (bp)              |
+| `--min-size` |    `-m`    |  `1000`  | Minimum inversion size (bp)              |
+| `--vcf`      |    `-v`    |          | VCF file of known inversions to simulate |
 
 +++ 🟢 copy number variants
 ### cnv
 A copy number variation (CNV) is when the number of copies of a particular gene varies
 between individuals ([source](https://www.genome.gov/genetics-glossary/Copy-Number-Variation)).
 
 {.compact}
-| argument          | short name | type       | default |  description     |
-|:------------------|:----------:|:-----------|:-------:|:----------------|
-| `--vcf` | `-v` | file path | | VCF file of known copy number variants to simulate |
-| `--count` | `-n` | integer | 0 | Number of random cnv to simulate |
-| `--dup-ratio` | `-d` | float |  1 | Tandem/Dispersed duplication ratio |
-| `--gain-ratio` |`-l` | float |  1 | Relative ratio of DNA gain over DNA loss |
-| `--max-size`|   `-x` | integer |100000 | Maximum cnv size (bp) |
-| `--max-copy` |  `-y` | integer | 10 | Maximum number of copies |
-| `--min-size` | `-m` | integer |  1000 | Minimum cnv size (bp) |
+| argument       | short name | default  | description                                        |
+| :------------- | :--------: | :------: | :------------------------------------------------- |
+| `--vcf`        |    `-v`    |          | VCF file of known copy number variants to simulate |
+| `--count`      |    `-n`    |   `0`    | Number of random cnv to simulate                   |
+| `--dup-ratio`  |    `-d`    |   `1`    | Tandem/Dispersed duplication ratio                 |
+| `--gain-ratio` |    `-l`    |   `1`    | Relative ratio of DNA gain over DNA loss           |
+| `--max-size`   |    `-x`    | `100000` | Maximum cnv size (bp)                              |
+| `--max-copy`   |    `-y`    |   `10`   | Maximum number of copies                           |
+| `--min-size`   |    `-m`    |  `1000`  | Minimum cnv size (bp)                              |
 
 The ratio parameters have special meanings when the value is set to either `0` or `9999`:
 
 {.compact}
-| ratio | `0` meaning | `9999` meaning   |
-|:---- |:---|:---|
-| `--dup-ratio` | dispersed duplications only | tandem duplications only |
-| `--gain-ratio` | loss only | gain only |
+| ratio          | `0` meaning                 | `9999` meaning           |
+| :------------- | :-------------------------- | :----------------------- |
+| `--dup-ratio`  | dispersed duplications only | tandem duplications only |
+| `--gain-ratio` | loss only                   | gain only                |
 
 +++ 🟡 translocations
 ### translocation
 A translocation occurs when a chromosome breaks and the fragmented pieces re-attach to different chromosomes ([source](https://www.genome.gov/genetics-glossary/Translocation)). 
 
 {.compact}
-| argument          | short name | type       | default |  description     |
-|:------------------|:----------:|:-----------|:-------:|:----------------|
-| `--count`| `-n` | integer | 0 |  Number of random translocations to simulate |
-| `--vcf` | `-v` | file path |  |  VCF file of known translocations to simulate |
+| argument  | short name | default | description                                  |
+| :-------- | :--------: | :-----: | :------------------------------------------- |
+| `--count` |    `-n`    |   `0`   | Number of random translocations to simulate  |
+| `--vcf`   |    `-v`    |         | VCF file of known translocations to simulate |
 
 +++
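
The `0` and `9999` special cases for the ratio options documented in this diff (`--indel-ratio`, `--titv-ratio`, `--dup-ratio`, `--gain-ratio`) can be illustrated with a small sketch. It is not taken from harpy or `simuG`: the helper name `first_fraction` is hypothetical, and it assumes that a ratio `r` means `r:1` of the first-named event type to the second (for example insertions to deletions), with `9999` treated as a sentinel for "first type only". How the underlying tools interpret intermediate values internally is not stated in the documentation above.

```python
# Hypothetical helper, not part of harpy or simuG.
# Assumption: a ratio r means r:1 of the first-named event type to the second
# (insertion:deletion, transition:transversion, tandem:dispersed, gain:loss).

ONLY_SECOND = 0     # documented meaning: second type only (e.g. deletions only)
ONLY_FIRST = 9999   # documented meaning: first type only (e.g. insertions only)


def first_fraction(ratio: float) -> float:
    """Return the fraction of events of the first-named type implied by `ratio`."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    if ratio == ONLY_FIRST:
        # Sentinel value: treat as exactly 100% first-type events.
        return 1.0
    # ratio == ONLY_SECOND (0) falls out naturally as 0.0 here.
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    examples = [
        ("--indel-ratio", 1),    # documented default
        ("--titv-ratio", 0.5),   # documented default
        ("--dup-ratio", 0),      # dispersed duplications only
        ("--gain-ratio", 9999),  # gain only
    ]
    for option, ratio in examples:
        print(f"{option} {ratio}: first-type fraction = {first_fraction(ratio):.3f}")
```

Under this reading, the default `--indel-ratio` of `1` corresponds to an even split between insertions and deletions, and the default `--titv-ratio` of `0.5` corresponds to one transition for every two transversions.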