Commit 94ce116

make own page
1 parent 44247a9 commit 94ce116

2 files changed: +69 -41 lines changed

Workflows/downsample.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
---
2+
label: Downsample
3+
description: Downsample data by barcode
4+
icon: fold-down
5+
order: 10
6+
---
7+
8+
# :icon-fold-down: Downsample data by barcode
9+
10+
=== :icon-checklist: You will need one of the following
11+
- one alignment file [!badge variant="success" text=".bam"] [!badge variant="success" text=".sam"] [!badge variant="secondary" text="case insensitive"]
12+
- one set of paired-end reads in FASTQ format [!badge variant="success" text=".fq"] [!badge variant="success" text=".fastq"] [!badge variant="secondary" text="gzip recommended"] [!badge variant="secondary" text="case insensitive"]
13+
===
14+
15+
While downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as `awk`, `samtools`, `seqtk`, `seqkit`, etc.,
16+
[!badge corners="pill" text="downsample"] allows you to downsample a BAM file (or paired-end FASTQ) _by barcodes_. That means you can
17+
keep all the reads associated with `d` barcodes, where `d` is the value given to `--downsample`. The `--invalid` option sets the proportion of invalid barcodes included in the barcode
18+
pool that gets subsampled: `0` includes none of them, `1` includes all of them, and a value in between includes that proportion, e.g. `0.5` includes half.
19+
Bear in mind that the barcode pool is still subsampled afterwards, so `--invalid` does not determine how many invalid barcodes end up
20+
in the final sample, only what proportion of them is considered for sampling.
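
The interaction between `--invalid` and `--downsample` described above can be illustrated with a minimal Python sketch. This is not Harpy's implementation: the function name, its parameters, the placeholder barcode strings, and the choice to round the number of admitted invalid barcodes to the nearest integer are all assumptions made for illustration.

```python
import random

def build_pool_and_sample(valid, invalid, d, invalid_prop=1.0, seed=None):
    """Hypothetical helper: sample d barcodes from a pool that holds every
    valid barcode plus a proportion of the invalid ones."""
    if not 0 <= invalid_prop <= 1:
        raise ValueError("invalid_prop must be between 0 and 1")
    rng = random.Random(seed)
    # Assumption: the number of invalid barcodes admitted to the pool is
    # the proportion of all invalid barcodes, rounded to the nearest integer.
    n_invalid = round(len(invalid) * invalid_prop)
    pool = list(valid) + rng.sample(list(invalid), n_invalid)
    if d > len(pool):
        raise ValueError("cannot sample more barcodes than the pool contains")
    # The pool itself is subsampled, so the share of invalid barcodes in the
    # result is not fixed by invalid_prop.
    return rng.sample(pool, d)

# Placeholder barcode names, not real barcode strings
valid = [f"VALID{i:02d}" for i in range(10)]
invalid = [f"INVALID{i:02d}" for i in range(10)]

picked = build_pool_and_sample(valid, invalid, d=5, invalid_prop=0.0, seed=1)
print(picked)
```

With `invalid_prop=0.0` no invalid barcode can be picked. With `invalid_prop=0.5` half of the invalid barcodes enter the pool, but how many of them end up among the `d` sampled barcodes still depends on the random draw.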
21+
22+
!!! Barcode tag
23+
Barcodes must be in the `BX:Z` SAM tag for both BAM and FASTQ inputs. See [Section 1 of the SAM Spec here](https://samtools.github.io/hts-specs/SAMtags.pdf).
24+
!!!
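
As a rough illustration of where the `BX:Z` tag sits, the Python sketch below pulls the barcode out of a SAM record line and out of a FASTQ header line. It assumes that FASTQ inputs carry the tag as a comment on the read header. The helper name, the example records, and the barcode value are placeholders invented for this example.

```python
import re

# BX:Z:<value>, preceded by the start of the line or by whitespace
# (SAM optional fields are tab-separated).
BX_PATTERN = re.compile(r"(?:^|\s)BX:Z:(\S+)")

def barcode_from_line(line):
    """Hypothetical helper: return the BX:Z value in a line, or None."""
    match = BX_PATTERN.search(line)
    return match.group(1) if match else None

# Made-up records for illustration only
sam_record = "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\tBX:Z:A01C02B03D04"
fastq_header = "@read1 BX:Z:A01C02B03D04"

print(barcode_from_line(sam_record))
print(barcode_from_line(fastq_header))
```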
25+
26+
```bash usage
27+
harpy downsample OPTIONS... INPUT(S)...
28+
```
29+
30+
```bash example
31+
# BAM file
32+
harpy downsample -d 1000 -i 0.3 -p sample1.sub1000 sample1.bam
33+
34+
# FASTQ file
35+
harpy downsample -d 1000 -i 0 -p sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz
36+
```
37+
38+
## :icon-terminal: Running Options
39+
In addition to the [!badge variant="info" corners="pill" text="common runtime options"](/commonoptions.md), the [!badge corners="pill" text="downsample"]
40+
module is configured using the command-line arguments below.
41+
42+
{.compact}
43+
| argument | short name | default | description |
44+
| :-------------- | :--------: | :-----------: | :-------------------------------------------------------------------------------------------------------------------------------- |
45+
| `INPUT(S)` | | | [!badge variant="info" text="required"] One BAM file or both read files from a paired-end FASTQ pair |
46+
| `--downsample` | `-d` | | [!badge variant="info" text="required"] Number of barcodes to downsample to |
47+
| `--invalid`     | `-i`       |      `1`      | Proportion of invalid barcodes to include in the barcode pool that gets subsampled (`0` to `1`)                                    |
48+
| `--prefix` | `-p` | `downsampled` | Prefix for output files |
49+
| `--random-seed` | | | Random seed for sampling [!badge variant="secondary" text="optional"] |
50+
51+
----
52+
## :icon-git-pull-request: Downsample Workflow
53+
```mermaid
54+
graph LR
55+
subgraph fastq
56+
R1([read 1]):::clean---R2([read 2]):::clean
57+
end
58+
subgraph bam
59+
bamfile([bam]):::clean
60+
end
61+
fastq-->|bam conversion|bam
62+
bam-->sub([extract and\n subsample barcodes]):::clean
63+
sub-->exreads([extract reads]):::clean
64+
bam-->exreads
65+
fastq-->exreads
66+
style fastq fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
67+
style bam fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
68+
classDef clean fill:#f5f6f9,stroke:#b7c9ef,stroke-width:2px
69+
```
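
The three steps in the diagram (extract barcodes, subsample them, extract the matching reads) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the workflow's actual code: reads are represented as hypothetical `(read_name, barcode)` tuples rather than BAM or FASTQ records, and the `--invalid` filtering step is left out.

```python
import random

def downsample_reads(reads, d, seed=None):
    """Hypothetical helper: keep every read whose barcode is among d
    randomly chosen barcodes.

    reads: list of (read_name, barcode) tuples; barcode is None when the
    read carries no BX tag.
    """
    # Step 1: extract the distinct barcodes (sorted so a seed is reproducible)
    barcodes = sorted({bc for _, bc in reads if bc is not None})
    # Step 2: subsample d barcodes (or all of them if fewer than d exist)
    rng = random.Random(seed)
    keep = set(rng.sample(barcodes, min(d, len(barcodes))))
    # Step 3: extract all reads associated with the kept barcodes
    return [(name, bc) for name, bc in reads if bc in keep]

# Made-up reads for illustration only
reads = [("r1", "bc1"), ("r2", "bc1"), ("r3", "bc2"), ("r4", "bc3"), ("r5", None)]
kept = downsample_reads(reads, d=2, seed=0)
print(kept)
```

Because sampling happens at the barcode level, reads that share a kept barcode are always retained together, which is the point of downsampling by barcode rather than by read.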

Workflows/other.md

Lines changed: 0 additions & 41 deletions
@@ -11,54 +11,13 @@ On this page you'll find Harpy functions that do other, ancillary things.
1111
{.compact}
1212
| module | description |
1313
| :------------- | :------------------------------------------------------------------------------- |
14-
| `downsample` | Downsample BAM or FASTQ files by barcode |
1514
| `imputeparams` | Create a template imputation parameter file |
1615
| `resume` | Continue a Harpy workflow from an existing output folder |
1716
| `popgroup` | Create generic sample-group file using existing sample file names (fq.gz or bam) |
1817
| `view` | View a workflow log, config, or snakefile |
1918

2019
---
2120

22-
### downsample
23-
While downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as `awk`, `samtools`, `seqtk`, `seqkit`, etc.,
24-
Harpy offers the `downsample` module, which allows you to downsample a BAM file (or paired-end FASTQ) _by barcodes_. That means you can
25-
keep all the reads associated with `d` number of barcodes. First, barcodes are extracted, then subsampled, then the reads associated
26-
with those barcodes are extracted. The `--invalid` proportion will determine what proportion of invalid barcodes appear in the barcode
27-
pool that gets subsampled, where `0` is none, `1` is all invalid barcodes, and a number in between is that proportion, e.g. `0.5` is half.
28-
Bear in mind that the barcode pool still gets subsampled, so the `--invalid` proportion doesn't necessarily reflect how many end up getting
29-
sampled, rather what proportion will be considered for sampling.
30-
31-
!!! Barcode tag
32-
Barcodes must be in the `BX:Z` SAM tag for both BAM and FASTQ inputs. See [Section 1 of the SAM Spec here](https://samtools.github.io/hts-specs/SAMtags.pdf).
33-
!!!
34-
35-
```bash usage
36-
harpy downsample OPTIONS... INPUT(S)...
37-
```
38-
39-
```bash example
40-
# BAM file
41-
harpy downsample -d 1000 -i 0.3 -p sample1.sub1000 sample1.bam
42-
43-
# FASTQ file
44-
harpy downsample -d 1000 -i 0 -p sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz
45-
```
46-
47-
#### arguments
48-
{.compact}
49-
| argument | short name | default | description |
50-
| :-------------- | :--------: | :-----------: | :-------------------------------------------------------------------------------------------------------------------------------- |
51-
| `INPUT(S)` | | | [!badge variant="info" text="required"] One BAM file or both read files from a paired-end FASTQ pair |
52-
| `--downsample` | `-d` | | [!badge variant="info" text="required"] Number of barcodes to downsample to |
53-
| `--invalid`     | `-i`       |       1       | Proportion of invalid barcodes to include in the barcode pool that gets subsampled (`0` to `1`)                                    |
54-
| `--prefix` | `-p` | `downsampled` | Prefix for output files |
55-
| `--random-seed` | | | Random seed for sampling [!badge variant="secondary" text="optional"] |
56-
| `--snakemake` | | | Additional Snakemake arguments, in quotes |
57-
| `--threads` | `-t` | `4` | Number of threads to use |
58-
| `--quiet` | | | Don't show output text while running |
59-
60-
---
61-
6221
### imputeparams
6322
Create a template parameter file for the [!badge corners="pill" text="impute"](/Workflows/impute.md) module.
6423
The file is formatted correctly and serves as a starting point for using parameters that make sense for your study.
