From 44247a9824070504f5f53274f0e72675831be20d Mon Sep 17 00:00:00 2001
From: pdimens
Date: Wed, 11 Dec 2024 13:16:04 -0500
Subject: [PATCH] update to 1.14.1

---
 Workflows/other.md | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/Workflows/other.md b/Workflows/other.md
index 5b9c746b..a25528fa 100644
--- a/Workflows/other.md
+++ b/Workflows/other.md
@@ -1,15 +1,13 @@
 ---
 label: Other
-icon: file-diff
+icon: ellipsis
 description: Generate extra files for analysis with Harpy
 order: 7
 ---
 
-# :icon-file-diff: Other Harpy modules
-On this page you'll find Harpy functions that aren't standalone workflows. These may create ancillary inputs, continue where you left off,
-view important workflow files, etc.
+# :icon-ellipsis: Other Harpy modules
+On this page you'll find Harpy functions that do other, ancillary things.
 
-## :icon-terminal: Other modules
 {.compact}
 | module | description |
 | :------------- | :------------------------------------------------------------------------------- |
@@ -24,15 +22,26 @@ view important workflow files, etc.
 ### downsample
 While downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as `awk`, `samtools`, `seqtk`, `seqkit`, etc.,
 Harpy offers the `downsample` module, which allows you to downsample a BAM file (or paired-end FASTQ) _by barcodes_. That means you can
-keep all the reads associated with `d` number of barcodes.
+keep all the reads associated with `d` number of barcodes. First, barcodes are extracted, then subsampled, then the reads associated
+with those barcodes are extracted. The `--invalid` proportion will determine what proportion of invalid barcodes appear in the barcode
+pool that gets subsampled, where `0` is none, `1` is all invalid barcodes, and a number in between is that proportion, e.g. `0.5` is half.
+Bear in mind that the barcode pool still gets subsampled, so the `--invalid` proportion doesn't necessarily reflect how many end up getting
+sampled, rather what proportion will be considered for sampling.
+
+!!! Barcode tag
+Barcodes must be in the `BX:Z` SAM tag for both BAM and FASTQ inputs. See [Section 1 of the SAM Spec here](https://samtools.github.io/hts-specs/SAMtags.pdf).
+!!!
 
 ```bash usage
-# a BAM file
 harpy downsample OPTIONS... INPUT(S)...
 ```
 
 ```bash example
-harpy downsample -d 1000 -i drop -b BC -p sample1.sub1000
+# BAM file
+harpy downsample -d 1000 -i 0.3 -p sample1.sub1000 sample1.bam
+
+# FASTQ file
+harpy downsample -d 1000 -i 0 -p sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz
 ```
 
 #### arguments
@@ -41,10 +50,10 @@ harpy downsample -d 1000 -i drop -b BC -p sample1.sub1000
 | :-------------- | :--------: | :-----------: | :-------------------------------------------------------------------------------------------------------------------------------- |
 | `INPUT(S)` | | | [!badge variant="info" text="required"] One BAM file or both read files from a paired-end FASTQ pair |
 | `--downsample` | `-d` | | [!badge variant="info" text="required"] Number of barcodes to downsample to |
-| `--invalid` | `-i` | `keep` | Strategy to handle invalid/missing barcodes [`keep`,`drop`] |
-| `--bx-tag` | `-b` | `BX` | The header tag with the barcode [!badge variant="secondary" text="alphanumeric"] [!badge variant="secondary" text="2 characters"] |
+| `--invalid` | `-i` | 1 | Proportion of barcodes to sample |
 | `--prefix` | `-p` | `downsampled` | Prefix for output files |
 | `--random-seed` | | | Random seed for sampling [!badge variant="secondary" text="optional"] |
+| `--snakemake` | | | Additional Snakemake arguments, in quotes |
 | `--threads` | `-t` | `4` | Number of threads to use |
 | `--quiet` | | | Don't show output text while running |
 
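
Note on the `downsample` hunk: the new paragraph describes a three-step process (extract barcodes, subsample the pool, extract the matching reads) and says `--invalid` only controls how many invalid barcodes enter the pool. The sketch below illustrates that logic in Python. It is not Harpy's implementation: the function names `sample_barcodes` and `downsample_reads` are hypothetical, reads are modeled as simple `(name, barcode)` tuples, the caller is assumed to have already split barcodes into valid and invalid, and the proportion is assumed to apply to unique invalid barcodes.

```python
import random


def sample_barcodes(valid, invalid, d, invalid_proportion=1.0, seed=None):
    """Pick up to `d` barcodes from a pool of valid barcodes plus a
    fraction of the invalid ones (hypothetical helper, not Harpy's code)."""
    if not 0 <= invalid_proportion <= 1:
        raise ValueError("invalid_proportion must be between 0 and 1")
    rng = random.Random(seed)
    # Work on sorted unique barcodes so a given seed is reproducible.
    valid = sorted(set(valid))
    invalid = sorted(set(invalid))
    # Step 1: decide how many invalid barcodes are allowed into the pool.
    n_invalid = round(len(invalid) * invalid_proportion)
    pool = valid + rng.sample(invalid, n_invalid)
    # Step 2: subsample the whole pool down to `d` barcodes. Invalid
    # barcodes in the pool are only candidates; they may not be drawn.
    return set(rng.sample(pool, min(d, len(pool))))


def downsample_reads(reads, keep):
    """Step 3: keep every read whose barcode was sampled.
    `reads` is an iterable of (read_name, barcode) tuples (an assumption)."""
    return [read for read in reads if read[1] in keep]
```

With `invalid_proportion=0` no invalid barcode can be drawn, while with `0.5` half of the invalid barcodes become candidates. The second sampling step is why the proportion describes what is considered for sampling rather than what ends up in the output.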