Commit

start adding guide
pdimens committed Nov 5, 2024
1 parent 688561e commit 4219228
Showing 1 changed file with 26 additions and 0 deletions.
26 changes: 26 additions & 0 deletions blog/sort_by_barcode.md
@@ -0,0 +1,26 @@
---
author: Pavel Dimens
date: 2024-11-05
category: guides
description: Sorting data by linked-read barcode
icon: sync
image: https://visualpharm.com/assets/214/Merge%20Files-595b40b75ba036ed117d8636.svg
---

# :icon-sync: Sort data by barcode
You would think sorting data would be a no-brainer, and in most cases it is.
You can use `seqkit` to sort FASTQ/A files by their IDs and `samtools` to sort
SAM/BAM/CRAM files by name or coordinates. With linked-read data, however, you
sometimes need to sort your FASTQ (or BAM) files by the linked-read barcode
instead. How to do that wasn't initially obvious to the Harpy/haplotag team,
so this article makes that knowledge available to other linked-read adopters.

## Sorting Alignments
Let's start with BAM (or SAM/CRAM) files because the process is much simpler.
Since the linked-read barcode is stored in a `BX:Z` tag (or, less often, a `BC:Z` tag),
we can use the `-t` option of `samtools sort` to sort by the barcode:
> -t TAG Sort first by the value in the alignment tag TAG, then by position or name (if also using -n or -N).
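
As a minimal sketch of the barcode sort described above, the snippet below wraps `samtools sort -t` in a small shell function. The function name `sort_by_barcode` and the file names in the example call are hypothetical, chosen only for illustration; the tag defaults to `BX` and can be switched to `BC` if your data stores the barcode there.

```bash
#!/usr/bin/env bash
# Sketch only: sort_by_barcode and the file names below are illustrative
# assumptions, not part of Harpy or samtools.

# Usage: sort_by_barcode INPUT OUTPUT [TAG]
# TAG defaults to BX; pass BC if the barcode is stored in that tag instead.
sort_by_barcode() {
    local input="$1" output="$2" tag="${3:-BX}"
    # -t TAG  : sort first by the value of TAG, then by position
    # -o FILE : write the sorted alignments to FILE
    samtools sort -t "$tag" -o "$output" "$input"
}

# Example call (hypothetical file names):
# sort_by_barcode sample.bam sample.bxsorted.bam
```

Wrapping the command in a function keeps the tag name in one place, which may be convenient when some samples use `BX` and others use `BC`.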
