Showing 1 changed file with 26 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
author: Pavel Dimens | ||
date: 2024-11-05 | ||
category: guides | ||
description: Sorting data by linked-read barcode | ||
icon: sync | ||
image: https://visualpharm.com/assets/214/Merge%20Files-595b40b75ba036ed117d8636.svg | ||
--- | ||
|
||
# :icon-sync: Sort data by barcode | ||
You would think sorting data would be a no-brainer, and in most cases it is. | ||
You can use `seqtk` or `seqkit` to sort FASTQ/A files by their IDs, and `samtools` to sort | ||
SAM/BAM/CRAM files by name or coordinates. However, with linked-read | ||
data, you may sometimes need to sort your FASTQ (or BAM) files by the | ||
linked-read barcode. How to do that wasn't initially obvious to the Harpy/haplotag | ||
team, so this article aims to make that knowledge widely available to linked-read | ||
adopters. | ||
|
||
## Sorting Alignments | ||
Let's start with BAM (or SAM/CRAM) files because the process is much simpler. | ||
Since the linked-read barcode is stored in a `BX:Z` tag (or, less often, a `BC:Z` tag), | ||
we can use the `-t` option of `samtools sort` to sort by the barcode: | ||
> -t TAG Sort first by the value in the alignment tag TAG, then by position or name (if also using -n or -N). | ||
```bash | ||
|
||
``` |
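
As a minimal sketch of the `-t` option quoted above (not necessarily the exact command the guide's authors use), the following wraps the barcode sort in a small shell function and adds a helper for inspecting the result. The function names `sort_by_barcode` and `bx_values`, and the file names in the usage comments, are placeholders invented for illustration. It assumes `samtools` is on your `PATH` and that barcodes are stored in the `BX` tag.

```bash
#!/usr/bin/env bash
# Sketch only: function and file names are hypothetical placeholders.

# Sort alignments by the value of the BX tag.
# With no -n/-N given, records sharing a barcode are then ordered by position.
#   $1 = input SAM/BAM/CRAM, $2 = output BAM
sort_by_barcode() {
    samtools sort -t BX -o "$2" "$1"
}

# Read SAM text on stdin and print the BX:Z value of each record.
# Header lines (starting with '@') and records without a BX tag are skipped.
# Optional tags start at column 12 of a SAM record.
bx_values() {
    awk -F'\t' '!/^@/ {
        for (i = 12; i <= NF; i++)
            if ($i ~ /^BX:Z:/) print substr($i, 6)
    }'
}

# Usage (requires samtools; file names are placeholders):
#   sort_by_barcode input.bam sorted_by_bx.bam
#   samtools view sorted_by_bx.bam | bx_values | uniq | head
```

If the sort worked, the barcodes printed by the last usage line should appear grouped, each one listed once. Swap `BX` for `BC` in both functions if your data stores barcodes in that tag instead.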