diff --git a/blog/sort_by_barcode.md b/blog/sort_by_barcode.md
new file mode 100644
index 000000000..3f28e51c3
--- /dev/null
+++ b/blog/sort_by_barcode.md
@@ -0,0 +1,26 @@
+---
+author: Pavel Dimens
+date: 2024-11-05
+category: guides
+description: Sorting data by linked-read barcode
+icon: sync
+image: https://visualpharm.com/assets/214/Merge%20Files-595b40b75ba036ed117d8636.svg
+---
+
+# :icon-sync: Sort data by barcode
+You would think sorting data would be a no-brainer, and in most cases it is.
+You can use `seqtk` or `seqkit` to sort FASTQ/A files by their IDs, and `samtools` to sort
+SAM/BAM/CRAM files by name or coordinates. However, in the world of linked-read
+data, you may sometimes need to sort your FASTQ (or BAM) files by the
+linked-read barcode. How to do that wasn't initially obvious to the Harpy/haplotag
+team, so this article aims to make that knowledge widely available to linked-read
+adopters.
+
+## Sorting Alignments
+Let's start with BAM (or SAM/CRAM) files because the process is much simpler.
+Since the linked-read barcode is stored in a `BX:Z` tag (or, less often, in `BC:Z`),
+we can use the `-t` option of `samtools sort` to sort by the barcode:
+> -t TAG Sort first by the value in the alignment tag TAG, then by position or name (if also using -n or -N).
+```bash
+samtools sort -t BX -o sorted.bam input.bam  # placeholder file names; use -t BC if your barcodes are in BC:Z
+```
\ No newline at end of file