Read Groups

A read group (@RG) is a unique identifier that groups reads together. It captures relevant information about the sample, the sequencing process, and the sequencing technology, which various downstream bioinformatics tools rely on.

The relevant fields in defining a read group include:

  • ID (Identifier): A unique identifier for the read group within the BAM file and across multiple BAM files used in the same dataset.
  • SM (Sample): The sample to which the reads belong.
  • PL (Platform): The technology used to sequence the reads (e.g., ONT).
  • PM (Platform Model): The platform model reflecting the instrument series.
  • PU (Platform Unit): A unique identifier for the sequencer unit used for sequencing.
  • LB (Library): The library used to sequence the reads.
  • DS (Description): Semantic information about the reads in the group, encoded as a semicolon-delimited list of “Key=Value” strings.
  • DT (Date/Time): The date and time when the run was produced (ISO8601 date or date/time).
  • basecall_model: The model used for base calling.
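To make the field layout concrete, here is a minimal sketch of how an @RG header line can be split into its tags. It assumes the usual SAM header layout of tab-separated TAG:VALUE fields. The function name parse_read_group and the sample values are hypothetical and are not part of the pipeline described here.

```python
def parse_read_group(line: str) -> dict[str, str]:
    """Split one @RG header line into a {tag: value} mapping."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only, since values such as DT
        # contain colons of their own.
        tag, sep, value = field.partition(":")
        if not sep:
            raise ValueError(f"malformed field: {field!r}")
        tags[tag] = value
    return tags


# Hypothetical header line, for illustration only.
example = "@RG\tID:rg1\tSM:sampleA\tLB:sampleA.lib1\tPL:ONT\tDT:2024-02-21T12:56:53-06:00"
rg = parse_read_group(example)
print(rg["ID"], rg["SM"], rg["PL"])
```

Splitting on the first colon only keeps timestamp values intact, which matters for the DT field.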

Assigning Read Groups

The original read groups from the unaligned BAM files are carried over and preserved in the corresponding aligned BAM files. In-house bash code built on samtools then replaces the SM and LB values with the identifiers used by the portal, as follows:

  • SM: <sample name>
  • LB: <sample name>.<library>

For example, a read group line in a BAM file header:

@RG	ID:bcdb4058-3545-4c45-aea9-4159f1c2ca7d_dna_r10.4.1_e8.2_400bps_sup@v4.2.0	DT:2024-02-21T12:56:53.022625-06:00	DS:runid=bcdb4058-3545-4c45-aea9-4159f1c2ca7d	basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.2.0	LB:SMACUWVOKOZU.SMALI56YAYM5	PL:ONT	PM:3A	PU:PAW14872	al:unclassified SM:SMACUWVOKOZU
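The SM and LB replacement described above can be sketched as a plain text transformation on a single header line. This is not the in-house bash code, which is not shown here. It is an illustrative Python stand-in that assumes tab-separated TAG:VALUE fields, and the function name relabel_read_group and the identifiers SAMPLE1 and LIB1 are hypothetical. Writing the modified header back into a BAM file would still need a separate step, for example with samtools reheader.

```python
def relabel_read_group(line: str, sample: str, library: str) -> str:
    """Return the header line with SM and LB set to the portal-style values.

    SM becomes <sample> and LB becomes <sample>.<library>. Lines that are
    not @RG lines are returned unchanged (minus any trailing newline).
    """
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    if fields[0] != "@RG":
        return stripped

    new_values = {"SM": sample, "LB": f"{sample}.{library}"}
    out = [fields[0]]
    seen = set()
    for field in fields[1:]:
        tag = field.partition(":")[0]
        if tag in new_values:
            # Overwrite the existing value, keeping the field's position.
            out.append(f"{tag}:{new_values[tag]}")
            seen.add(tag)
        else:
            out.append(field)
    # Append SM or LB if the original line did not carry them.
    for tag, value in new_values.items():
        if tag not in seen:
            out.append(f"{tag}:{value}")
    return "\t".join(out)


# Hypothetical input line and identifiers, for illustration only.
original = "@RG\tID:rg1\tPL:ONT\tLB:old_lib\tSM:old_sample"
updated = relabel_read_group(original, sample="SAMPLE1", library="LIB1")
print(updated)
```

All other tags, including ID, keep their values and order, which matches the statement that the original read groups are maintained and only SM and LB are replaced.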

Source Code

All the relevant code is accessible in the GitHub repository: