Primate Project Data
- Full Genome Reference Sequences: A curated set of full genome reference sequences, each with associated metadata.
- Alignment of Reference Sequences: An alignment of the full genome reference sequences, providing a standardized framework for comparative analysis.
- Genome Feature Definitions: A set of defined genome features relevant to HIV-1.
- Coordinate Mapping: A list of coordinates linking genome features to specific positions within at least one reference genome.
- Phylogenetic 'Alignment Tree': A tree that defines phylogenetic relationships between different HIV-1 clades.
We are assessing the LANL sequence selections to determine whether all sequences are necessary for our project:
- Metadata Considerations: Reviewing the GenBank metadata to verify sequence reliability.
- Patent Sequence: One CRF sequence is labeled as a patent sequence. We need to evaluate its reliability.
- Rare Subtypes: Subtypes K and H are rare and may not be sampled again. Including them in a minimal project may not be beneficial. This warrants further discussion.
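As a rough illustration of how the review points above could be screened automatically, the sketch below flags records for manual inspection. The record layout (`accession`, `subtype`, `division`) and the accession values are hypothetical placeholders, not the project's actual metadata schema. The only outside fact it relies on is that GenBank files patent sequences under the `PAT` division code.

```python
# Hypothetical screening of reference-sequence metadata.
# Field names and accessions below are illustrative assumptions.

RARE_SUBTYPES = {"H", "K"}  # subtypes named above as rarely sampled


def flag_for_review(records):
    """Return {accession: [reasons]} for records that need a manual look."""
    flagged = {}
    for rec in records:
        reasons = []
        # GenBank places patent sequences in the PAT division.
        if rec.get("division") == "PAT":
            reasons.append("patent sequence")
        if rec.get("subtype") in RARE_SUBTYPES:
            reasons.append("rare subtype")
        if reasons:
            flagged[rec["accession"]] = reasons
    return flagged


# Toy records, not real accessions.
records = [
    {"accession": "REF_0001", "subtype": "B", "division": "VRL"},
    {"accession": "REF_0002", "subtype": "K", "division": "VRL"},
    {"accession": "REF_0003", "subtype": "CRF", "division": "PAT"},
]

flagged = flag_for_review(records)
for accession, reasons in flagged.items():
    print(accession, ", ".join(reasons))
```

A flag here only marks a sequence for discussion; whether it stays in the project remains a manual decision.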
An initial alignment of reference sequences was generated using MUSCLE (Edgar), then manually adjusted in [specify alignment viewing software].
Manual adjustments included:
- In-Frame Coding: Ensuring that all coding genes remain in-frame by removing frameshifting indels.
- Codon Grouping: Regrouping nucleotides into codons wherever possible, so that triplets split across gaps by the alignment software were restored as contiguous codons.
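The two manual adjustments above can be spot-checked programmatically. The following is a minimal sketch, assuming the aligned coding region has already been extracted as a string that starts at the first codon position and uses `-` for gaps. The function names are illustrative and are not part of GLUE or any alignment viewer.

```python
def is_in_frame(aligned_cds, gap="-"):
    """True if the ungapped coding sequence length is a multiple of three."""
    ungapped = aligned_cds.replace(gap, "")
    return len(ungapped) % 3 == 0


def split_codon_columns(aligned_cds, gap="-"):
    """Return 0-based start offsets of codon windows that mix gaps and bases.

    A window of three alignment columns should be either a complete codon
    or entirely gaps; anything else indicates a split codon.
    """
    if len(aligned_cds) % 3 != 0:
        raise ValueError("aligned coding region length is not a multiple of three")
    problems = []
    for start in range(0, len(aligned_cds), 3):
        n_gaps = aligned_cds[start:start + 3].count(gap)
        if n_gaps not in (0, 3):
            problems.append(start)
    return problems


# Toy sequences for illustration only.
clean = "ATG---AAA"        # whole-codon gap: in frame, codons intact
broken = "ATGA--AATAAA"    # two-base gap splits a codon and shifts the frame

print(is_in_frame(clean), split_codon_columns(clean))
print(is_in_frame(broken), split_codon_columns(broken))
```

A check like this only locates suspicious columns; resolving them still requires manual inspection of the alignment.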
GLUE requires genome features to be mapped to specific coordinates on at least one reference sequence. We annotated genome features on the following references:
- HXB2 (Subtype B): The primary reference used in most epidemiological and clinical studies.
- NL43 (Subtype B): Commonly used in laboratory studies, providing a practical alternative reference.
- Subtype C Reference: Subtype C accounts for a large share of global HIV-1 infections, particularly in sub-Saharan Africa, where it contributes significantly to morbidity and mortality. Many transmission pair sequences are also subtype C, further justifying its inclusion.
Having multiple annotated references lets users choose among HXB2, NL43, and a subtype C reference, whichever best suits the study context.
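To show how feature coordinates defined on one reference relate to the shared alignment, here is a minimal sketch that maps 1-based reference positions to 1-based alignment columns. It assumes the reference is available as a gapped alignment row. The function names and the toy sequence are hypothetical, and this is not a description of how GLUE performs the mapping internally.

```python
def ref_to_alignment_columns(aligned_ref, gap="-"):
    """Map 1-based ungapped reference positions to 1-based alignment columns."""
    mapping = {}
    ref_pos = 0
    for col, char in enumerate(aligned_ref, start=1):
        if char != gap:
            ref_pos += 1
            mapping[ref_pos] = col
    return mapping


def feature_columns(aligned_ref, start, end):
    """Alignment columns spanned by a feature given in reference coordinates."""
    mapping = ref_to_alignment_columns(aligned_ref)
    return mapping[start], mapping[end]


# Toy gapped reference row; real rows would span the full genome.
aligned_ref = "AC--GTA-C"

# A hypothetical feature at reference positions 2..4 of the toy row.
print(feature_columns(aligned_ref, 2, 4))
```

Because the mapping depends on the reference row it is built from, the same feature will generally have different coordinates on HXB2, NL43, and the subtype C reference, while its alignment columns stay the same.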