Paleovirus Schema Extensions
The paleovirus component of Flavivirid-GLUE extends GLUE's core schema to allow the capture of EFV-specific data.
These schema extensions are defined in this file and comprise two additional tables: 'locus_data' and 'refcon_data'.
The 'locus_data' table contains EFV locus information: e.g. species, assembly, scaffold, location coordinates.
The 'refcon_data' table contains summary information for individual EFV insertions. It refers to the reference sequences constructed to represent each insertion, which reflect our best efforts to reconstruct progenitor virus sequences as they might have looked when they initially integrated into the germline of ancestral species.
Both these custom tables are linked to the main 'sequence' table via the 'sequenceID' field.
Thus, the sequence table of GLUE's core schema was extended to include the following additional fields:
Parameter | Type | Definition |
---|---|---|
refcon_data | LINK | Link to the refcon_data table containing consensus/reference sequence data |
locus_data | LINK | Link to the locus_data table containing locus-specific information |
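
The linkage described above can be sketched with an in-memory SQLite database. This is only an illustration of the relational idea, not GLUE's own schema mechanism: the reduced column sets, the use of 'sequenceID' as the key in each custom table, and all row values are assumptions made for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy database with a sequence table and two linked custom tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sequence (sequenceID TEXT PRIMARY KEY);
        CREATE TABLE locus_data (
            sequenceID TEXT PRIMARY KEY REFERENCES sequence(sequenceID),
            locus_id   TEXT,
            organism   TEXT,
            scaffold   TEXT
        );
        CREATE TABLE refcon_data (
            sequenceID TEXT PRIMARY KEY REFERENCES sequence(sequenceID),
            reftype    TEXT,
            num_copies INTEGER
        );
        """
    )
    # Hypothetical rows: one locus sequence and one reference/consensus sequence.
    conn.executemany(
        "INSERT INTO sequence VALUES (?)",
        [("seq-locus-1",), ("seq-refcon-1",)],
    )
    conn.execute(
        "INSERT INTO locus_data VALUES (?, ?, ?, ?)",
        ("seq-locus-1", "demo-locus", "Demo organism", "scaffold_1"),
    )
    conn.execute(
        "INSERT INTO refcon_data VALUES (?, ?, ?)",
        ("seq-refcon-1", "consensus", 3),
    )
    return conn


def linked_rows(conn: sqlite3.Connection) -> list:
    """Follow both links from each sequence; a missing link yields None."""
    return conn.execute(
        """
        SELECT s.sequenceID, l.locus_id, r.reftype
        FROM sequence s
        LEFT JOIN locus_data  l ON l.sequenceID = s.sequenceID
        LEFT JOIN refcon_data r ON r.sequenceID = s.sequenceID
        ORDER BY s.sequenceID
        """
    ).fetchall()
```

Left joins are used so that a sequence with only one of the two links still appears in the result.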
The project-specific extensions comprise two custom tables:
- locus_data: contains EVE locus information: e.g. species, assembly, scaffold, location coordinates.
- refcon_data: contains summary information for individual EVE insertions. It refers to the reference sequences constructed to represent each insertion, which reflect our best efforts to reconstruct progenitor virus sequences as they might have looked when they initially integrated into the germline of ancestral species.
The 'refcon_data' custom table captures information associated with reference/consensus sequences, as follows:
Parameter | Type | Definition |
---|---|---|
reftype | VARCHAR | Type of reference (e.g., consensus or reference sequence) |
host_group_taxlevel | VARCHAR | Taxonomic level of the host group (e.g., genus, species) |
host_group_name | VARCHAR | Scientific name of the host group |
host_group_common_name | VARCHAR | Common name of the host group |
num_copies | INTEGER | Number of endogenous viral element copies |
locus_id | VARCHAR | Identifier for the corresponding locus |
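
As a rough illustration of how a row of this table might be handled in a script, the sketch below mirrors the fields above in a Python dataclass. The class name `RefconRecord`, the helper `refcon_from_row`, and the idea of reading rows as string-valued dictionaries (for example from a tabular export) are assumptions made for the example, not part of GLUE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefconRecord:
    """One refcon_data row; field names follow the table above."""
    reftype: str
    host_group_taxlevel: str
    host_group_name: str
    host_group_common_name: str
    num_copies: int
    locus_id: str


def refcon_from_row(row: dict) -> RefconRecord:
    """Build a record from string values, converting the INTEGER field."""
    return RefconRecord(
        reftype=row["reftype"],
        host_group_taxlevel=row["host_group_taxlevel"],
        host_group_name=row["host_group_name"],
        host_group_common_name=row["host_group_common_name"],
        num_copies=int(row["num_copies"]),  # INTEGER column in the schema
        locus_id=row["locus_id"],
    )
```

Only `num_copies` is converted, since it is the one INTEGER field; the VARCHAR fields are kept as strings.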
The 'locus_data' custom table captures information associated with individual EVE loci, as follows:
Parameter | Type | Definition |
---|---|---|
locus_id | VARCHAR | Identifier for the EVE locus |
duplicate_id | INTEGER | Identifier for duplicates within the same locus |
organism | VARCHAR | Host organism containing the locus |
scaffold | VARCHAR | Scaffold or chromosome on which the locus resides |
start | INTEGER | Start position of the locus on the scaffold |
end | INTEGER | End position of the locus on the scaffold |
orientation | VARCHAR | Orientation of the locus (plus or minus strand) |
length | INTEGER | Length of the locus in nucleotides |
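
The coordinate fields lend themselves to a simple consistency check, sketched below. The class name `LocusRecord` and its methods are hypothetical, and the check assumes 1-based inclusive coordinates, so that the length equals the distance between start and end plus one. The source does not state the coordinate convention, so treat this as an assumption to verify against the actual data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusRecord:
    """One locus_data row; field names follow the table above."""
    locus_id: str
    duplicate_id: int
    organism: str
    scaffold: str
    start: int
    end: int
    orientation: str
    length: int

    def span(self) -> int:
        # Assumption: 1-based inclusive coordinates. abs() tolerates
        # records where start is greater than end.
        return abs(self.end - self.start) + 1

    def length_is_consistent(self) -> bool:
        """True when the stored length matches the coordinate span."""
        return self.length == self.span()
```

For example, a hypothetical locus with start 101 and end 400 has a span of 300 under this convention, so a stored length of 300 would pass the check.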