Database Schema Extensions

Robert J. Gifford edited this page Oct 17, 2024 · 4 revisions

Flu-GLUE extends GLUE's core schema by adding fields to the sequence table and by defining several custom tables.

These schema extensions are defined in the Flu-GLUE project build file.

Fields added to 'sequence' table

| Parameter | Type | Definition |
|---|---|---|
| full_name | VARCHAR | Full name of the virus this sequence is derived from |
| gb_primary_accession | VARCHAR | Primary GenBank accession number |
| gb_accession_version | VARCHAR | GenBank accession number with version |
| gb_create_date | DATE | GenBank creation date of the sequence |
| gb_update_date | DATE | Date of the most recent GenBank update |
| length | INTEGER | Length of the sequence |
| pubmed_id | VARCHAR | PubMed ID of the manuscript associated with this sequence |
| species | VARCHAR | Virus species |
| gb_segment | INTEGER | GenBank segment number |
| rec_segment | VARCHAR | Recognized segment, as assigned in Flu-GLUE |
| rec_subtype | VARCHAR | Recognized subtype specific to the segment |
| variation_present | BOOLEAN | Indicates if variations are present in the sequence |

Fields included in 'isolate' table

The isolate table is linked to the main 'sequence' table via the sequence ID field. It contains information about viral isolates, such as the species sampled and the date and location of sampling.

| Parameter | Type | Definition |
|---|---|---|
| isolate_id | VARCHAR | Identifier for the virus isolate |
| iso_host | VARCHAR | Scientific name of the host from which the virus was isolated |
| iso_source | VARCHAR | Source material from which the isolate was obtained |
| iso_country | VARCHAR | Country where the virus was isolated |
| iso_region | VARCHAR | Region within the country where the virus was isolated |
| iso_year | INTEGER | Year when the virus was isolated |
| iso_month | VARCHAR | Month when the virus was isolated |
| iso_day | INTEGER | Day when the virus was isolated |
| lab_host | VARCHAR | Laboratory host used for propagation |
| cg_subtype | VARCHAR | Subtype determined by computational genotyping |
| gb_subtype | VARCHAR | Subtype assigned in GenBank |
| complete_genome | BOOLEAN | Indicates if the isolate contains a complete genome |
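
Because the isolation date is split across iso_year (INTEGER), iso_month (VARCHAR) and iso_day (INTEGER), downstream scripts may need to reassemble it. The Python sketch below shows one way this could be done. It is not part of Flu-GLUE: the helper names are hypothetical, and since the format stored in iso_month is not stated here, the sketch assumes it is either a numeric string or an English month name or abbreviation.

```python
import datetime
from typing import Optional

# Assumption: iso_month may hold English month names or abbreviations.
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_month(iso_month: Optional[str]) -> Optional[int]:
    """Map an iso_month string to 1..12, or None if it cannot be read."""
    if iso_month is None:
        return None
    text = iso_month.strip().lower()
    if text.isdigit():
        number = int(text)
        return number if 1 <= number <= 12 else None
    # Compare on the first three letters so "April" and "Apr" both match.
    return _MONTHS.get(text[:3])


def isolate_date(iso_year: Optional[int],
                 iso_month: Optional[str],
                 iso_day: Optional[int]) -> Optional[datetime.date]:
    """Return a full date only when all three parts are present and valid."""
    month = parse_month(iso_month)
    if iso_year is None or month is None or iso_day is None:
        return None
    try:
        return datetime.date(iso_year, month, iso_day)
    except ValueError:
        # For example day 30 in February.
        return None
```

Returning None for incomplete or invalid dates keeps partially dated isolates usable without inventing a day or month for them.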

Fields included in 'flu_replacement' table

The flu_replacement table contains data on amino acid replacements. Each row represents a specific replacement, along with its biochemical properties.

| Parameter | Type | Definition |
|---|---|---|
| display_name | VARCHAR | Display name of the amino acid replacement |
| codon_label | VARCHAR | Codon label (position) where the replacement occurs |
| codon_label_int | INTEGER | Codon position as an integer |
| reference_aa | VARCHAR | Reference amino acid |
| reference_nt | INTEGER | Nucleotide position corresponding to the codon |
| replacement_aa | VARCHAR | Amino acid after the replacement |
| num_seqs | INTEGER | Number of sequences containing the replacement |
| radical_hanada_category_i | BOOLEAN | Hanada classification category I (true if radical) |
| radical_hanada_category_ii | BOOLEAN | Hanada classification category II (true if radical) |
| radical_hanada_category_iii | BOOLEAN | Hanada classification category III (true if radical) |
| grantham_distance_double | DOUBLE | Grantham distance between the reference and replacement amino acids |
| grantham_distance_int | INTEGER | Grantham distance as an integer |
| miyata_distance | DOUBLE | Miyata biochemical distance between the amino acids |
| parent_feature | VARCHAR | Parent feature where the replacement is located |
| parent_codon_label | VARCHAR | Codon label of the parent feature |
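
As an illustration of how these fields might be combined after export, the Python sketch below keeps only replacements flagged as radical under all three Hanada categories and ranks them by num_seqs. The class and function names are hypothetical, only a subset of the fields is modelled, and the sketch assumes the rows have already been loaded into Python objects by some other means.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FluReplacement:
    """Hypothetical in-memory view of a few flu_replacement fields."""
    display_name: str
    num_seqs: int
    radical_hanada_category_i: bool
    radical_hanada_category_ii: bool
    radical_hanada_category_iii: bool


def radical_in_all_categories(rows: Iterable[FluReplacement]) -> List[FluReplacement]:
    """Keep rows radical under categories I, II and III, most frequent first."""
    hits = [
        row for row in rows
        if row.radical_hanada_category_i
        and row.radical_hanada_category_ii
        and row.radical_hanada_category_iii
    ]
    return sorted(hits, key=lambda row: row.num_seqs, reverse=True)
```

The same pattern could be extended with a threshold on grantham_distance_int or miyata_distance, although any particular cut-off would be an analysis choice rather than something defined by the schema.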

Fields included in 'host' table

The host table stores information about the host organisms from which the influenza viruses were isolated.

| Parameter | Type | Definition |
|---|---|---|
| taxon_name | VARCHAR | Scientific taxon name of the host |
| taxonomic_level | VARCHAR | Taxonomic level (e.g., species, genus) |
| common_name | VARCHAR | Common name of the host |

Links between tables

Flu-GLUE extends the core schema by linking several tables to represent relationships between sequences, isolates, hosts, and amino acid replacements:

| Link | From Table | To Table | Type |
|---|---|---|---|
| variation | sequence | flu_replacement_sequence | ONE_TO_MANY |
| flu_replacement_sequence | flu_replacement | flu_replacement_sequence | ONE_TO_MANY |
| isolate | isolate | sequence | ONE_TO_MANY |
| host | isolate | host | ONE_TO_ONE |
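
To make the direction of these links concrete, the sketch below mirrors them in an in-memory SQLite database using Python's standard sqlite3 module. This is only an illustration: GLUE manages its own database and link columns, so the key columns used here (sequence_id, host_taxon, and display_name as a primary key) are hypothetical, and all row values are placeholders.

```python
import sqlite3

# Key columns below are hypothetical; they only illustrate link direction.
SCHEMA = """
CREATE TABLE host (
    taxon_name TEXT PRIMARY KEY,
    common_name TEXT
);
CREATE TABLE isolate (
    isolate_id TEXT PRIMARY KEY,
    -- 'host' link (isolate -> host, ONE_TO_ONE): UNIQUE keeps it one-to-one
    host_taxon TEXT UNIQUE REFERENCES host(taxon_name)
);
CREATE TABLE sequence (
    sequence_id TEXT PRIMARY KEY,
    rec_segment TEXT,
    -- 'isolate' link (isolate -> sequence, ONE_TO_MANY)
    isolate_id TEXT REFERENCES isolate(isolate_id)
);
CREATE TABLE flu_replacement (
    display_name TEXT PRIMARY KEY
);
CREATE TABLE flu_replacement_sequence (
    -- 'variation' link (sequence -> flu_replacement_sequence, ONE_TO_MANY)
    sequence_id TEXT REFERENCES sequence(sequence_id),
    -- 'flu_replacement_sequence' link (flu_replacement -> this table, ONE_TO_MANY)
    display_name TEXT REFERENCES flu_replacement(display_name)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema and load placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO host VALUES ('placeholder taxon', 'placeholder')")
    conn.execute("INSERT INTO isolate VALUES ('ISO_1', 'placeholder taxon')")
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?, ?)",
        [("SEQ_A", "1", "ISO_1"), ("SEQ_B", "2", "ISO_1")],
    )
    conn.executemany(
        "INSERT INTO flu_replacement VALUES (?)", [("demo-1",), ("demo-2",)]
    )
    conn.executemany(
        "INSERT INTO flu_replacement_sequence VALUES (?, ?)",
        [("SEQ_A", "demo-1"), ("SEQ_A", "demo-2"), ("SEQ_B", "demo-1")],
    )
    return conn


def sequences_for_isolate(conn: sqlite3.Connection, isolate_id: str) -> list:
    """Follow the 'isolate' link from one isolate to its sequences."""
    rows = conn.execute(
        "SELECT sequence_id FROM sequence "
        "WHERE isolate_id = ? ORDER BY sequence_id",
        (isolate_id,),
    )
    return [row[0] for row in rows]


def replacements_for_sequence(conn: sqlite3.Connection, sequence_id: str) -> list:
    """Go from a sequence to its replacements via flu_replacement_sequence."""
    rows = conn.execute(
        "SELECT r.display_name FROM flu_replacement_sequence AS j "
        "JOIN flu_replacement AS r ON r.display_name = j.display_name "
        "WHERE j.sequence_id = ? ORDER BY r.display_name",
        (sequence_id,),
    )
    return [row[0] for row in rows]
```

In this sketch, flu_replacement_sequence acts as a join table: each of its rows points to one sequence and one replacement, which is how the two ONE_TO_MANY links together relate many sequences to many replacements.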