Find Annotation and Knowledge Graphs to integrate #41

josiahseaman · 2020-04-20T16:30:28Z

Assignee: Ali Haider Bangash
The first step is to identify what data could be integrated through a knowledge graph and what is already available, including what the other Hackathon teams accomplished. Findings go in this issue. We're looking for information that relates to genetic variants of the virus:

Structural annotations channel. Protein structure => codon table => sequence position. We could mark up pangenome positions related to known protein variants
Gene Annotations: We possibly only need the reference gene annotation GFF, but it would be nice to have these positions in the graph genome context. @subwaystation has ensured we have coordinate transforms that go both ways (pangenome <-> reference genome coordinates), using FALDO in RDF.
Clinical data: Possibly the most important. If we have any knowledge of patient outcomes, and what region they're from, we could connect a strain of the virus (which will contain variants) to a patient outcome: how long in hospital, how long on ventilator, etc. We don't necessarily need a viral sequence from that specific individual, but at minimum a probable association with a variant.
- Human DNA variation data could also be used as in UK Biobank article.
- Technically, annotating a complete human pangenome is beyond our current scope, since gigabase genomes would put a strain on our pipeline. It may be possible, however, to make local graphs of key regions like HLA or MHC inside the human genome.
Phylogenetics: We're going to have a phylogenetic tree eventually (see Phylogenetic Tree Visualization, Schematize#58). It'd be nice to link this with the "country" and "town" concepts in the knowledge graph. What geographic or transmission data could we bring in?
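
As a rough illustration of the "protein structure => codon table => sequence position" step above, here is a minimal sketch that turns a protein residue number into the reference nucleotide range of its codon. It assumes a contiguous forward-strand CDS with no frameshift, so it would not hold as-is for a gene with a ribosomal frameshift. The function name and the spike CDS start coordinate used in the example are assumptions added for illustration, not something agreed in this issue.

```python
from typing import Tuple


def residue_to_nt_range(cds_start: int, residue_index: int) -> Tuple[int, int]:
    """Return the 1-based, inclusive nucleotide range of one codon.

    cds_start: 1-based reference position of the first base of the start codon.
    residue_index: 1-based amino-acid position within the protein.
    Assumes a contiguous CDS on the forward strand (no frameshift, no splicing).
    """
    if cds_start < 1 or residue_index < 1:
        raise ValueError("positions are 1-based and must be >= 1")
    first = cds_start + 3 * (residue_index - 1)
    return first, first + 2


# Assumption: the spike (S) CDS starts at position 21563 in the reference genome.
S_CDS_START = 21563

# Residue 614 of the spike protein maps to this codon in reference coordinates.
print(residue_to_nt_range(S_CDS_START, 614))  # (23402, 23404)
```

The resulting reference range could then be passed through the reference-to-pangenome coordinate transform mentioned under Gene Annotations.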

hhaider15 · 2020-04-21T12:13:50Z

Clinical data
South Korea's COVID-19 patients, 5-year patient history: The government of the Republic of Korea decided to share the world’s first de-identified COVID-19 nationwide patient data with domestic and international researchers. The data sets are collected and processed promptly, thanks to the Korean National Health Insurance System, which covers the entire population of the nation.

hhaider15 · 2020-04-21T12:21:38Z

Structural annotations: Very well done by the Machine Learning working group. Complete genomes of the strains, labelled with the respective source and its metadata.

hhaider15 · 2020-04-21T12:25:32Z

Gene annotations: whole genome nucleotide data pulled from RVDB release 14 as labels. Metadata for human & non-human pathogen phenotypes

hhaider15 · 2020-04-21T12:30:24Z

Structural annotations: Amino acid sequence data for common cold CoV and SARS-CoV-2 for the M, E & S proteins, with metadata

hhaider15 · 2020-04-21T12:43:35Z

Genes & structural annotations: Proteomics data & MassIVE/CCMS Maestro+MSstats reanalysis of MSV000085096 / PXD017710 Proteome and Translatome of SARS-CoV-2 infected cells

subwaystation · 2020-04-21T14:48:36Z

Hi @hhaider15 !
Thanks for all the links. We could work with e.g. .csv or .fasta.

But what we had in mind are SPARQL endpoints, which we could query using SPARQL.

I think a good start would be http://yummydata.org/. And maybe you will find some endpoints which are not listed there ;)
Please get back to me if you have more questions.
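
To make the endpoint idea concrete, here is a minimal sketch of querying a SPARQL endpoint over HTTP using only the Python standard library. It follows the standard SPARQL protocol (a GET request with a `query` parameter, asking for JSON results). The endpoint URL and the query are placeholders, and the helper names are made up for this example; nothing here refers to a specific endpoint from the list above.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def extract_rows(results: dict) -> list:
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send a SELECT query to the endpoint and return the flattened rows."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_rows(json.load(response))


# Placeholder endpoint and query (assumptions, replace with a real endpoint):
# rows = run_select("https://example.org/sparql",
#                   "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
```

A small generic query like the commented one is a cheap way to check that an endpoint responds before writing anything specific to its vocabulary.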

subwaystation · 2020-04-21T14:52:17Z

@josiahseaman and Phylogenetics: As far as I understood from the #public_sequence_resource group, they will also pack the metadata into a SPARQL endpoint. The metadata will include a mandatory field for collection_location. For the list of required metadata, please visit https://github.com/arvados/bh20-seq-resource/blob/master/example/minimal_example.yaml.
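
Until that endpoint exists, a quick check that a metadata record actually carries the mandatory field could look like the sketch below. Only the field name collection_location comes from this thread. The record layout in the example is hypothetical and does not reproduce minimal_example.yaml, which is why the search walks the whole nested structure instead of assuming where the field sits.

```python
from typing import Any, Iterable, List


def has_key(node: Any, key: str) -> bool:
    """Return True if `key` appears with a non-empty value anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and value not in (None, "", [], {}):
                return True
            if has_key(value, key):
                return True
    elif isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def missing_fields(record: dict, required: Iterable[str]) -> List[str]:
    """List the required field names that are absent or empty in the record."""
    return [name for name in required if not has_key(record, name)]


# Hypothetical record, for illustration only (not the real schema).
record = {"sample": {"sample_id": "example-1", "collection_location": "example-place"}}
print(missing_fields(record, ["collection_location"]))  # []
```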

innamoratika · 2020-04-22T19:49:59Z

Ali- Just wanted to introduce myself post-convo with @josiahseaman : I'll be working on the phylo side of things and we should touch base at some point regarding using universal IDs for genomes. We should have enough in the phylo tree that we can track provenance and pass that on to you!

hhaider15 · 2020-05-01T01:52:02Z

Agreed. Apologies I was busy earlier. Shall be working on this, now.

hhaider15 · 2020-05-01T01:52:52Z

Good to see you @innamoratika

josiahseaman added the documentation, good first issue, and question labels on Apr 20, 2020
