Issues with gene data #95
Seems like there are IDs for the BMP4 gene
If this is what we need to track down, I can pull the Gene ID out so that we can map from our existing terms.
For that third group, it's possible that setting dataset_gene based on the species in dataset_organism will work. I think I just assumed there would be multiple organisms for each dataset because the dataset_organism table is many-to-many, but in practice that doesn't seem to be the case for most of them.
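Before relying on dataset_organism to pick a species, it may be worth checking how many datasets really have exactly one organism. A minimal sketch of that check follows; the row layout `(dataset_id, organism)` and the sample values are assumptions for illustration, not the actual schema or data.

```python
from collections import defaultdict

# Hypothetical (dataset_id, organism) rows standing in for dataset_organism.
dataset_organism_rows = [
    (1, "Mus musculus"),
    (2, "Homo sapiens"),
    (3, "Mus musculus"),
    (3, "Danio rerio"),
]


def split_by_organism_count(rows):
    """Return (single, multiple): datasets with exactly one organism, and the rest."""
    organisms = defaultdict(set)
    for dataset_id, organism in rows:
        organisms[dataset_id].add(organism)
    # Datasets with one organism can have their species assigned directly.
    single = {d: next(iter(o)) for d, o in organisms.items() if len(o) == 1}
    # Datasets with several organisms need a manual decision.
    multiple = {d: sorted(o) for d, o in organisms.items() if len(o) > 1}
    return single, multiple


single, multiple = split_by_organism_count(dataset_organism_rows)
print("single-organism datasets:", single)
print("needs review:", multiple)
```

Only the datasets in `single` would be safe to handle automatically; anything in `multiple` would still need someone to look at it.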
This is done on fb-dev, except for the "has data" column (I have some questions about that). There's an ER diagram of the gene-related tables here. Gene values in the data were translated like this:

All but two of the alternate_id values in the gene table were empty. I verified that the mappings came up with the same values (one gene had a uniprot id as its value, but that id mapped to the right NCBI gene).

There were 16 genes (affecting 14 datasets) that didn't map this way:
- two HGNC genes that appeared in mouse gene_summary records
- one MGI gene that appeared in a human dataset (the only association of a species to that dataset appears to be a dataset_organism row)
- 8 FACEBASE genes that didn't match any names or ids
- 5 FACEBASE genes that mapped to multiple NCBI genes
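For reference, the three outcomes described above (a unique NCBI match, no match, multiple matches) can be separated with a small classifier. This is only a sketch under assumptions: `ncbi_index`, the gene names, and the numeric ids below are invented placeholders, not the lookup that was actually used on fb-dev.

```python
def classify_gene_values(values, ncbi_index):
    """Split source gene values into unique, unmatched and ambiguous mappings.

    ncbi_index is a hypothetical lookup from a lower-cased name or id
    to the set of NCBI Gene IDs it could refer to.
    """
    unique, unmatched, ambiguous = {}, [], {}
    for value in values:
        hits = ncbi_index.get(value.lower(), set())
        if len(hits) == 1:
            unique[value] = next(iter(hits))
        elif not hits:
            unmatched.append(value)
        else:
            ambiguous[value] = sorted(hits)
    return unique, unmatched, ambiguous


# Placeholder index: the names and ids are made up for the example.
ncbi_index = {
    "genea": {1001},
    "geneb": {2001, 2002},
}

unique, unmatched, ambiguous = classify_gene_values(
    ["GeneA", "GeneB", "panTro4"], ncbi_index
)
print(unique)
print(unmatched)
print(ambiguous)
```

Keeping the unmatched and ambiguous values in separate buckets makes it easy to hand each group to a curator with the candidate ids attached.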
I gather the matching issues are mostly due to species mismatches. E.g., the mirlet7a-N genes seem to be mostly cases of a zebrafish gene associated with a mouse dataset.

'panTro4' is just not a gene name. I can only guess the contributor mistook that field for the genome assembly. I can clean that up in the production database. [DONE]

How is 'has_data' populated: by a trigger, or by some out-of-band process? Could we just filter on the Dataset and Biosample associations/references with 'has value' in the facet picker?
I found several genes with links to both human and mouse species. The tables below have the species associated with those genes; the last column is the list of tables that gene/species was found in ("dataset_organism" is the join path gene -> dataset_gene -> dataset -> dataset_organism).
One MGI gene with both human and mouse data; something's probably wrong with the "human" data entries:
Three HGNC genes with both human and mouse data; something's probably wrong with the "mouse" data entries:
A bunch of Facebase-defined genes with both human and mouse data; I have no suggestions for these:
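The "dataset_organism" join path mentioned above (gene -> dataset_gene -> dataset -> dataset_organism) can be queried along these lines to list genes linked to more than one species. This is a sketch only: it uses an in-memory SQLite database, and every column name and row is an assumption made for the example rather than the real schema.

```python
import sqlite3


def genes_with_multiple_species(conn):
    """Return (gene name, species count) for genes reached via more than one species."""
    query = """
        SELECT g.name, COUNT(DISTINCT o.organism) AS n_species
        FROM gene AS g
        JOIN dataset_gene AS dg ON dg.gene_id = g.id
        JOIN dataset AS d ON d.id = dg.dataset_id
        JOIN dataset_organism AS o ON o.dataset_id = d.id
        GROUP BY g.id, g.name
        HAVING COUNT(DISTINCT o.organism) > 1
        ORDER BY g.name
    """
    return conn.execute(query).fetchall()


# Hypothetical minimal schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dataset (id INTEGER PRIMARY KEY);
    CREATE TABLE dataset_gene (dataset_id INTEGER, gene_id INTEGER);
    CREATE TABLE dataset_organism (dataset_id INTEGER, organism TEXT);
    INSERT INTO gene VALUES (1, 'geneA'), (2, 'geneB');
    INSERT INTO dataset VALUES (10), (11);
    INSERT INTO dataset_gene VALUES (10, 1), (11, 1), (10, 2);
    INSERT INTO dataset_organism VALUES (10, 'Mus musculus'), (11, 'Homo sapiens');
""")

result = genes_with_multiple_species(conn)
print(result)
```

This covers only the dataset_organism path; the other tables a gene/species pair was found in would each need their own join.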