This repository has been archived by the owner on Oct 30, 2021. It is now read-only.

contig in variants1kG versus chrom in knownGene #7

Open
maxbox51 opened this issue May 18, 2014 · 0 comments

Comments

@maxbox51

Different tables use two different forms of sequence identifier for the same sequence. The variants1kG table refers to chromosome 1 in the field "contig" as "1", while the knownGene table refers to it in the field "chrom" as "chr1". The same mismatch exists in the data as it comes from the UCSC site, but it matters more when you are trying to join across tables. It is possible, of course, to shorten "chr1" to "1" in a subquery, but I suspect it is more efficient to avoid the subquery when possible.
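To make the mismatch concrete, here is a minimal sketch using Python's built-in sqlite3 module with tiny in-memory stand-ins for the two tables. Only the "contig" and "chrom" field names come from this issue; the other columns ("position", "name", "txStart", "txEnd"), the sample rows, and the use of SQLite rather than the project's actual query engine are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-ins for the two tables. Column names other than
# "contig" and "chrom" are assumptions, and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants1kG (contig TEXT, position INTEGER);
    CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
    INSERT INTO variants1kG VALUES ('1', 150), ('1', 900), ('2', 150);
    INSERT INTO knownGene VALUES ('geneA', 'chr1', 100, 200),
                                 ('geneB', 'chr2', 500, 600);
""")

# Naive join: compares '1' with 'chr1', so no row can ever match.
naive = conn.execute("""
    SELECT g.name, v.contig, v.position
    FROM variants1kG AS v
    JOIN knownGene AS g
      ON g.chrom = v.contig
     AND v.position BETWEEN g.txStart AND g.txEnd
""").fetchall()

# Workaround: add the 'chr' prefix inside the join condition.
# BETWEEN is a simplification; it ignores coordinate conventions.
normalized = conn.execute("""
    SELECT g.name, v.contig, v.position
    FROM variants1kG AS v
    JOIN knownGene AS g
      ON g.chrom = 'chr' || v.contig
     AND v.position BETWEEN g.txStart AND g.txEnd
    ORDER BY g.name, v.position
""").fetchall()

print("naive:", naive)
print("normalized:", normalized)
```

The second query works, but every cross-table query has to repeat the string manipulation, which is the overhead a consistent identifier in the tables themselves would avoid.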

A much more minor point: it would be nice if fields containing the same values had the same name across tables, to indicate that they are appropriate fields to join on. Both chromosomes and contigs (a more general term, though it suggests something less than a chromosome) are sequences, and I personally would prefer both field names to be replaced by "sequence" or "seq".
