Use `genbank_mapping.tsv` to join GenBank and SRA/Andersen lab records #97

joverlee521 · 2024-10-16T20:31:34Z

We could update the pipeline to use `genbank_mapping.tsv` to join SRA/Andersen lab records with GenBank records, instead of our current method of joining by SRA accessions! I haven't fully explored the data, but it looks like `genbank_mapping.tsv` maps duplicate SRA records to the same GenBank record, so it should resolve duplicate samples between the SRA and GenBank records.
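A minimal sketch of the proposed join, assuming the mapping file has one SRA/Andersen lab accession and one GenBank accession per row. The column names `sra_accession` and `genbank_accession`, the `accession` field on each record, and the helper names are hypothetical placeholders, not the file's actual header or the pipeline's code; check the real file before adapting this.

```python
import csv
import io

# Hypothetical column names: verify against the real header of genbank_mapping.tsv.
SRA_COL = "sra_accession"
GENBANK_COL = "genbank_accession"


def load_mapping(tsv_text):
    """Map each SRA/Andersen lab accession to its GenBank accession (skipping blanks)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[SRA_COL]: row[GENBANK_COL] for row in reader if row[GENBANK_COL]}


def join_records(sra_records, genbank_records, mapping):
    """Group SRA records under the GenBank accession they map to.

    Duplicate SRA records that map to the same GenBank accession collapse
    into a single joined entry. Returns (joined, unmapped_sra, genbank_only).
    """
    genbank_by_acc = {rec["accession"]: rec for rec in genbank_records}
    joined = {}
    unmapped_sra = []
    for rec in sra_records:
        gb_acc = mapping.get(rec["accession"])
        if gb_acc is None or gb_acc not in genbank_by_acc:
            # No mapping, or the mapped GenBank record is not in our download.
            unmapped_sra.append(rec)
            continue
        entry = joined.setdefault(gb_acc, {"genbank": genbank_by_acc[gb_acc], "sra": []})
        entry["sra"].append(rec)
    # GenBank records with no SRA/Andersen lab counterpart.
    genbank_only = [rec for acc, rec in genbank_by_acc.items() if acc not in joined]
    return joined, unmapped_sra, genbank_only
```

The `genbank_only` list is the set of GenBank records the mapping does not touch, which is where any remaining within-GenBank deduplication would have to happen.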

I do wonder whether we would still need to deduplicate GenBank records that are not present in the SRA/Andersen lab records.

The mapping of SRA/Andersen lab records to GenBank accessions is available at https://github.com/andersen-lab/avian-influenza/blob/master/metadata/genbank_mapping.tsv
