
create tiny reference metagenome #27

Open
1 of 3 tasks
colindaven opened this issue Jul 12, 2022 · 3 comments

@colindaven
Contributor

colindaven commented Jul 12, 2022

CD TODO
@LisaHollstein FYI

For testing, the plots and growth rates need more data than is currently available. The tiny reference should contain:

  • Streptococcus
  • Rothia
  • a tiny part of human chr22
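
How the chr22 fragment is cut isn't stated. One minimal sketch, assuming an uncompressed chr22 FASTA and 0-based, half-open coordinates (both assumptions; the function names are hypothetical), pulls a region out and rewraps it for a mock reference. On an indexed FASTA, samtools faidx with a region such as chr22:START-END does the same job.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; name is the first word after '>'."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_region(handle: TextIO, contig: str, start: int, end: int) -> str:
    """Return contig[start:end] (0-based, half-open); raise KeyError if absent."""
    for name, seq in read_fasta(handle):
        if name == contig:
            return seq[start:end]
    raise KeyError(f"contig {contig!r} not found")


def write_fasta(name: str, seq: str, out: TextIO, width: int = 60) -> None:
    """Write one record, wrapping the sequence at `width` characters per line."""
    out.write(f">{name}\n")
    for i in range(0, len(seq), width):
        out.write(seq[i:i + width] + "\n")
```

Usage would be along the lines of opening your chr22 FASTA, calling extract_region(fh, "chr22", start, end) with coordinates of your choice, and passing the result to write_fasta together with the Streptococcus and Rothia records.
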
@colindaven colindaven self-assigned this Jul 12, 2022
@colindaven
Contributor Author

Done here, with mock reference genomes:

https://github.com/colindaven/ref_testing

@colindaven
Contributor Author

colindaven commented Jul 24, 2022

@LisaHollstein, let me know if you have any problems testing with this reference.

I use part of the public data file SRR11207337_R1.fastq for testing against this reference.

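More precisely, I make a FASTQ input file from the larger FASTQ using head:

```bash
# Download read 1 of the run; -O (capital) names the output file, lowercase -o would name wget's log file
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/037/SRR11207337/SRR11207337_1.fastq.gz -O SRR11207337_mock_R1.fastq.gz
gunzip SRR11207337_mock_R1.fastq.gz

# Small version: 2,000,000 lines = 500,000 reads (each FASTQ record is 4 lines)
head -n 2000000 SRR11207337_mock_R1.fastq > public_mock_sm_R1.fastq

# Also a very small version for rapid testing: 90,000 lines = 22,500 reads
head -n 90000 SRR11207337_mock_R1.fastq > public_mock_vsm_R1.fastq
```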
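
The head commands above keep the first 4 × N lines, i.e. the first N reads, because each FASTQ record spans four lines. As a hedged alternative sketch (not what was used here), the same subsample can be streamed straight from the .fastq.gz download into a gzipped output, so the full decompressed file never lands on disk. The function names are hypothetical, and it assumes plain 4-line records with no wrapped sequence lines.

```python
import gzip
import itertools
from typing import Iterator, List


def fastq_records(lines: Iterator[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records (header, sequence, '+', quality)."""
    while True:
        record = list(itertools.islice(lines, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record


def take_reads(src: str, dst: str, n_reads: int) -> int:
    """Copy the first n_reads records from gzipped src to gzipped dst.

    Same reads as `head -n 4*n_reads` on the decompressed file.
    Returns the number of reads actually copied.
    """
    written = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for record in itertools.islice(fastq_records(iter(fin)), n_reads):
            fout.writelines(record)
            written += 1
    return written
```

For example, take_reads("SRR11207337_mock_R1.fastq.gz", "public_mock_vsm_R1.fastq.gz", 22_500) should yield the same reads as the 90,000-line head command, already gzipped for the repo.
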


This very small (vsm) dataset gives aligned read counts of this order, e.g. 4300 reads on Pseudomonas aeruginosa and 2660 on Salmonella enterica, as below. That might be enough for testing. I'll gzip it and add it to the repo under the test dir as well.

```
Pseudomonas_aeruginosa_complete_genome  6792330 4300    0
tig00000001     1480242 25      0
tig00000003     1076883 19      0

Salmonella_enterica_chromosome  4759746 2660    0
Staphylococcus_aureus_chromosome        2718780 2057    0
```
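
The four columns in the table above look like samtools idxstats output (reference name, sequence length, mapped reads, unmapped reads); that reading is an assumption, since the command isn't shown. A minimal sketch under that assumption parses such output and lists the references with at least a chosen number of mapped reads, one quick way to judge whether a subsample might be enough for testing. The function names and the threshold are hypothetical.

```python
from typing import List, NamedTuple


class RefStat(NamedTuple):
    name: str
    length: int
    mapped: int
    unmapped: int


def parse_idxstats(text: str) -> List[RefStat]:
    """Parse 4-column idxstats-style lines (name, length, mapped, unmapped)."""
    stats: List[RefStat] = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the '*' row samtools uses for unplaced unmapped reads
        if not fields or fields[0] == "*":
            continue
        name, length, mapped, unmapped = fields
        stats.append(RefStat(name, int(length), int(mapped), int(unmapped)))
    return stats


def well_covered(stats: List[RefStat], min_mapped: int) -> List[str]:
    """Names of references with >= min_mapped reads, most reads first."""
    hits = [s for s in stats if s.mapped >= min_mapped]
    return [s.name for s in sorted(hits, key=lambda s: s.mapped, reverse=True)]
```

On the rows above, well_covered(stats, 1000) keeps the Pseudomonas, Salmonella and Staphylococcus chromosomes and drops the two tig contigs with 25 and 19 reads.
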


If you don't want to build or download a new test reference FASTA, you can also use the full 2021_12 reference, though it will be slower.

@colindaven
Contributor Author

I added these fastq.gz datasets in commit 19076bd.
