I simulated a shotgun metagenomic Nanopore dataset using NanoSim, containing:
65% bacterial sequences from 32 bacterial species
35% human sequences
For profiling, I created a Sylph database with ~5K microbial species from RefSeq (excluding human sequences) and ran Sylph with the -u option. However, the bacterial sequence abundance in the results is reported as 100%, with no unclassified reads.
In contrast, profiling the same dataset with Kraken2 resulted in 85% unclassified reads due to the human sequences.
What configurations or steps should I consider to ensure that Sylph correctly identifies unclassified reads, particularly for the human sequences?
Head of the Sylph taxprofile output:
clade_name                       relative_abundance  sequence_abundance  ANI (if strain-level)  Coverage (if strain-level)
d__Bacteria                      99.99989999999998   100.00000000000001  NA                     NA
d__Bacteria|p__Bacteroidota      59.7845             60.69369999999999   NA                     NA
d__Bacteria|p__Campylobacterota  11.6763             8.6889              NA                     NA
d__Bacteria|p__Pseudomonadota    11.299099999999997  14.9092             NA                     NA
d__Bacteria|p__Bacillota         5.7598              4.2177              NA                     NA
d__Bacteria|p__Bacillota_A       5.152               5.1825              NA                     NA
d__Bacteria|p__Desulfobacterota  2.6447              3.3607              NA                     NA
d__Bacteria|p__Actinomycetota    2.6022              1.9832999999999998  NA                     NA
d__Bacteria|p__Bacillota_C       1.0813              0.964               NA                     NA
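To make the problem concrete, here is a minimal Python sketch that sums the phylum-level `sequence_abundance` values from the rows shown above and compares the implied unclassified fraction with the 35% human content in the simulation design. The function name and the whitespace-based parsing are my own assumptions for illustration. Only the rows visible in this head are included, so a full profile may contain further phyla.

```python
# Phylum-level rows copied from the taxprofile head in this issue:
# clade_name, relative_abundance, sequence_abundance
PROFILE_HEAD = """\
d__Bacteria|p__Bacteroidota 59.7845 60.69369999999999
d__Bacteria|p__Campylobacterota 11.6763 8.6889
d__Bacteria|p__Pseudomonadota 11.299099999999997 14.9092
d__Bacteria|p__Bacillota 5.7598 4.2177
d__Bacteria|p__Bacillota_A 5.152 5.1825
d__Bacteria|p__Desulfobacterota 2.6447 3.3607
d__Bacteria|p__Actinomycetota 2.6022 1.9832999999999998
d__Bacteria|p__Bacillota_C 1.0813 0.964
"""

# Simulation design from this issue: 35% of the sequence is human and
# absent from the database, so it should end up unclassified.
EXPECTED_UNCLASSIFIED = 35.0


def phylum_sequence_abundance(text: str) -> float:
    """Sum sequence_abundance over rows whose lineage ends at phylum rank."""
    total = 0.0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        lowest_rank = fields[0].split("|")[-1]
        if lowest_rank.startswith("p__"):
            total += float(fields[2])
    return total


classified = phylum_sequence_abundance(PROFILE_HEAD)
implied_unclassified = 100.0 - classified

print(f"classified sequence abundance: {classified:.4f}%")
print(f"implied unclassified fraction: {implied_unclassified:.4f}%")
print(f"expected from simulation:      {EXPECTED_UNCLASSIFIED:.1f}%")
```

The listed phyla already account for essentially all of the reported sequence abundance, so this table leaves no room for an unclassified fraction.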
Sylph command line used:
sylph sketch -r {input.fastq}/{wildcards.sample}.clean.fastq.gz
sylph query ./sylph-5k-PBFAV-v2.syldb {output.sketch} -t 30 > {wildcards.sample}.ani_queries.tsv
sylph profile -u ./sylph-5k-PBFAV-v2.syldb {output.sketch} -t 30 -o {wildcards.sample}.profile.tsv