Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambiguous bases being included in confidence score calculation #878

Open
mcolpus opened this issue Sep 11, 2024 · 1 comment · May be fixed by #879
Open

Ambiguous bases being included in confidence score calculation #878

mcolpus opened this issue Sep 11, 2024 · 1 comment · May be fixed by #879

Comments

@mcolpus
Copy link

mcolpus commented Sep 11, 2024

In the documentation it says that ambiguous bases are excluded for the purpose of calculating confidence score.
But I've been getting some slight errors when running kraken2, which led me to try doing some simple print statements.

Example:
I ran kraken2 with --confidence 0.1 and got a read classified as follows

U	600	0	151|151	0:41 1:1 0:75 |:| 0:44 A:34 570:3 0:30 9606:3 0:3

Which has counts:
0: 193
1: 1
570:3
9606: 3
A: 34
so total is 200, or 234 with Ambiguous bases included.

I put print statement in classify.cc at line 439 in ResolveTree

fprintf(stderr, "Required score: %u\n", required_score);
fprintf(stderr, "Total minimizers: %lu\n", total_minimizers);
fprintf(stderr, "conf threshold: %f\n", opts.confidence_threshold);
for (auto &kv_pair : hit_counts) {
  fprintf(stderr, "Taxon %lu: %lu\n", kv_pair.first, kv_pair.second);
}

The output is:

Required score: 24
Total minimizers: 234
conf threshold: 0.100000
Taxon 45471: 3
Taxon 12739: 3
Taxon 1: 1

I'm not sure why the taxon are different (I assume a different number is used internally). But the required score is clearly using the total number of kmers which includes A's.

Is this intended or a bug?

@mcolpus
Copy link
Author

mcolpus commented Oct 15, 2024

On consideration. I think it I prefer the way it is currently done in the code (including ambiguous kmers in total count), so more sensible to change the docs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant