Relax reference name validation with ValidationStringency #39
Comments
Thanks for your comment.
$ damageprofiler -i metagenomebis.all_mapped.bam -r mpa_db_latest.fa -o damageprofiler
DamageProfiler v0.4.6
Invalid SAM/BAM file. Please check your file.
htsjdk.samtools.SAMException: Sequence name '157592__A0A150IGK6__fliD,flbC,flaV' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'
Version: 0.4.6, installed via Conda
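For reference, here is a minimal sketch of why this name is rejected. It copies the regex from the error message above and assumes the pattern has to match the whole name. The helper names `is_valid_reference_name` and `rejected_characters` are made up for illustration; they are not part of htsjdk or DamageProfiler.

```python
import re
from typing import List

# Pattern copied verbatim from the htsjdk error message above.
SEQUENCE_NAME_PATTERN = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)
# Second character class of the pattern: characters allowed after position 1.
ALLOWED_CHAR = re.compile(r"[0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def is_valid_reference_name(name: str) -> bool:
    """True if the whole name matches the pattern (assumed full match)."""
    return SEQUENCE_NAME_PATTERN.fullmatch(name) is not None


def rejected_characters(name: str) -> List[str]:
    """Distinct characters that the pattern allows nowhere in a name, sorted.

    This does not report the extra first-position rule ('*' and '=' are
    only allowed after the first character).
    """
    return sorted({ch for ch in name if not ALLOWED_CHAR.fullmatch(ch)})


name = "157592__A0A150IGK6__fliD,flbC,flaV"
print(is_valid_reference_name(name))  # False
print(rejected_characters(name))      # [',']
```

So, under that assumption, the commas are the only characters in this name that the pattern rejects.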
I ran a small test file, and changing the ValidationStringency to 'SILENT' unfortunately does not solve the problem. I will try to fix this by the next release.
To be honest, that's also a pretty invalid FASTA header 🙄
Agreed, MetaPhlAn uses funny reference names. Though, for example, this is valid:
Unfortunately, I couldn't solve this problem. It's not influenced by the ValidationStringency parameter, and there doesn't seem to be an option to set a user-defined regex pattern.
Right now, the (default?) reference name validation in htsjdk is pretty strict, which leads to errors when reference names in alignment files are ill-formatted (for example, the reference names in the MetaPhlAn database).
This should be relaxed via ValidationStringency to allow reference names that are not perfectly formatted.
CC @apeltzer @JudithNeukamm