Samtools sort error #1479

Closed
abretaud opened this issue Sep 19, 2017 · 3 comments


@abretaud
Contributor

A small bug one of our users encountered (on 17.05):
Using samtools sort with the -n option produces a BAM file sorted by read name (which is intended), but Galaxy then fails to set metadata on the result with this error:

Traceback (most recent call last):
  File "/opt/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 630, in finish_job
    job_state.job_wrapper.finish( stdout, stderr, exit_code )
  File "/opt/galaxy-dist/lib/galaxy/jobs/__init__.py", line 1266, in finish
    dataset.datatype.set_meta( dataset, overwrite=False )
  File "/opt/galaxy-dist/lib/galaxy/datatypes/binary.py", line 411, in set_meta
    raise Exception( "Error Setting BAM Metadata: %s" % stderr )
Exception: Error Setting BAM Metadata: [E::hts_idx_push] unsorted positions
samtools index: "/opt/galaxy-dist/database/files/002/058/dataset_2058596.dat" is corrupted or unsort
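For context, a coordinate index can only be built when the records for each reference form one contiguous block and positions never go backwards inside that block; a name-sorted file breaks this almost immediately, which is what "unsorted positions" reports. The sketch below is a simplified model of that ordering requirement, not htslib's actual `hts_idx_push` code. The helper name `first_unsorted_record` is hypothetical, records are modeled as `(reference_id, position)` pairs, and unmapped reads are ignored.

```python
from typing import Iterable, Optional, Tuple


def first_unsorted_record(records: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Hypothetical helper (simplified model, not htslib):
    return the index of the first record that breaks coordinate order, or None."""
    finished_refs = set()  # references whose block of records has already ended
    last_ref = None
    last_pos = -1
    for i, (ref, pos) in enumerate(records):
        if ref != last_ref:
            if ref in finished_refs:
                return i  # a reference block reappears: blocks are not contiguous
            if last_ref is not None:
                finished_refs.add(last_ref)
            last_ref, last_pos = ref, pos
            continue
        if pos < last_pos:
            return i  # position went backwards within the same reference
        last_pos = pos
    return None
```

Two reads sorted by name can easily land as `[(0, 500), (0, 100)]`, for which the function returns `1`, while a coordinate-sorted list such as `[(0, 100), (0, 500), (1, 50)]` returns `None`.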

I don't really see how to solve this: the BAM file is indeed not indexable, but the user deliberately asked for it not to be sorted by coordinate...
Any idea?

(ping @cmonjeau)

@mvdbeek
Member

mvdbeek commented Sep 19, 2017

Yeah, that is similar to galaxyproject/tools-devteam#478.
I think we should probably create an unsorted BAM datatype that can be converted back to a sorted one on demand. Alternatively, we could at least make metadata setting not choke when building the index fails. There are a few tools out there that would perform better with name-sorted BAM files.
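One way to keep metadata setting from choking, sketched under assumptions: read the `SO` tag of the `@HD` header line, only attempt to build an index when it says `coordinate`, and treat a failed build as "no index" rather than as a job failure. The function names and the `build_index` callable are hypothetical; this is not Galaxy's actual `set_meta` code, and it assumes the header's `SO` value can be trusted, which is not guaranteed for every file.

```python
from typing import Callable, Optional


def header_sort_order(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                key, _, value = field.partition(":")
                if key == "SO":
                    return value
            return None  # @HD present but without an SO tag
    return None  # no @HD line at all


def set_index_metadata(header_text: str, build_index: Callable[[], str]) -> Optional[str]:
    """Hypothetical helper: return an index path, or None when indexing is skipped or fails."""
    if header_sort_order(header_text) != "coordinate":
        # queryname, unsorted, unknown or missing SO: do not try to index at all
        return None
    try:
        return build_index()
    except Exception:
        # The header claimed coordinate order but indexing still failed;
        # leave the index metadata unset instead of failing the whole job.
        return None
```

For a header such as `"@HD\tVN:1.6\tSO:queryname\n"`, `set_index_metadata` returns `None` without ever calling `build_index`, so the dataset keeps its other metadata and simply has no index.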

@bernt-matthias
Contributor

This should be fixed in #1964 using the qname-sorted BAM datatype.

Still, the BAM/SAM/CRAM datatypes in Galaxy bother me a bit. Wouldn't it be enough to have a single datatype for each, with tools accessing the sort state via the metadata (which is already implemented in at least one of the datatypes)? Having a separate datatype for each sort order seems a bit crazy to me, given that you can sort (with samtools) by coordinate, name, and tag (and you can have primary and secondary sort keys)...
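A rough sketch of what "one datatype, sort state in metadata" could look like, assuming a hypothetical metadata field that stores the ordered sort keys (primary key first) and a check that a tool's required keys form a prefix of them. `AlignmentSortMetadata`, its `sort_keys` field and the key strings are all illustrative names, not an existing Galaxy API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AlignmentSortMetadata:
    # Hypothetical metadata: ordered sort keys, primary key first,
    # e.g. ("coordinate",), ("queryname",) or ("tag:HP", "coordinate").
    sort_keys: Tuple[str, ...] = ()

    @property
    def primary(self) -> str:
        """Primary sort key, or "unsorted" when no sort keys are recorded."""
        return self.sort_keys[0] if self.sort_keys else "unsorted"

    def satisfies(self, required: Tuple[str, ...]) -> bool:
        """True if the dataset's sort keys start with the keys a tool requires.

        Data sorted by (A, B) is also sorted by A alone, so a prefix match suffices.
        """
        return self.sort_keys[: len(required)] == required
```

With this design, an indexer would require `("coordinate",)`, a name-sorted dataset (`("queryname",)`) and a tag-then-coordinate dataset (`("tag:HP", "coordinate")`) would both fail that check, and a tool that needs name-grouped reads would require `("queryname",)`, all without adding a datatype per sort order.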

@bernt-matthias
Contributor

Closing this for now, hoping that #1964 fixed it.
