Non-deterministic behaviour of miRDeep2 #66

Closed
Pfaendner opened this issue Mar 30, 2020 · 3 comments

Comments

@Pfaendner

Hi all,

Lately I have been working a lot with miRDeep2, and especially with the script miRDeep2_core_algorithm.pl.
When I ran the script repeatedly with the same input files and options, the generated output differed in the number of potential precursors and in the ordering of the read signature.

I think it would be a great improvement for your software to be deterministic and reproducible.
To make your work easier, I have opened pull request #65, which proposes some changes.

What are your thoughts on this issue? I would highly appreciate your feedback.

Best regards,

Christian Pfaendner

@Drmirdeep
Collaborator

Hi,
In general I am all for reproducibility. However, the algorithm is not fully deterministic to begin with, starting with the randfold steps it involves. Furthermore, I would not want to decide which is the mature and which is the star form of a miRNA precursor when both are supported by an equal number of reads, especially when that number is very low (e.g. 2 or 3). This is more a problem of small numbers in the analysis than a problem of non-deterministic behavior.
Since there is no bug here, there is no reason to change the code. However, if you want fully deterministic behavior, you could set the PERL_HASH_SEED environment variable to a predefined value before each perl script call. That should make the ordering of perl's hash keys reproducible.
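
As a rough illustration of the PERL_HASH_SEED suggestion, the sketch below wraps a perl call so that the variable is always set to the same value. The helper names `perl_env` and `run_perl_script` and the seed value `0` are assumptions made for this example and are not part of miRDeep2; only the PERL_HASH_SEED variable itself comes from perl. Depending on the perl version, other settings may also influence hash ordering, so treat this as a starting point rather than a guarantee.

```python
import os
import subprocess


def perl_env(seed="0", base=None):
    """Return a copy of the environment with PERL_HASH_SEED fixed.

    `seed` is an arbitrary predefined value; `base` defaults to the
    current process environment and is never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["PERL_HASH_SEED"] = str(seed)
    return env


def run_perl_script(argv, seed="0"):
    """Run `perl <argv...>` with a fixed hash seed (hypothetical wrapper).

    `argv` is the script path followed by its arguments; raises
    CalledProcessError if perl exits with a non-zero status.
    """
    return subprocess.run(["perl", *argv], env=perl_env(seed), check=True)
```

The wrapper would have to be used for every perl script in the pipeline, since each call starts a fresh interpreter with its own hash seed.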

@Pfaendner
Author

Hi,

Thanks for your comprehensive and quick answer.
Actually, I am going to patch randfold as well, adding an option to set a seed so that it becomes deterministic.

I understand that deciding upon the mature form in a potential precursor is difficult. For this very reason, I think it is even more important for us users to have a consistent and understandable way of choosing the mature form. The result would be a reliable and consistent output.
You're right, PERL_HASH_SEED is a possibility. However, its behaviour is not guaranteed to be preserved across different versions of perl, and it is not very user-friendly.
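
To make the idea of a consistent choice of the mature form concrete, here is a minimal sketch of an explicit tie-break rule. It is purely illustrative: the function name `pick_mature_arm`, the arm labels, and the rule itself (higher read count first, then arm name in alphabetical order) are assumptions for this example, and it is neither what miRDeep2 does nor necessarily what pull request #65 proposes.

```python
def pick_mature_arm(read_counts):
    """Pick (mature, star) from a dict mapping arm name -> read count.

    The arm with more reads is called mature. Ties are resolved by arm
    name, so the result does not depend on dict or hash ordering.
    Expects at least two arms.
    """
    # Sort by descending read count, then by arm name as a fixed tie-break.
    ranked = sorted(read_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[0][0], ranked[1][0]
```

With such a rule the outcome for equal read counts is still arbitrary in a biological sense, but it is the same on every run and on every perl or Python version, which is the property asked for here.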

If you do not want to change the main pipeline of miRDeep2, what do you think about adding a command-line flag? I would do all the implementation work and send you pull requests once the code is finished.

Best,

Christian Pfaendner

@mschilli87
Member

@Drmirdeep: Do I understand correctly that this can be closed as 'wontfix'? 😉
