-
Notifications
You must be signed in to change notification settings - Fork 480
Fix sanitizer from SemiBin #7366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
i had to change nameing of the softlink a bit to remove the file extension which was in the name. Now |
tools/semibin/macros.xml
Outdated
#for $e in $mode.multi_fasta.input_fasta | ||
#set $identifier = re.sub('[^\s\w\-\\.]', '_', str($e.element_identifier)) | ||
#set $identifier = re.sub('[^\s\w\-]', '_', str($e.element_identifier)) | ||
#set $final_identifier = '_'.join($identifier.split('_')[:-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a good idea. If people have collection element identifiers containing any sanitized character or an underscore this will remove part of the identifier. Users should remove file extensions during collection creation.
I guess you had some problems in the test due to the file extensions. Then its better to adapt test assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem in thr test was a code error because of the new naming of the file. The code tried to run to find something with S1
but since the name of the file was S1_fasta
it could not find it. So therefor i dont know how to fix beside of deleteing the test which should not be the point.
Do you might have any idea for this because i didnt thought about your point. I can also add a check to see if there is any file extension left of so cut it out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess S1 is somewhere hardcoded in the test data files. This then need to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some work it should be fine now. I had to add extra files just for the one test case where the naming of the file was a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to merge it now. The fix is also implemented in my other PR!
#if $e.ext.endswith(".gz") | ||
gunzip -c '$e' > '${final_identifier}.fasta' && | ||
gunzip -c '$e' > '${identifier}.fasta' && | ||
#else | ||
ln -s '$e' '${final_identifier}.fasta' && | ||
ln -s '$e' '${identifier}.fasta' && | ||
#end if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if $e.ext.endswith(".gz") | |
gunzip -c '$e' > '${final_identifier}.fasta' && | |
gunzip -c '$e' > '${identifier}.fasta' && | |
#else | |
ln -s '$e' '${final_identifier}.fasta' && | |
ln -s '$e' '${identifier}.fasta' && | |
#end if | |
ln -s '$e' '${identifier}.${e.ext}' && |
semibin should be able to unzip itself: https://github.com/BigDataBiology/SemiBin/blob/886913b48814cc0cd426ee46ec20d22cee875bf5/SemiBin/fasta.py#L23
also we should allow fasta.bz2
in the formats of the input parameter (macros input-fasta-single and input-fasta-multiple). But this probably also requires changes in other tools?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change it later today :)
FOR CONTRIBUTOR: