resolve the hang problem #226 #254

Merged 2 commits on Nov 1, 2023

hunglin59638
Contributor

Replacing multiprocessing with concurrent.futures to resolve the hang
Preloading output_tab_sequences into a dictionary to improve program efficiency
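
For readers unfamiliar with the pattern, here is a minimal sketch of fanning work out through concurrent.futures and merging the results. It is not the code from this PR: the worker `summarize_chunk`, the helper `summarize_parallel`, and the choice of `ThreadPoolExecutor` are all assumptions made for illustration (`ProcessPoolExecutor` exposes the same `submit` interface).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def summarize_chunk(chunk):
    """Hypothetical worker: count how often each reference name appears in one chunk."""
    counts = {}
    for reference in chunk:
        counts[reference] = counts.get(reference, 0) + 1
    return counts


def summarize_parallel(chunks, max_workers=2):
    """Submit one task per chunk and merge the per-chunk counts into one dictionary."""
    totals = {}
    # The context manager waits for all submitted tasks before the pool is torn down.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(summarize_chunk, chunk) for chunk in chunks]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker,
            # so a failed task is reported here instead of being silently dropped.
            for reference, n in future.result().items():
                totals[reference] = totals.get(reference, 0) + n
    return totals


if __name__ == "__main__":
    print(summarize_parallel([["geneA", "geneB"], ["geneA"]]))
```

Collecting results through `future.result()` keeps worker failures visible to the caller, which makes a stalled or failed task easier to diagnose.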

Tested with the SRA FASTQ run DRR387644.
These are Illumina 150 bp paired-end reads from Klebsiella pneumoniae.
The file size is 541M.

fasterq-dump -O fq DRR387644
time rgi bwt -1 fq/DRR387644_1.fastq -2 fq/DRR387644_2.fastq -a kma -o bwt_out --include_wildcard --include_other_models
real    4m33.099s
user    6m31.818s
sys     0m50.640s

The run took about 4.5 minutes to finish.
The current code (778b83d) ran for more than 1 hour, so I did not wait for it to finish.

@raphenya
Collaborator

@hunglin59638 can you account for the missing intermediate files, i.e. FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/rgi/rgi/tests/outputs/output_bwt_kma_interleaved.seqs.temp.txt'? I will review the code after all the tests pass. Cheers.

app/BWT.py Outdated
@@ -114,6 +114,19 @@ def __init__(self, aligner, include_wildcard, include_baits, read_one, read_two,

self.output_tab = os.path.join(self.working_directory, "{}.temp.txt".format(self.output_file))
self.output_tab_sequences = os.path.join(self.working_directory, "{}.seqs.temp.txt".format(self.output_file))
# Parse tab-delimited file into dictionary for mapped reads
self.alignments = {}
with open(self.output_tab_sequences, 'r') as csvfile:

Contributor Author

I moved the code that parses the tab-delimited file into a dictionary into the BWT initialization.
However, at that point the file ('{}.seqs.temp.txt'.format(self.output_file)) has not been generated yet.
This issue is fixed in the next commit (17dc039), which introduces a function called 'BWT.preload_alignments'.
The '.seqs.temp.txt' file is now parsed only after it has been generated.
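
The deferred parsing described above can be sketched as a standalone function. This is an illustration only, not the body of 'BWT.preload_alignments': the column layout of the '.seqs.temp.txt' file is not shown in this thread, so keying on the first column is an assumption, as is grouping rows into lists.

```python
import csv
from collections import defaultdict


def preload_alignments(path):
    """Parse a tab-delimited file into {first_column: [rows]}.

    Assumption: the first column is a usable key. The real file layout
    is not shown in this conversation.
    """
    alignments = defaultdict(list)
    # open() raises FileNotFoundError if the file has not been written yet,
    # so this must be called only after the step that generates the file.
    with open(path, "r", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if row:  # skip blank lines
                alignments[row[0]].append(row)
    return dict(alignments)
```

Calling such a function from the constructor would fail with the FileNotFoundError reported above, because the file does not exist yet; calling it right after the file is written avoids that and still parses the file only once.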

Collaborator

Ok, @hunglin59638 got it. Thanks.

Collaborator

@raphenya left a comment

Thanks

@raphenya merged commit e4e092d into arpcard:master on Nov 1, 2023