resolve the hang problem #226 #254

hunglin59638 · 2023-10-27T11:28:58Z

Replace multiprocessing with concurrent.futures to resolve the hang
Preload output_tab_sequences into a dictionary to improve program efficiency
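
As an illustration of the first change, here is a minimal sketch of the concurrent.futures pattern. It is not the code from this PR: `count_mapped`, `run_chunks`, and the chunked input are hypothetical stand-ins. Each task is submitted as a future, and `future.result()` re-raises any exception from the worker in the caller, so a failed task becomes visible instead of being silently lost.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def count_mapped(chunk):
    """Hypothetical worker: count the non-empty read IDs in one chunk."""
    return sum(1 for read_id in chunk if read_id)


def run_chunks(chunks, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Submit one task per chunk and return the results in input order."""
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        # Map each future back to the index of the chunk it was given.
        futures = {pool.submit(count_mapped, chunk): i for i, chunk in enumerate(chunks)}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(chunks))]
```

`executor_cls` is a parameter only so the same sketch can also run with threads, for example `run_chunks(chunks, executor_cls=ThreadPoolExecutor)`. With `ProcessPoolExecutor` on platforms that spawn new processes, the call should sit under an `if __name__ == "__main__":` guard.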

Tested with SRA FASTQ run DRR387644.
It contains Illumina 150 bp paired-end reads from Klebsiella pneumoniae.
The file size is 541M.

fasterq-dump -O fq DRR387644
time rgi bwt -1 fq/DRR387644_1.fastq -2 fq/DRR387644_2.fastq -a kma -o bwt_out --include_wildcard --include_other_models
real    4m33.099s
user    6m31.818s
sys     0m50.640s

The run took about 4.5 minutes to finish.
The current code (778b83d) ran for more than 1 hr, so I did not wait for it to finish.

raphenya · 2023-10-31T15:18:56Z

@hunglin59638 can you account for the missing intermediate files, i.e., FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/rgi/rgi/tests/outputs/output_bwt_kma_interleaved.seqs.temp.txt'? I will review the code after all the tests pass. Cheers.

hunglin59638 · 2023-11-01T12:53:59Z

app/BWT.py

@@ -114,6 +114,19 @@ def __init__(self, aligner, include_wildcard, include_baits, read_one, read_two,

 		self.output_tab = os.path.join(self.working_directory, "{}.temp.txt".format(self.output_file))
 		self.output_tab_sequences = os.path.join(self.working_directory, "{}.seqs.temp.txt".format(self.output_file))
+		# Parse tab-delimited file into dictionary for mapped reads
+		self.alignments = {}
+		with open(self.output_tab_sequences, 'r') as csvfile:


I moved the code that parses the tab file into a dictionary into the BWT initialization.
However, at that point the file ('{}.seqs.temp.txt'.format(self.output_file)) has not been generated yet.
This issue is fixed in the next commit (17dc039), which introduces a function called 'BWT.preload_alignments'.
The file is now parsed only after the '.seqs.temp.txt' file has been generated.
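
The idea of deferring the parse can be sketched as follows. This is only an illustration, not the body of 'BWT.preload_alignments': the column layout of the '.seqs.temp.txt' file is not shown in this thread, so the sketch assumes the first column is the lookup key, and the file name and rows are made up.

```python
import csv
import os
import tempfile
from collections import defaultdict


def preload_alignments(path):
    """Read a tab-delimited file once and group its rows by the first column.

    Assumption: column 0 is the lookup key; the real layout may differ.
    """
    alignments = defaultdict(list)
    with open(path, "r", newline="") as tabfile:
        for row in csv.reader(tabfile, delimiter="\t"):
            if row:  # skip blank lines
                alignments[row[0]].append(row[1:])
    return dict(alignments)


# Usage: the file has to exist before it is parsed, so it is written first.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example.seqs.temp.txt")
    with open(path, "w", newline="") as out:
        out.write("model_a\tread_1\tACGT\n")
        out.write("model_b\tread_2\tTTGA\n")
        out.write("model_a\tread_3\tGGCC\n")
    demo = preload_alignments(path)
```

Calling the loader from `__init__`, before the aligner has written the file, would raise the FileNotFoundError reported above; calling it once after the file exists avoids that and lets later lookups use the dictionary instead of rereading the file.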

Ok, @hunglin59638 got it. Thanks.

raphenya

Thanks

hunglin59638 added 2 commits October 27, 2023 01:47

resolve the hang problem arpcard#226

e24b6f4

preload output_tab_sequences to dictionary

17dc039

raphenya assigned hunglin59638 Oct 31, 2023

raphenya reviewed Nov 1, 2023

raphenya merged commit e4e092d into arpcard:master Nov 1, 2023
5 checks passed
