Speeding up comparisons for *very* large jobs on HPC #297
widdowquinn started this conversation in Ideas
-
I don't follow this. Does it have to be a list? We only care about order in terms of dependencies, which are not built into the joblist itself (unless I'm misreading the code). Sets should be faster, and will implicitly prevent any duplicates. This may not be the only optimisation worth making, but it might help.
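A minimal sketch of the idea, assuming the jobs can be represented as hashable objects; `make_job` and the genome names below are placeholders, not the actual code:

```python
from itertools import permutations

# Illustrative only: placeholders for however comparison jobs are really built.
genomes = [f"genome_{i:04d}" for i in range(50)]

def make_job(query, subject):
    # Represent a job as a hashable tuple so it can live in a set.
    return (query, subject)

# List version: appending is cheap, but checking "is this job already queued?"
# is an O(n) scan each time, which becomes quadratic over the whole run.
joblist = []
for q, s in permutations(genomes, 2):
    job = make_job(q, s)
    if job not in joblist:  # linear scan on every insertion
        joblist.append(job)

# Set version: membership tests and inserts are O(1) on average, and
# duplicates are dropped implicitly.
jobset = {make_job(q, s) for q, s in permutations(genomes, 2)}
```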
-
This has a proposed fix around garbage collection in #306
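For illustration, a common pattern for this kind of fix (not necessarily what #306 actually does) is to suspend the cyclic garbage collector while bulk-creating the job objects:

```python
import gc

def build_jobs(pairs, make_job):
    """Bulk-create job objects with the cyclic garbage collector suspended.

    Sketch of a generic pattern only, not necessarily the change in #306.
    Creating millions of small objects can trigger many full collection
    passes; disabling collection for the duration and running a single pass
    at the end often speeds up bulk construction noticeably.
    """
    gc.disable()
    try:
        return [make_job(q, s) for q, s in pairs]
    finally:
        gc.enable()
        gc.collect()
```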
-
Even the task of compiling command lines to run can be slow with large enough inputs. Currently, the process of compiling command lines is serial. We might get some speed-up if we used a different approach (a rough sketch follows at the end of this comment). I note:
I think this gets us two speed-ups:
I'm currently hitting this issue with a 2.5k genome job on a SLURM cluster - just generating the job list currently takes hours.
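One possible shape for that different approach is to farm command-line generation out to a process pool. This is a sketch only; the function names and the command template below are placeholders, not the tool's actual interface:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import permutations

def compile_command(pair):
    # Placeholder: stands in for whatever builds one comparison command line.
    query, subject = pair
    return f"nucmer --prefix={query}_vs_{subject} {query}.fna {subject}.fna"

def compile_all(genomes, workers=8):
    """Compile all pairwise command lines in parallel (illustrative sketch)."""
    pairs = list(permutations(genomes, 2))
    # A generous chunksize keeps inter-process overhead low when there are
    # millions of small tasks.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_command, pairs, chunksize=10_000))

if __name__ == "__main__":
    cmds = compile_all([f"genome_{i:04d}" for i in range(100)], workers=4)
    print(len(cmds), "command lines generated")
```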