yangyxt (Contributor) commented Dec 16, 2024

This change runs up to 24 parallel subprocesses to extract the prescored variants.

I did this because I ran into a situation where this step had been running for over 41 hours to extract prescored records for a VCF file with nearly 300k variant records.
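For readers who want a picture of the approach, here is a minimal sketch of capping parallel subprocesses at 24. It is not the code in this PR. The per-chromosome split is an assumption (22 autosomes plus X and Y would give 24 chunks, but the description above does not say how the work is divided), and `group_by_chrom`, `run_chunk`, and `extract_parallel` are hypothetical names. The child process simply echoes its input as a stand-in for the real extraction command.

```python
import subprocess
import sys
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 24  # upper bound mentioned in the PR description


def group_by_chrom(vcf_lines):
    """Group VCF data lines by their CHROM column, skipping headers and blanks."""
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        groups[chrom].append(line)
    return dict(groups)


def run_chunk(chrom, lines):
    """Hypothetical worker: pipe one chunk through a child process.

    The child here only echoes stdin to stdout. A real pipeline would
    invoke its own extraction command in place of `cmd`.
    """
    cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    result = subprocess.run(
        cmd, input="".join(lines), capture_output=True, text=True, check=True
    )
    return chrom, result.stdout.splitlines(keepends=True)


def extract_parallel(vcf_lines, max_workers=MAX_WORKERS):
    """Run one subprocess per chromosome chunk, never more than max_workers at once."""
    groups = group_by_chrom(vcf_lines)
    if not groups:
        return {}
    n_workers = min(max_workers, len(groups))
    # Threads are enough here: each one just blocks while its child process runs.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_chunk, c, ls) for c, ls in groups.items()]
        return dict(f.result() for f in futures)
```

Calling `extract_parallel(open("input.vcf"))` (hypothetical file name) would return a dict mapping each chromosome to the lines its subprocess produced. The results still need to be merged back in the original variant order.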

yangyxt (Contributor, Author) commented Dec 19, 2024

I also refactored the esmScore_inFrame and esmScore_frameshift scripts because I found them taking over 30 hours to process a VCF file with 500k variants.

The most time-consuming part was the list appending (appending a single item to a big list is incredibly slow in Python), so I switched them to use NumPy arrays instead.
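As a rough illustration of the list-to-array switch, the sketch below preallocates a NumPy array and fills it by index, and also shows a vectorized variant. It is not the refactored script. `fill_scores`, `score_variants`, and the "alt minus ref" score are hypothetical placeholders. One caveat: `list.append` on its own is amortized constant time in CPython, so the slowdown may have come from a related pattern such as repeated concatenation or per-item array growth. The comment above does not say which.

```python
import numpy as np


def fill_scores(n_variants, score_one):
    """Preallocate the output once, then write each score in place.

    `score_one` is a hypothetical callable returning the score for variant i.
    Unfilled slots stay NaN, which makes missing scores easy to spot.
    """
    scores = np.full(n_variants, np.nan, dtype=np.float64)
    for i in range(n_variants):
        scores[i] = score_one(i)
    return scores


def score_variants(ref_values, alt_values):
    """Vectorized variant: compute all per-variant differences in one step.

    The 'alt minus ref' formula is only a placeholder for whatever the
    real scripts compute per variant.
    """
    ref = np.asarray(ref_values, dtype=np.float64)
    alt = np.asarray(alt_values, dtype=np.float64)
    return alt - ref
```

Where the per-variant computation can be expressed on whole arrays, the vectorized form avoids the Python-level loop entirely. The preallocated loop is the fallback when each score needs custom logic.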
