This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

SietsmaRJ/calculate_capice_precompute_scores

PreComputeCapice

When you want the full capice experience, but don't want to use the model

This is a program that works with the wonderful CAPICE predictive model.

Introduction

'PreComputeCapice' is a program developed by R.J. Sietsma and maintained by the Genomic Coordination Centre Groningen. The program calculates CAPICE scores for all entries in a given CADD file.

Prerequisites

This program was developed in PyCharm 2020.1 (Professional Edition); performance on other systems is not guaranteed.

The program requires the packages listed in requirements.txt.

Warning: this program works with Python 3.6. It does not work with Python 3.7 or higher because of version dependency issues in numpy, scikit-learn and xgboost.

Installing

Step 1: acquire the source files

Either clone or download the source files.

Step 2: create and activate the virtual environment

  • Open a terminal in the cloned or downloaded folder.
  • Make sure you have Python 3.6 installed by typing python3 --version.
  • Execute the following:
mkdir ./venv
cd ./venv
python3 -m venv ./
cd ..
  • Activate the virtual environment:
source ./venv/bin/activate
  • Install the required packages by executing:
pip install -r requirements.txt

Note: if any package fails to install, please try to install the package using:

pip install package==version

Make sure you have the virtual environment enabled before you install packages!
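The two preconditions above (Python 3.6 and an active virtual environment) can also be checked from Python itself. The following is a minimal sketch and not part of PreComputeCapice; the function names `in_virtualenv` and `version_matches` are hypothetical helpers introduced only for illustration.

```python
import sys

# The README states that only Python 3.6 is supported.
REQUIRED_VERSION = (3, 6)


def in_virtualenv():
    """Return True when the interpreter runs inside an environment made by venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def version_matches(version_info=None, required=REQUIRED_VERSION):
    """Return True when the major.minor version equals the required one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if not version_matches():
        print("Warning: expected Python 3.6, found {}.{}".format(*sys.version_info[:2]))
    if not in_virtualenv():
        print("Warning: no virtual environment appears to be active.")
```

Running this before pip install gives an early hint that packages would end up in the wrong interpreter or outside the virtual environment.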

Usage

The program requires the following arguments:

  • -f / --file: the CADD file in gzip (.gz) format.
  • -m / --model: the pickled CAPICE model in .dat format.
  • -o / --output: the location where the program should place its files.

Optional argument:

  • -s / --batchsize: the number of rows the program should read from the CADD file in each iteration.
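For readers who want to see how such a command line could be wired up, here is a minimal argparse sketch that mirrors the flags listed above. It is an illustration and not the actual code of PreComputeCapice.py: the function name `build_parser`, the help texts and the default of `None` for the batch size are assumptions, since the real default is not documented here.

```python
import argparse


def build_parser():
    """Build a parser with the three required flags and the optional batch size."""
    parser = argparse.ArgumentParser(
        description="Sketch of the PreComputeCapice command line."
    )
    parser.add_argument("-f", "--file", required=True,
                        help="CADD file in gzip (.gz) format")
    parser.add_argument("-m", "--model", required=True,
                        help="pickled CAPICE model in .dat format")
    parser.add_argument("-o", "--output", required=True,
                        help="folder where the output files are placed")
    # Assumption: the real default is unknown, so None stands in for it.
    parser.add_argument("-s", "--batchsize", type=int, default=None,
                        help="number of rows to read per iteration")
    return parser


# Parse an explicit argument list instead of sys.argv, so the sketch
# can be run without any command-line arguments.
example_args = build_parser().parse_args(
    ["-f", "path/to/cadd/file.gz", "-m", "path/to/model.dat",
     "-o", "path/to/output/folder", "-s", "1000000"]
)
print(example_args.batchsize)
```

Marking the first three flags as required lets argparse print a usage message and exit when one of them is missing.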

Example usage with a batch size of 1 million:

python3 PreComputeCapice.py -f path/to/cadd/file.gz -m path/to/model.dat -o path/to/output/folder -s 1000000
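To illustrate what a batch size means in practice, the sketch below reads a gzipped tab-separated file in chunks of a fixed number of rows, using only the standard library. It is not the program's own reader. The function name `read_batches` is hypothetical, and the sketch assumes that header and comment lines start with '#'.

```python
import gzip
from itertools import islice


def read_batches(path, batch_size):
    """Yield lists of at most batch_size tab-split rows from a gzipped TSV."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    with gzip.open(path, "rt") as handle:
        # Assumption: lines starting with '#' are header or comment lines.
        data_lines = (line for line in handle if not line.startswith("#"))
        while True:
            # islice takes the next batch_size lines without loading the rest.
            batch = [line.rstrip("\n").split("\t")
                     for line in islice(data_lines, batch_size)]
            if not batch:
                break
            yield batch
```

Because only one batch is held in memory at a time, a larger batch size trades memory for fewer iterations over the file.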

Output

The program will output the following files:

  • For each chromosome in the CADD file, it makes a folder named chrx (where x = chromosome) and places a gzipped tsv of all CADD entries for that chromosome in it. Note: the program continually adds entries to this file, so do NOT remove or replace it until the program is done!
  • Log_output: a file with timed messages on updates within the program. (Does not contain error messages or warnings).
  • progression_json: a JSON file containing set parameters, like batch_size, to keep track of progress during the program's execution.
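The output layout described above can be sketched as follows: rows are appended to a gzipped tsv inside a per-chromosome folder, and a small JSON file records progress. This is an illustration under stated assumptions and not the program's own implementation. The helper names `append_rows` and `save_progress`, the file names scores.tsv.gz and progression.json, and the keys in the progress dictionary are all hypothetical.

```python
import gzip
import json
import os


def append_rows(output_dir, chromosome, rows, filename="scores.tsv.gz"):
    """Append tab-separated rows to the gzipped TSV of one chromosome."""
    folder = os.path.join(output_dir, "chr{}".format(chromosome))
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, filename)
    # Mode "at" adds a new gzip member at the end of the file; readers
    # see all members as one continuous stream.
    with gzip.open(target, "at") as handle:
        for row in rows:
            handle.write("\t".join(str(value) for value in row) + "\n")
    return target


def save_progress(output_dir, progress, filename="progression.json"):
    """Write the progress dictionary to a JSON file in the output folder."""
    target = os.path.join(output_dir, filename)
    temporary = target + ".tmp"
    # Write to a temporary file first and then swap it in, so an
    # interrupted run does not leave a half-written progress file.
    with open(temporary, "w") as handle:
        json.dump(progress, handle)
    os.replace(temporary, target)
    return target
```

Appending in this way is also why the per-chromosome file must stay in place during a run: each batch adds to whatever is already there.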

TODO:

  • Make progression.json specific to the input file (-f / --file) as well.
  • Refactoring and optimization.
