-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
658733c
commit 629fa6e
Showing
34 changed files
with
4,190 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# TRAINING A MODEL FROM SCRATCH | ||
|
||
If you want to train a model with your own data by following these steps: | ||
|
||
## 1. Prepare the data | ||
As indicated by its name, DeepSNP was developed for breakpoint detection in SNPa data. | ||
Use the *data_from_raw MODE* ([main.py](main.py)) to convert your raw SNPa data. | ||
|
||
The code expects your raw data stored as .dat (.csv like) files (e.g. exported from Rawcopy) containing the following information: | ||
``` | ||
#Chr Pos LRR BAF tCN cCN GC MAP | ||
1 15253 0.2133 -1.0 2 2 0.58 0.02025 | ||
1 48168 -0.0587 -1.0 2 2 0.44 0.01094 | ||
1 60826 0.387 -1.0 2 2 0.38 0.02865 | ||
1 61722 -0.0517 -1.0 2 2 0.34 0.0164 | ||
``` | ||
* Chr: Chromosom number | ||
* Pos: Position in the SNPa | ||
* LRR: normalized Log Ratio | ||
* BAF: B-allele frequency values | ||
* tCN: Groundtruth labels | ||
* cCN: Rawcopy predictions | ||
* GC: | ||
* MAP: | ||
|
||
LRR, BAF, tCN an cCN columns are mandatory! The others are not used right now and can be omitted. | ||
|
||
The algorithm parses through the array and selects positive (with BPs) and negative (without BPs) windows with certain sizes (defined by the *config.hop_modifiers* parameter) and saves them as h5 files to *config.data_dir/features*. | ||
|
||
## 2. Select a model | ||
The following models are available: | ||
* **Baseline** | ||
* **BLVGG** - VGG like feed forward deep neural network | ||
* **BLDenseNet** - DenseNet (Densely Connected Convolutional Network) | ||
* **BLDilDenseNet** - Adapted DenseNet with dilated convolution | ||
* **BLLSTMDenseNet** - Adapted DenseNet with LSTM (Long short-term memory) | ||
* **DeepSNP** | ||
* **V1**: with dilated convolution layers | ||
* **DeepSNP_V1_noAtt** - no attention unit | ||
* **DeepSNP_V1_finalAtt** - with attention unit | ||
* **V2**: without dilation, but with conventional convolution layers | ||
* **DeepSNP_V2_noAtt** - no attention unit | ||
* **DeepSNP_V2_finalAtt** - with attention unit | ||
|
||
## 3. Train a model | ||
Each model has its own configuration file located in [/configs](configs), which inherits the ConfigFlags class ([config.py](configs/config.py) where you can set a multitude of parameters. (see [config.py](configs/config.py) for details) |
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
#!/usr/bin/env python | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (C) Software Competence Center Hagenberg GmbH (SCCH) | ||
# All rights reserved. | ||
# ----------------------------------------------------------------------------- | ||
# This document contains proprietary information belonging to SCCH. | ||
# Passing on and copying of this document, use and communication of its | ||
# contents is not permitted without prior written authorization. | ||
# ----------------------------------------------------------------------------- | ||
# Created on : 17.09.2018 $ | ||
# by : fischer $ | ||
|
||
# --- imports ----------------------------------------------------------------- |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
#!/usr/bin/env python | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (C) | ||
# Software Competence Center Hagenberg GmbH (SCCH) | ||
# Institute of Computational Perception (CP) @ JKU Linz | ||
# All rights reserved. | ||
# ----------------------------------------------------------------------------- | ||
# This document contains proprietary information belonging to SCCH. | ||
# Passing on and copying of this document, use and communication of its | ||
# contents is not permitted without prior written authorization. | ||
# ----------------------------------------------------------------------------- | ||
# Created on : 17.09.2018 $ | ||
# by : fischer $ | ||
# Developers: Hamid Eghbalzadeh, Lukas Fischer | ||
|
||
# --- imports ----------------------------------------------------------------- | ||
import argparse | ||
|
||
|
||
class ConfigFlags: | ||
def __init__(self): | ||
""" | ||
DeepSNP parameter configuration | ||
""" | ||
parser = argparse.ArgumentParser(description='DeepSNP: An End-to-end Deep Neural Network with Attention-based ' | ||
'Localization for Breakpoint Detection in SNP Array Genomic Data') | ||
# Directories | ||
parser.add_argument("-d", "--data_dir", default=r'\data', type=str, help="Path and name of data directory") | ||
parser.add_argument("-s", "--save_dir", default=r'\results', type=str, help="Path and name of save directory") | ||
parser.add_argument("-m", "--model_dir", default=r'\models', type=str, help="Path and name of model directory") | ||
parser.add_argument("-l", "--log_dir", default=r'\logs', type=str, help="Path and name of logging directory") | ||
|
||
# Model to train | ||
parser.add_argument("-mn", "--model_name", type=str, help="Name of the model to train. DeepSNP or Baselines") | ||
|
||
# Data generation parameters | ||
parser.add_argument("-wm", "--margin", default=1, type=float, help="Window margin") | ||
parser.add_argument("-j", "--use_jitter", default=True, type=bool, help="Use jittered data") | ||
|
||
# Training parameters | ||
parser.add_argument("-e", "--epochs", default=200, type=int, help="Number of epochs") | ||
parser.add_argument("-lr", default=0.001, type=float, help="Learning rate") | ||
parser.add_argument("-b", "--batch_size", default=25, type=int, help="Batch size") | ||
parser.add_argument("-ns", "--n_splits", default=6, type=int, help="Number of cross validation splits") | ||
|
||
# todo implement continue train | ||
# parser.add_argument("-f", "--fold", default=-1, type=int, help="Cross validation fold. For re-runs.") | ||
# parser.add_argument("--continue_train", action='store_true', help="Continue training") | ||
|
||
parser.add_argument("--early_stopping_min_delta", default=0.0001, type=float, | ||
help="Early stopping min loss delta") | ||
parser.add_argument("--early_stopping_patience", type=float, help="Early stopping patience") | ||
parser.add_argument("--ref_factor", default=0.5, type=float, help="Refinement factor") | ||
parser.add_argument("--ref_min_lr", default=0.00001, type=float, help="Minimum refinement learning rate") | ||
parser.add_argument("--ref_patience", type=float, help="Refinement patience") | ||
parser.add_argument("--optimizer", default='adam', type=str, help="Optimizer used for training") | ||
parser.add_argument("--amsgrad", default=True, type=bool, help="Use AMSGrad for Adam optimizer") | ||
parser.add_argument("--loss", default='categorical_crossentropy', type=str, help="training loss") | ||
parser.add_argument("--dense_clf", default=True, type=bool, help="") # todo description | ||
parser.add_argument("--save_interval", default=10, type=int, | ||
help="Intervall to save models/checkpoints during training in epochs") | ||
parser.add_argument("--lr_reduce_rates", default=[1 / 2, 2 / 3, 3 / 4], type=list, | ||
help="Learning rate reduction rates (epochs/rate).") | ||
|
||
# Architecture specific parameters | ||
parser.add_argument("-fd", "--first_dilation", default=5, type=int, help="First dilation") | ||
parser.add_argument("-hd", "--hidden_dilation", default=2, type=int, help="Hidden dilation") | ||
parser.add_argument("-ff", "--first_filter", default=10, type=int, help="First filter") | ||
parser.add_argument("-hf", "--hidden_filter", default=5, type=int, help="Hidden filter") | ||
parser.add_argument("-dr", "--dropout_rate", default=0.2, type=float, help="Dropout rate") | ||
parser.add_argument("-cf", "--conv_architecture", default='DenseNet', type=str, | ||
help="Convolution architecture to use.") | ||
parser.add_argument("-ia", "--use_input_attention", action='store_true', help="Use input attention") | ||
parser.add_argument("-oa", "--use_output_attention", action='store_true', help="Use output attention") | ||
|
||
# Evaluation parameters | ||
parser.add_argument("--eval_num_classes", default=2, type=int, help="Number of classes, needed for evaluation") | ||
parser.add_argument("--pred_thresh", default=.5, type=float, help="Threshold for BP prediction") | ||
parser.add_argument("--gen_loc_output", action='store_true', help="Generate localization unit output") | ||
parser.add_argument("--plot_loc_results", action='store_true', help="Plot localization unit results") | ||
|
||
# Raw data processing parameters | ||
parser.add_argument("--hop_modifiers", default=[0.125, 0.125 / 2, 0.025, 0.0125, 0.006, 0.0025], type=list, | ||
help="Hop modifiers for margin/window size generation") | ||
parser.add_argument("--default_data_gen_margin", default=10000, type=float, | ||
help="Default window margin used for data generation") | ||
parser.add_argument("--bp_delta_thresh", default=0.001, type=float, help="Threshold for breakpoints") | ||
parser.add_argument("--plot_data_gen", action='store_true', help="Plot generated data windows") | ||
|
||
# GPU config | ||
parser.add_argument("--per_process_gpu_mem_frac", default=0.8, type=float, | ||
help="Per process GPU Memory fraction") | ||
parser.add_argument("--allow_gpu_growth", action='store_true', help="Allow GPU growth") | ||
|
||
# general config | ||
parser.add_argument("--random_seed", default=47, type=int, help="Random seed for cross validation, etc.") | ||
|
||
self.args = parser.parse_args() | ||
|
||
def return_flags(self): | ||
""" | ||
Return all parameters | ||
:return: | ||
""" | ||
return self.args |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
#!/usr/bin/env python | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (C) | ||
# Software Competence Center Hagenberg GmbH (SCCH) | ||
# Institute of Computational Perception (CP) @ JKU Linz | ||
# All rights reserved. | ||
# ----------------------------------------------------------------------------- | ||
# This document contains proprietary information belonging to SCCH. | ||
# Passing on and copying of this document, use and communication of its | ||
# contents is not permitted without prior written authorization. | ||
# ----------------------------------------------------------------------------- | ||
# Created on : 17.09.2018 $ | ||
# by : fischer $ | ||
# Developers: Hamid Eghbalzadeh, Lukas Fischer | ||
|
||
# --- imports ----------------------------------------------------------------- | ||
from configs.config import ConfigFlags | ||
|
||
|
||
def load_config(): | ||
config = ConfigFlags().return_flags() | ||
|
||
# Directories | ||
config.data_dir = r'S:\Project_Stuff\VISIOMICS\Data\Genomicdata_26072018_samples50' | ||
config.model_dir = r'S:\Project_Stuff\VISIOMICS\DeepSNP\models' | ||
config.save_dir = r'S:\Project_Stuff\VISIOMICS\DeepSNP\results' | ||
config.log_dir = r'S:\Project_Stuff\VISIOMICS\DeepSNP\logs' | ||
|
||
# Model to train | ||
config.model_name = 'DeepSNP_V1' | ||
|
||
# Data generation parameters | ||
config.jitter = True | ||
|
||
# Training parameters | ||
# all default | ||
|
||
# Architecture specific parameters | ||
config.use_output_attention = True | ||
|
||
# Evaluation parameters | ||
# only used for evaluation | ||
config.gen_loc_output = True | ||
|
||
return config |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
#!/usr/bin/env python | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (C) Software Competence Center Hagenberg GmbH (SCCH) | ||
# All rights reserved. | ||
# ----------------------------------------------------------------------------- | ||
# This document contains proprietary information belonging to SCCH. | ||
# Passing on and copying of this document, use and communication of its | ||
# contents is not permitted without prior written authorization. | ||
# ----------------------------------------------------------------------------- | ||
# Created on : 17.09.2018 $ | ||
# by : fischer $ | ||
|
||
# --- imports ----------------------------------------------------------------- | ||
from configs.config import ConfigFlags | ||
|
||
|
||
def load_config(): | ||
config = ConfigFlags().return_flags() | ||
|
||
# Directories | ||
config.data_dir = '' | ||
|
||
# Model to train | ||
config.model_name = 'DeepSNP_V1' | ||
|
||
# Data generation parameters | ||
config.jitter = True | ||
|
||
# Training parameters | ||
# all default | ||
|
||
# Architecture specific parameters | ||
config.use_input_attention = True | ||
config.use_output_attention = True | ||
|
||
# Evaluation parameters | ||
# only used for evaluation | ||
|
||
return config |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
#!/usr/bin/env python | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (C) | ||
# Software Competence Center Hagenberg GmbH (SCCH) | ||
# Institute of Computational Perception (CP) @ JKU Linz | ||
# All rights reserved. | ||
# ----------------------------------------------------------------------------- | ||
# This document contains proprietary information belonging to SCCH. | ||
# Passing on and copying of this document, use and communication of its | ||
# contents is not permitted without prior written authorization. | ||
# ----------------------------------------------------------------------------- | ||
# Created on : 17.09.2018 $ | ||
# by : fischer $ | ||
# Developers: Hamid Eghbalzadeh, Lukas Fischer | ||
|
||
# --- imports ----------------------------------------------------------------- | ||
from configs.config import ConfigFlags | ||
|
||
|
||
def load_config(): | ||
config = ConfigFlags().return_flags() | ||
|
||
# Directories | ||
config.data_dir = r'S:\Project_Stuff\VISIOMICS\Data\Genomicdata_26072018_samples50' | ||
# config.save_dir = r'E:\Projects\VISIOMICS\trunk\BioInf\DeepSNP\results' | ||
# config.model_dir = r'E:\Projects\VISIOMICS\trunk\BioInf\DeepSNP\models' | ||
# config.model_dir = r'S:\Project_Stuff\VISIOMICS\DeepSNP\models' | ||
# config.log_dir = r'E:\Projects\VISIOMICS\trunk\BioInf\DeepSNP\logs' | ||
|
||
# Model to train | ||
config.model_name = 'DeepSNP_V1' | ||
|
||
# Data generation parameters | ||
config.jitter = True | ||
|
||
# Training parameters | ||
# all default | ||
|
||
# Architecture specific parameters | ||
# all default | ||
|
||
# Evaluation parameters | ||
# only used for evaluation | ||
|
||
return config |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
#!/usr/bin/env python | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (C) | ||
# Software Competence Center Hagenberg GmbH (SCCH) | ||
# Institute of Computational Perception (CP) @ JKU Linz | ||
# All rights reserved. | ||
# ----------------------------------------------------------------------------- | ||
# This document contains proprietary information belonging to SCCH. | ||
# Passing on and copying of this document, use and communication of its | ||
# contents is not permitted without prior written authorization. | ||
# ----------------------------------------------------------------------------- | ||
# Created on : 17.09.2018 $ | ||
# by : fischer $ | ||
# Developers: Hamid Eghbalzadeh, Lukas Fischer | ||
|
||
# --- imports ----------------------------------------------------------------- | ||
from configs.config import ConfigFlags | ||
|
||
|
||
def load_config(): | ||
config = ConfigFlags().return_flags() | ||
|
||
# Directories | ||
config.data_dir = '' | ||
|
||
# Model to train | ||
config.model_name = 'DeepSNP_V2' | ||
|
||
# Data generation parameters | ||
config.jitter = True | ||
|
||
# Training parameters | ||
# all default | ||
|
||
# Architecture specific parameters | ||
config.use_output_attention = True | ||
|
||
# Evaluation parameters | ||
# only used for evaluation | ||
|
||
return config |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
#!/usr/bin/env python | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (C) | ||
# Software Competence Center Hagenberg GmbH (SCCH) | ||
# Institute of Computational Perception (CP) @ JKU Linz | ||
# All rights reserved. | ||
# ----------------------------------------------------------------------------- | ||
# This document contains proprietary information belonging to SCCH. | ||
# Passing on and copying of this document, use and communication of its | ||
# contents is not permitted without prior written authorization. | ||
# ----------------------------------------------------------------------------- | ||
# Created on : 17.09.2018 $ | ||
# by : fischer $ | ||
# Developers: Hamid Eghbalzadeh, Lukas Fischer | ||
|
||
# --- imports ----------------------------------------------------------------- | ||
from configs.config import ConfigFlags | ||
|
||
|
||
def load_config(): | ||
config = ConfigFlags().return_flags() | ||
|
||
# Directories | ||
config.data_dir = r'S:\Project_Stuff\VISIOMICS\Data\Genomicdata_26072018_samples50' | ||
# config.save_dir = r'E:\Projects\VISIOMICS\trunk\BioInf\DeepSNP\results' | ||
# config.model_dir = r'E:\Projects\VISIOMICS\trunk\BioInf\DeepSNP\models' | ||
# config.model_dir = r'S:\Project_Stuff\VISIOMICS\DeepSNP\models' | ||
# config.log_dir = r'E:\Projects\VISIOMICS\trunk\BioInf\DeepSNP\logs' | ||
|
||
# Model to train | ||
config.model_name = 'DeepSNP_V2' | ||
|
||
# Data generation parameters | ||
config.jitter = True | ||
|
||
# Training parameters | ||
# all default | ||
|
||
# Architecture specific parameters | ||
# all default | ||
|
||
# Evaluation parameters | ||
# only used for evaluation | ||
|
||
return config |
Oops, something went wrong.