G2Vec: Distributed gene representations for identification of cancer prognostic genes (Scientific Reports)

mathcom/G2Vec

G2Vec: Distributed gene representations for identification of cancer prognostic genes

G2Vec is a novel network-based deep learning method to identify prognostic gene signatures (biomarkers).

Please refer to the included 'manual.pdf'.

For more details, see J.H. Choi et al., "G2Vec: Distributed gene representations for identification of cancer prognostic genes," Scientific Reports 8.1 (2018): 1-10.

Latest update: 06 February 2020


USAGE:

python G2Vec.py EXPRESSION_FILE CLINICAL_FILE NETWORK_FILE RESULT_NAME [optional parameters]

example:

$ python G2Vec.py ex_EXPRESSION.txt ex_CLINICAL.txt ex_NETWORK.txt ex_RESULT
>>> 0. Arguments
Namespace(CLINICAL_FILE='ex_CLINICAL.txt', EXPRESSION_FILE='ex_EXPRESSION.txt', NETWORK_FILE='ex_NETWORK.txt', RESULT_NAME='ex_RESULT', epoch=500, learningRate=0.005, lenPath=80, numBiomarker=50, numRepetition=10, sizeHiddenlayer=128)
>>> 1. Load data
>>> 2. Preprocess data
	n_samples: 135
	n_genes  : 7523     (common genes in both EXPRESSION and NETWORK)
	n_edges  : 216540   (edges with the common genes)
>>> 3. Generate random paths from each group
	*** most time consuming step ***
	n_paths : 45402
	n_genes : 3773      (genes in good or poor random paths)
>>> 4. Compute distributed representations using modified CBOW
Start training the modified CBOW with early stopping
	- Epoch: 000        ACC[val]=0.6336 ACC[tr]=0.6310 (2.369 sec)
	- Epoch: 005        ACC[val]=0.8044 ACC[tr]=0.8232 (10.459 sec)
	- Epoch: 010        ACC[val]=0.8434 ACC[tr]=0.8633 (11.008 sec)
	- Epoch: 015        ACC[val]=0.8626 ACC[tr]=0.8860 (10.811 sec)
	- Epoch: 020        ACC[val]=0.8728 ACC[tr]=0.9006 (11.119 sec)
	- Epoch: 025        ACC[val]=0.8812 ACC[tr]=0.9106 (10.898 sec)
	- Epoch(stop): 027  ACC[val]=0.8837 ACC[tr]=0.9142 (6.811 sec)
	Optimization Finish
>>> 5. Find L-groups
>>> 6. Select biomarkers with gene scores
>>> 7. Save results
	ex_RESULT_biomarkers.txt
	ex_RESULT_lgroups.txt
	ex_RESULT_vectors.txt
$
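
Step 2 of the log above reports the genes shared by the expression and network files, and the edges that remain among those genes. The following is a minimal sketch of that kind of filtering, not the code in G2Vec.py. The helper name `filter_common`, the toy gene symbols, and the rule that an edge is kept only when both endpoints are common genes are assumptions made for illustration; file parsing is omitted.

```python
def filter_common(expression_genes, edges):
    """Keep genes present in both inputs and edges between kept genes.

    expression_genes: iterable of gene names from the expression data.
    edges: iterable of (gene_a, gene_b) pairs from the network.
    Hypothetical helper for illustration; G2Vec.py may differ.
    """
    edges = list(edges)  # allow generators, since edges are scanned twice
    network_genes = {gene for edge in edges for gene in edge}
    common = set(expression_genes) & network_genes
    # Assumption: an edge survives only if both endpoints are common genes.
    kept_edges = [(a, b) for a, b in edges if a in common and b in common]
    return common, kept_edges


# Toy data, made up for this example.
expr = ["TP53", "BRCA1", "EGFR", "MYC"]
net = [("TP53", "BRCA1"), ("BRCA1", "KRAS"), ("EGFR", "TP53")]
common, kept = filter_common(expr, net)
print("n_genes:", len(common))
print("n_edges:", len(kept))
```

In this toy case MYC is dropped because it has no network edge, and the BRCA1-KRAS edge is dropped because KRAS has no expression values.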

Note:

The optional parameter '-h' shows the help message.

$ python G2Vec.py -h
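
The argument names and default values printed under "0. Arguments" in the example run suggest a command-line interface along the following lines. This is only a sketch built with Python's standard `argparse` module: the destination names and defaults are copied from that printed Namespace, while the `--` flag spellings, the argument types, and the `build_parser` helper are assumptions. Run `python G2Vec.py -h` for the actual options.

```python
import argparse


def build_parser():
    # Names and defaults come from the Namespace shown in the example run;
    # the "--" flag spellings and the types are assumptions.
    parser = argparse.ArgumentParser(prog="G2Vec.py")
    for name in ("EXPRESSION_FILE", "CLINICAL_FILE", "NETWORK_FILE", "RESULT_NAME"):
        parser.add_argument(name)
    parser.add_argument("--lenPath", type=int, default=80)
    parser.add_argument("--numRepetition", type=int, default=10)
    parser.add_argument("--sizeHiddenlayer", type=int, default=128)
    parser.add_argument("--epoch", type=int, default=500)
    parser.add_argument("--learningRate", type=float, default=0.005)
    parser.add_argument("--numBiomarker", type=int, default=50)
    return parser


# Parse the same positional arguments as the example run.
args = build_parser().parse_args(
    ["ex_EXPRESSION.txt", "ex_CLINICAL.txt", "ex_NETWORK.txt", "ex_RESULT"]
)
print(args)
```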
