data.mat
This example data is adapted from Atwell et al. (2010).
Genotypes and flowering time phenotypes were obtained from Atwell et al. (2010).
For more info: https://www.ncbi.nlm.nih.gov/pubmed/20336072
- Rare variants (MAF < 0.1) are filtered out.
- The number of SNPs is reduced from 214051 to 173219.
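
The filtering step above can be sketched on toy data as follows. This is an illustration only, not the script used to build data.mat: it takes the ratio given for the MAF field in this file (ones over zeros per SNP column) literally, and the function names, the handling of entries equal to 2 (ignored), and the handling of columns without zeros are assumptions.

```python
def maf_ratio(column):
    """Ratio of ones to zeros in one SNP column (the README's MAF formula)."""
    ones = sum(1 for g in column if g == 1)
    zeros = sum(1 for g in column if g == 0)
    # Assumption: a column with no zeros is treated as common and kept.
    return ones / zeros if zeros else float("inf")


def filter_rare(X, threshold=0.1):
    """Drop SNP columns whose ratio is below `threshold`.

    X is a list of rows (samples), each a list of genotypes in {0, 1, 2}.
    Returns the filtered matrix and the indices of the kept columns.
    """
    n_snps = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_snps)]
    keep = [j for j, col in enumerate(columns) if maf_ratio(col) >= threshold]
    X_kept = [[row[j] for j in keep] for row in X]
    return X_kept, keep


# Toy matrix: 4 samples x 3 SNPs. The first SNP has no ones, so it is dropped.
X_toy = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 2],
]
X_kept, kept_idx = filter_rare(X_toy)
print(kept_idx)  # [1, 2]
```

Returning the kept column indices alongside the matrix makes it possible to subset per-SNP vectors such as MAF, snp and R in the same way.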
X - {0,1,2} genotype matrix 199 x 173219 (samples vs SNPs)
To make it compatible with the algorithm, the value 2 was assigned to random locations.
Y - binary phenotype matrix 199 x 17 (samples vs phenotypes)
- NaN values are excluded and the data is binarized based on the mean.
genotypes - genotype matrix 214051 x 2 - 1st column = "0" in X, 2nd column = "1" in X
MAF - minor allele frequency vector 173219 x 1 = (# of ones in X) / (# of zeros in X)
phenotypes - flowering time phenotypes - 1st column : phenotype name, 2nd column : index in the Atwell et al. (2010) phenotypes
snp - SNP information matrix 173219 x 3 - 1st column : snp id, 2nd column: chromosome, 3rd column : position
samples - sample information matrix 199 x 2 - contains 2 different unique indices for each sample
R - regulatory/coding information binary vector 173219 x 1 - indicates whether the corresponding SNP (in the feature matrix) is in a regulatory/coding region (0: not in a regulatory/coding region, 1: in a regulatory/coding region)
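
The mean-based binarization described for Y can be sketched for a single phenotype column as below. This is a minimal sketch under stated assumptions: the function name is hypothetical, values strictly above the mean are mapped to 1 (the direction is not specified in this file), and missing values are left as None because their final treatment is not specified here either.

```python
import math


def binarize_by_mean(values):
    """Binarize one phenotype column about the mean of its observed values."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    # Assumption: strictly above the mean -> 1, otherwise 0; NaN -> None.
    return [None if math.isnan(v) else int(v > mean) for v in values]


# Toy flowering-time column with one missing value; the observed mean is 30.
column = [10.0, 20.0, float("nan"), 30.0, 60.0]
binary = binarize_by_mean(column)
print(binary)  # [0, 0, None, 0, 1]
```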
networks.mat
Contains 4 networks - GS, GM, GI, GS_HICN
- GS : Genomic Sequence - Nearest SNPs are connected
- GM : Genomic Membership - In addition to GS,
SNPs within the same gene are connected as a clique.
- GI : Genomic Interaction - In addition to GM,
SNPs within interacting genes are connected as a clique.
Interacting genes are determined from a Protein-Protein Interaction network.
- GS_HICN : In addition to GS, SNPs that are close in 3D are connected.
networks - 1 x 4 cell array - each cell contains a SNP vs SNP network
i.e. a 173219 x 173219 binary sparse matrix.
networkNames - 1 x 4 cell array - contains the names of the networks
network_options - struct - contains various options for the network generation step.
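
As an illustration of the GS idea, the sketch below links SNPs that are adjacent on the same chromosome, using rows shaped like the snp matrix in data.mat (snp id, chromosome, position). It assumes that "nearest" means consecutive by position within a chromosome; the actual construction is governed by network_options and may differ. The function name and the edge-set representation are hypothetical (the stored networks are sparse matrices).

```python
def gs_edges(snps):
    """Return undirected edges between positionally adjacent SNPs.

    snps is a list of (snp_id, chromosome, position) tuples.
    Each edge is a pair (i, j) of row indices with i < j.
    """
    # Sort row indices by chromosome, then by position.
    order = sorted(range(len(snps)), key=lambda i: (snps[i][1], snps[i][2]))
    edges = set()
    for a, b in zip(order, order[1:]):
        # Only connect neighbours that lie on the same chromosome.
        if snps[a][1] == snps[b][1]:
            edges.add((min(a, b), max(a, b)))
    return edges


# Toy SNP table: three SNPs on chromosome 1, one on chromosome 2.
snps_toy = [
    ("s1", 1, 100),
    ("s2", 1, 250),
    ("s3", 2, 40),
    ("s4", 1, 180),
]
edges = gs_edges(snps_toy)
print(sorted(edges))  # [(0, 3), (1, 3)]
```

Each edge (i, j) would correspond to a pair of symmetric nonzero entries (i, j) and (j, i) in a SNP vs SNP binary matrix.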