MultiLyGAN

A new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine modified sites.

Requirements

The "Data" folder contains detailed information on the seven types of lysine modified sites.
The "Data preprocessing" folder contains the window-cutting code and the code for discarding homologous sequences.
The "Feature construction" folder contains nine encoding schemes: AAindex, CKSAAP (Composition of k-spaced amino acid pairs), PWM (Position weight matrix), Reduced Alphabet, FoldAmyloid, BE (Binary Encoding), PC-PseAAC, SC-PseAAC, and Structure features. These programs encode protein fragments into feature vectors of different dimensions.
The "Dimensionality reduction" folder contains code that retains effective features and removes redundant ones.
The "sample augmentation" folder has two sub-folders. To address the class imbalance in the data, two influential deep generative methods, Conditional Generative Adversarial Network (CGAN) and Conditional Wasserstein Generative Adversarial Network (CWGAN), are used to generate synthetic samples.
The "Classification" folder uses Random Forest (RF) to classify samples into the seven classes.
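
To make the first two stages concrete, here is a minimal Python sketch of window cutting followed by CKSAAP encoding. It is an illustration only and is not taken from the repository code: the function names `cut_windows` and `cksaap`, the padding symbol `X`, the flank size, and the `k_max` setting are all assumptions chosen for the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for windows that run past a terminus


def cut_windows(sequence, flank=3, center="K"):
    """Return (1-based position, fragment) for every lysine in the sequence.

    Each fragment has length 2 * flank + 1 with the lysine in the middle.
    """
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == center:
            # Residue i of the sequence sits at index i + flank in `padded`,
            # so the window centred on it starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def cksaap(fragment, k_max=1):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Returns 400 * (k_max + 1) frequencies; pairs that contain the padding
    symbol (or any non-standard residue) are ignored.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(fragment) - k - 1):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


if __name__ == "__main__":
    for position, fragment in cut_windows("MKAAK", flank=2):
        vector = cksaap(fragment, k_max=1)
        print(position, fragment, len(vector))
```

Each lysine yields one fixed-length fragment, and each fragment yields one fixed-length vector, which is the shape the later dimensionality reduction, augmentation, and Random Forest stages would need. The window size and gap range actually used by MultiLyGAN may differ from the values shown here.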

The pipeline of identification of multiple protein modified sites is visualized.
