CABBaGe

Classification Algorithm Based on a Bayesian method for Genomics

An application developed in Perl that performs classification, feature extraction and bootstrapping of genomic sequences, to improve data visualization and usefulness for genomic applications.

The application is built from three standalone modules:

Bayesian Classifier, Feature Extraction and Bootstrapping

Bayesian Classifier. This module uses a Naive Bayes classifier, which is based on Bayes' theorem and is particularly suited to inputs of high dimensionality. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods. The module classifies genomic sequences into predetermined classes using a training genome matrix of known parameters (e.g. disease, host age, host sex, geographic location, drug resistance, etc.).
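As an illustration of the idea only, the following is a minimal sketch of a Bernoulli Naive Bayes classifier over presence/absence features. It is written in Python rather than Perl and is not CABBaGe's actual implementation; the function names, the Laplace smoothing parameter `alpha` and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(samples, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model.

    samples: dict mapping sample name -> list of 0/1 feature values
    labels:  dict mapping sample name -> class label
    alpha:   Laplace smoothing constant (an assumption of this sketch)
    """
    by_class = defaultdict(list)
    for name, vec in samples.items():
        by_class[labels[name]].append(vec)

    n_total = len(samples)
    model = {}
    for cls, vecs in by_class.items():
        n = len(vecs)
        n_features = len(vecs[0])
        # Smoothed estimate of P(feature present | class) for every feature.
        probs = [
            (sum(v[j] for v in vecs) + alpha) / (n + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (math.log(n / n_total), probs)
    return model


def classify(model, vec):
    """Return (best class, per-class log scores) for one 0/1 feature vector."""
    scores = {}
    for cls, (log_prior, probs) in model.items():
        score = log_prior
        for x, p in zip(vec, probs):
            # Features are treated as independent given the class.
            score += math.log(p if x else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: three features, two classes.
toy_samples = {"S1": [1, 1, 0], "S2": [1, 0, 0], "S3": [0, 0, 1], "S4": [0, 1, 1]}
toy_labels = {"S1": "A", "S2": "A", "S3": "B", "S4": "B"}
toy_model = train_bernoulli_nb(toy_samples, toy_labels)
predicted, _ = classify(toy_model, [1, 1, 0])
print(predicted)
```

Working in log space avoids numerical underflow when the number of genes or probes is large, which is the high-dimensional setting described above.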

How to

Note: CABBaGe requires its input files to be in comma-separated values (.csv) format.

Three files are needed: Training.csv, MetaData.csv and Query.csv

The Training.csv file is a Boolean table that denotes the presence or absence of a certain "feature", which can be either a gene (Pan-genome*) or a genomic region denoted by a virtual probe (Virtual Hybridization*).

The MetaData.csv file is a table that relates each of the samples from Training.csv to predefined classes.

The Query.csv file contains the samples that must be classified. It should be in the same format as the Training.csv file.

Feature Extraction. Classification suffers from the high dimensionality of the feature space that results from the extensive information in genomic data. Feature selection and feature extraction methods address this by removing irrelevant features, which reduces the dimensionality of the feature space and improves classification performance. The module accomplishes this with a statistical test (chi-squared) that extracts the most informative genes or genomic regions, i.e. those that make a sample belong to a particular class. The cutoff value for this procedure can be set by the user; the default is a p-value of 0.90.
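To make the chi-squared step concrete, here is a minimal Python sketch of the Pearson chi-squared statistic for one presence/absence feature against the class labels. It is not taken from CABBaGe's Perl source: the function names and toy data are hypothetical, and how the tool applies its cutoff is not specified here, so the sketch only computes the statistic and, for the one-degree-of-freedom case, its p-value.

```python
import math
from collections import Counter


def chi_squared(presence, classes):
    """Pearson chi-squared statistic for one 0/1 feature versus class labels.

    presence: list of 0/1 values, one per sample
    classes:  list of class labels, in the same sample order
    Returns (statistic, degrees of freedom).
    """
    n = len(presence)
    observed = Counter(zip(presence, classes))
    row_totals = Counter(presence)   # samples with / without the feature
    col_totals = Counter(classes)    # samples per class

    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_dof1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical toy data: the feature is present only in class "A".
toy_presence = [1, 1, 1, 0, 0, 0]
toy_classes = ["A", "A", "A", "B", "B", "B"]
toy_stat, toy_dof = chi_squared(toy_presence, toy_classes)
print(toy_stat, toy_dof, p_value_dof1(toy_stat))
```

A feature that is present (or absent) in every sample yields a statistic of 0 with 0 degrees of freedom, so it carries no information about class membership and would rank last.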

How to

Note: CABBaGe requires its input files to be in comma-separated values (.csv) format.

Two files are needed: Training.csv and MetaData.csv

The Training.csv file is a Boolean table that denotes the presence or absence of a certain "feature", which can be either a gene (Pan-genome*) or a genomic region denoted by a virtual probe (Virtual Hybridization*).

The MetaData.csv file is a table that relates each of the samples from Training.csv to predefined classes.

Bootstrapping. The bootstrap is a tool for making statistical inferences when standard parametric assumptions are questionable. In genomics, sample size can be an issue. This module addresses such problems by generating random samples from a population with a certain distribution, so that unevenness between classes can be overcome.
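One common way to even out class sizes is to resample each class with replacement until all classes have the same number of samples. The Python sketch below shows that reading of the idea; it is an assumption, not a description of how the CABBaGe module actually generates its samples, and the function name, the `target` and `seed` parameters and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_bootstrap(labels, target=None, seed=0):
    """Resample sample names with replacement so every class has `target` draws.

    labels: dict mapping sample name -> class label
    target: draws per class; defaults to the size of the largest class
    seed:   seed for the random generator, for reproducibility
    Returns a list of (source sample name, class label) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, cls in labels.items():
        by_class[cls].append(sample)

    if target is None:
        target = max(len(members) for members in by_class.values())

    resampled = []
    for cls in sorted(by_class):
        # Draw with replacement, so small classes reuse their samples.
        draws = rng.choices(by_class[cls], k=target)
        resampled.extend((source, cls) for source in draws)
    return resampled


# Hypothetical toy metadata: class "A" has three samples, class "B" only one.
toy_labels = {"S1": "A", "S2": "A", "S3": "A", "S4": "B"}
balanced = balance_by_bootstrap(toy_labels)
print(balanced)
```

Each pair names the original sample a draw came from, so the corresponding column of the training table can be copied to build the resampled matrix.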

How to

Note: CABBaGe requires its input files to be in comma-separated values (.csv) format.

Three files are needed: Training.csv, MetaData.csv and Query.csv

The Training.csv file is a Boolean table that denotes the presence or absence of a certain "feature", which can be either a gene (Pan-genome*) or a genomic region denoted by a virtual probe (Virtual Hybridization*).

The MetaData.csv file is a table that relates each of the samples from Training.csv to predefined classes.

File examples

Training.csv

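|              | Sample1 | Sample2 | Sample3 | Sample4 |
|--------------|---------|---------|---------|---------|
| Gene/Probe a | 0       | 1       | 1       | 0       |
| Gene/Probe b | 1       | 1       | 1       | 1       |
| Gene/Probe c | 1       | 1       | 0       | 0       |
| Gene/Probe d | 0       | 0       | 0       | 0       |
| Gene/Probe e | 1       | 0       | 1       | 1       |
| Gene/Probe f | 1       | 1       | 1       | 1       |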

The sample and Gene/Probe names are chosen by the user. The file can contain as many rows and columns as needed.

MetaData.csv

| Sample  | Class |
|---------|-------|
| Sample1 | A     |
| Sample2 | B     |
| Sample3 | C     |
| Sample4 | D     |
| Sample5 | E     |
| Sample6 | F     |

The sample names and class labels are chosen by the user. The file can contain as many rows as needed.

Query.csv

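|              | SampleX | SampleY | SampleZ |
|--------------|---------|---------|---------|
| Gene/Probe a | 1       | 1       | 1       |
| Gene/Probe b | 1       | 1       | 0       |
| Gene/Probe c | 1       | 0       | 0       |
| Gene/Probe d | 0       | 0       | 0       |
| Gene/Probe e | 1       | 0       | 0       |
| Gene/Probe f | 1       | 1       | 1       |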

The sample and Gene/Probe names are chosen by the user. However, the Gene/Probe names must be the same as those in the Training.csv file used. The file can contain as many rows and columns as needed.
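Before running the classifier it can help to check that a query table really uses the same Gene/Probe rows as the training table. The Python sketch below is one way to do that and is not part of CABBaGe. It assumes a CSV layout in which the first row holds an empty cell followed by the sample names and every later row holds a Gene/Probe name followed by 0/1 values; it also assumes the rows must appear in the same order. The function names and the inline example text are hypothetical.

```python
import csv
import io


def read_feature_table(text):
    """Parse a presence/absence table from CSV text.

    Assumed layout: header row = empty cell + sample names;
    each further row = feature name + one 0/1 value per sample.
    Returns (feature names, dict mapping sample name -> list of 0/1 values).
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    sample_names = rows[0][1:]
    features = [row[0] for row in rows[1:]]
    matrix = {
        name: [int(row[i + 1]) for row in rows[1:]]
        for i, name in enumerate(sample_names)
    }
    return features, matrix


def check_query_features(train_features, query_features):
    """Raise ValueError unless the query uses the training Gene/Probe rows."""
    if train_features != query_features:
        missing = sorted(set(train_features) - set(query_features))
        extra = sorted(set(query_features) - set(train_features))
        raise ValueError(
            f"Query features differ from training: missing={missing}, extra={extra}"
        )


# Hypothetical file contents, shortened from the tables above.
training_text = ",Sample1,Sample2\nGene/Probe a,0,1\nGene/Probe b,1,1\n"
query_text = ",SampleX\nGene/Probe a,1\nGene/Probe b,1\n"

train_features, train_matrix = read_feature_table(training_text)
query_features, query_matrix = read_feature_table(query_text)
check_query_features(train_features, query_features)  # passes silently
print(train_matrix["Sample1"], query_matrix["SampleX"])
```

Transposing the table into one vector per sample, as done here, matches how a classifier consumes the data: each sample becomes a single list of feature values.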

Examples

Actual image of output graph OddsRatio Heat map

Actual image of output graph OddsRatio Feature map

Actual image of output graph InformationGain Heat map

Actual image of output graph InformationGain Feature map
