ConSeGA - Consensus Sequence-based Genetic Algorithm

A genetic algorithm used to identify regions of regularity in the genome of Drosophila melanogaster. The algorithm has been tested on previously published ChIP data for the transcription factor Myocyte-enhancer factor 2 (MEF2).

Introduction and Theory
How does it work?
Software Dependencies

Introduction

Genetic algorithms (GAs) apply the basic principles of Darwinian evolution to a computational problem of interest. The approach generates results by applying the idea of "survival of the fittest":

Chromosomes store genetic information
Individuals in a population that are deemed “most genetically fit” prevail
Those that do not prevail either die off or undergo mutation/crossover
Those that are the most elite continue their lineage in the next generation
This process is repeated for several generations, until the population evolves and equilibrium is reached

These five concepts can be applied programmatically to many scenarios in order to solve complex problems. Areas where genetic algorithms have been very successful include:

Software bug detection and repair (Forrest et al 2009, Le Goues et al 2013)
Medicine and Biology (Ghaheri et al 2015, Hackenberger 2019)
Bioinformatics & data science (Manning et al 2013, Yang and Honavar)
Film, movies, and gaming (Trescak et al 2012, Sanjuan et al 2007)

What is the theory behind ConSeGA and how does it work?

ConSeGA is a genetic algorithm approach for detecting consensus sequences in biological data sets.

The algorithm defines the set of "individuals" or "chromosomes" as a set of strings, each 15 characters long and composed only of A, T, G, and C. For this algorithm, the strings were derived from MEF2-ChIP data ().
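As a minimal sketch of that definition, the snippet below checks that a candidate string is a well-formed "chromosome". The function names, the one-sequence-per-line input layout, and the upper-casing step are assumptions made for illustration; they are not taken from the ConSeGA scripts.

```python
VALID_BASES = set("ATGC")
KMER_LENGTH = 15  # string length stated in the text


def is_valid_chromosome(seq, length=KMER_LENGTH):
    """Return True if seq has the expected length and contains only A, T, G, C."""
    return len(seq) == length and set(seq) <= VALID_BASES


def load_chromosomes(lines, length=KMER_LENGTH):
    """Keep only well-formed k-mers from an iterable of text lines.

    Assumption: one sequence per line; lower-case input is upper-cased.
    """
    cleaned = (line.strip().upper() for line in lines)
    return [seq for seq in cleaned if is_valid_chromosome(seq, length)]
```

For example, `load_chromosomes(open(path))` would return a list of validated strings for a hypothetical input file at `path`.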

The fitness function, which determines how "fit" an individual is, is based on entropy: the fittest window of strings is the one whose 10 consecutive entropy values have the lowest sum. This is computed in two steps: an entropy sliding-window calculation, followed by an overall sum-of-entropies sliding-window calculation.
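The two-step fitness calculation could be sketched as follows. This is an illustration under stated assumptions, not the repository's code: it assumes the entropy is the Shannon entropy (in bits) of each column when the strings are stacked as rows of an alignment, and that the fittest window is the run of 10 consecutive columns with the lowest entropy sum. All function names are hypothetical.

```python
from collections import Counter
from math import log


def column_entropy(column):
    """Shannon entropy (bits) of the bases in one alignment column (assumed measure)."""
    counts = Counter(column)
    total = float(len(column))
    # p * log2(1/p) summed over the bases present in the column
    return sum((n / total) * log(total / n, 2) for n in counts.values())


def position_entropies(strings):
    """Entropy of every column, for equal-length strings stacked as rows."""
    return [column_entropy(col) for col in zip(*strings)]


def best_window(entropies, width=10):
    """Return (start, total) for the `width` consecutive entropies with the lowest sum."""
    if len(entropies) < width:
        raise ValueError("need at least `width` entropy values")
    sums = [sum(entropies[i:i + width]) for i in range(len(entropies) - width + 1)]
    start = min(range(len(sums)), key=sums.__getitem__)  # first window wins ties
    return start, sums[start]
```

A column in which every string carries the same base has entropy 0, so a low window sum indicates a stretch where the strings agree, which is what a consensus sequence looks like.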

The most fit window of strings is automatically excluded from mutation.

Any string that is not in the most fit window is subject to random mutation. A mutation, in this case, is a random position shift in the alignment array of +/-3 base pairs.
Elitism is used to ensure the most fit window of strings is automatically incorporated into the next generation of string alignments.
This process is repeated or "evolved" for 3000 generations.
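The mutation and elitism steps can be illustrated with a simplified loop. The sketch below makes several assumptions that the text does not confirm: each individual is represented as a list of integer offsets into the alignment array, a mutation shifts each offset by a random whole number of bases between -3 and +3 (zero included), lower fitness is better, and the `fitness` function is supplied by the caller. The names `mutate` and `evolve` are hypothetical.

```python
import random

MAX_SHIFT = 3        # +/-3 base pairs, as stated in the text
GENERATIONS = 3000   # number of generations stated in the text


def mutate(offsets, rng, max_shift=MAX_SHIFT):
    """Shift every offset by a random whole number of bases in [-max_shift, max_shift]."""
    return [pos + rng.randint(-max_shift, max_shift) for pos in offsets]


def evolve(population, fitness, generations=GENERATIONS, seed=None):
    """Evolve a population of offset lists; lower fitness is treated as fitter."""
    rng = random.Random(seed)
    population = [list(ind) for ind in population]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        elite = ranked[0]
        # Elitism: the fittest individual is carried over unchanged,
        # every other individual is mutated.
        population = [elite] + [mutate(ind, rng) for ind in ranked[1:]]
    return min(population, key=fitness)
```

Because the fittest individual always survives unchanged, the best fitness in the population can never get worse from one generation to the next.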

Software Dependencies

ConSeGA was written in Python version 2.7. The following Python dependencies are required:

NumPy
Matplotlib

Running ConSeGA

The main Python script for running ConSeGA is tf_genetic_algorithm_multFunc_version2.py. It requires an input text file of "chromosomes", which are k-mer sequences. Command-line arguments for setting the number of generations, array size, k-mer length, and input file location are planned; for the moment, these values are hardcoded to mimic what we have validated. To change them, edit the code directly: the input file is set at line 23, and the k-mer size, array size, and number of generations can also be updated in the code. Then run the script:

python2.7 tf_genetic_algorithm_multFunc_version2.py

tbrunetti/ConSeGA
