DNA Sequence Alignment

This repository contains implementations of two different algorithms for solving the DNA sequence alignment problem: a basic version using Dynamic Programming (DP) and an optimized version using DP combined with a Divide-and-Conquer strategy for improved efficiency.

Project Description

This project implements and compares two solutions to the DNA sequence alignment problem. The basic solution uses the classic DP approach, which fills a full table over both strings. The memory-efficient solution combines DP with a Divide-and-Conquer method to reduce memory use, so that longer sequences can be aligned.
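The following is a minimal sketch of the two ingredients a memory-efficient version typically relies on, assuming it follows the usual Hirschberg-style scheme (this README does not spell out the details). The names `last_row` and `best_split` are illustrative and not taken from the repository; the costs are those listed under Gap Penalty and Mismatch Costs.

```python
DELTA = 30  # gap penalty
ALPHA = {   # mismatch costs, indexed as ALPHA[a][b]
    "A": {"A": 0, "C": 110, "G": 48, "T": 94},
    "C": {"A": 110, "C": 0, "G": 118, "T": 48},
    "G": {"A": 48, "C": 118, "G": 0, "T": 110},
    "T": {"A": 94, "C": 48, "G": 110, "T": 0},
}


def last_row(x, y):
    """Cost of aligning all of x with each prefix y[:j], keeping only two rows."""
    prev = [j * DELTA for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * DELTA] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(
                prev[j - 1] + ALPHA[x[i - 1]][y[j - 1]],  # pair x_i with y_j
                prev[j] + DELTA,                          # x_i against a gap
                cur[j - 1] + DELTA,                       # y_j against a gap
            )
        prev = cur
    return prev


def best_split(x, y):
    """Halve x and find the cut in y that an optimal alignment passes through."""
    mid = len(x) // 2
    left = last_row(x[:mid], y)                # forward costs for the left half
    right = last_row(x[mid:][::-1], y[::-1])   # backward costs for the right half
    n = len(y)
    cut = min(range(n + 1), key=lambda j: left[j] + right[n - j])
    return mid, cut, left[cut] + right[n - cut]
```

Recursing on `(x[:mid], y[:cut])` and `(x[mid:], y[cut:])` would then rebuild the alignment while only ever holding two rows of the table at a time.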

Problem Overview

Given two strings, $X$ and $Y$, where:

$X = x_1, x_2, \ldots, x_m$
$Y = y_1, y_2, \ldots, y_n$

We aim to find the optimal alignment between $X$ and $Y$, that is, the alignment with the minimum total cost, where the cost is the sum of the gap penalties and mismatch costs it incurs. An alignment pairs symbols from the two strings in order, and any symbol left unpaired is matched against a gap.
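The minimization can be written as the standard dynamic-programming recurrence for sequence alignment. Here $\mathrm{OPT}(i, j)$ is a name introduced for illustration: the minimum cost of aligning the first $i$ symbols of $X$ with the first $j$ symbols of $Y$, with $\delta$ and $\alpha$ as given under Gap Penalty and Mismatch Costs.

```latex
\begin{aligned}
\mathrm{OPT}(i, 0) &= i\,\delta, \qquad \mathrm{OPT}(0, j) = j\,\delta, \\
\mathrm{OPT}(i, j) &= \min
\begin{cases}
\mathrm{OPT}(i-1, j-1) + \alpha_{x_i y_j} \\
\mathrm{OPT}(i-1, j) + \delta \\
\mathrm{OPT}(i, j-1) + \delta
\end{cases}
\end{aligned}
```

The optimal alignment cost is then $\mathrm{OPT}(m, n)$.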

Gap Penalty and Mismatch Costs

Gap Penalty (δ): 30
Mismatch Costs (α):

	A	C	G	T
A	0	110	48	94
C	110	0	118	48
G	48	118	0	110
T	94	48	110	0
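With these costs, the basic approach can be sketched as below. This is a minimal illustration of the classic full-table DP with a traceback, not the repository's actual code; the function name `align` and the use of `_` as the gap character are assumptions.

```python
DELTA = 30  # gap penalty
ALPHA = {   # mismatch costs, indexed as ALPHA[a][b]
    "A": {"A": 0, "C": 110, "G": 48, "T": 94},
    "C": {"A": 110, "C": 0, "G": 118, "T": 48},
    "G": {"A": 48, "C": 118, "G": 0, "T": 110},
    "T": {"A": 94, "C": 48, "G": 110, "T": 0},
}


def align(x, y):
    """Return (cost, aligned_x, aligned_y) using the full (m+1) x (n+1) table."""
    m, n = len(x), len(y)
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        opt[i][0] = i * DELTA
    for j in range(1, n + 1):
        opt[0][j] = j * DELTA
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            opt[i][j] = min(
                opt[i - 1][j - 1] + ALPHA[x[i - 1]][y[j - 1]],
                opt[i - 1][j] + DELTA,
                opt[i][j - 1] + DELTA,
            )

    # Walk back from the bottom-right cell to recover one optimal alignment.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and opt[i][j] == opt[i - 1][j - 1] + ALPHA[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and opt[i][j] == opt[i - 1][j] + DELTA:
            ax.append(x[i - 1]); ay.append("_"); i -= 1
        else:
            ax.append("_"); ay.append(y[j - 1]); j -= 1
    return opt[m][n], "".join(reversed(ax)), "".join(reversed(ay))
```

For example, `align("ACGT", "AGT")` has cost 30: the three shared symbols pair at no cost and `C` is matched against a single gap.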

Input String Generation

Input strings are generated from a base string by a series of insertion steps, each of which doubles the length of the string:

Start with a base string $s_0$.
For each step, insert a copy of the current string into itself at the specified index, producing a new string twice as long.
Repeat for the given number of steps; after $j$ steps the final string has length $2^j \cdot |s_0|$.
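The generation steps above can be sketched as follows. This sketch assumes that each index is 0-based and that the copy is inserted immediately after that index; the README does not state either convention, and the function name `generate` is illustrative.

```python
def generate(base, indices):
    """Build the final string by repeated self-insertion.

    Assumption: each index is 0-based and the copy of the current
    string goes immediately after the character at that index.
    """
    s = base
    for k in indices:
        s = s[: k + 1] + s + s[k + 1 :]
    return s


print(generate("ACTG", [3]))          # ACTGACTG
print(len(generate("ACTG", [3, 6, 1])))  # 32, i.e. 2**3 * len("ACTG")
```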

Installation

Clone the repository and navigate to the project directory:

git clone https://github.com/Hit07/Sequence-Alignment-DP.git
cd Sequence-Alignment-DP

Ensure you have the required dependencies installed:

pip install -r requirements.txt

Execute the basic algorithm:

python basic.py input.txt output.txt

Execute the memory-efficient algorithm:

python efficient.py input.txt output.txt

Input and Output

Input: A text file containing the base strings and the steps for string generation.
Output: A text file containing the alignment cost, aligned strings, execution time, and memory usage.
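The exact layout of the input file is not described here. As a hedged illustration only, the sketch below assumes one plausible layout: each base string on its own line, followed by one insertion index per line. The function name `parse_input` is hypothetical.

```python
def parse_input(text):
    """Parse an input file into a list of (base_string, indices) pairs.

    Assumed layout (not confirmed by this README): a base string on
    its own line, followed by one integer insertion index per line,
    repeated once for each of the two strings.
    """
    specs = []
    for token in text.split():
        if token.isdigit():
            if not specs:
                raise ValueError("index found before any base string")
            specs[-1][1].append(int(token))
        else:
            specs.append((token, []))
    return specs


sample = "ACTG\n3\n6\n1\nTACG\n1\n2\n9\n"
print(parse_input(sample))
# [('ACTG', [3, 6, 1]), ('TACG', [1, 2, 9])]
```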

Results

The results include:

Alignment Cost: The minimum cost of aligning the two strings.
Aligned Strings: The two input strings with gaps inserted to show the optimal alignment.
Execution Time: Time taken to compute the alignment.
Memory Usage: Memory used during the computation.

Additionally, plots are provided to compare CPU time and memory usage versus problem size for both algorithms.
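One way to collect the time and memory figures is sketched below using only the Python standard library. This is an assumption about how such measurements could be taken, not a description of what the repository's scripts do; `measure` is an illustrative name, and `tracemalloc` reports only memory allocated through Python's allocator.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) and return (result, elapsed_ms, peak_kb)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed_ms, peak_bytes / 1024


# Example with a stand-in workload; an alignment function would go here.
result, elapsed_ms, peak_kb = measure(sorted, list(range(1000, 0, -1)))
print(f"{elapsed_ms:.3f} ms, {peak_kb:.1f} KB peak")
```

Running the same wrapper over inputs of increasing size gives the data points needed for time-versus-size and memory-versus-size plots.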
