BA2_H - Dist(pattern, Seqs).py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Dec 24 00:03:57 2019
@author: jasonmoggridge

BA2H - Implement DistanceBetweenPatternAndStrings

The first potential issue with implementing MedianString
from “Find a Median String” is writing a function to
compute d(Pattern, Dna) = ∑_{i=1}^{t} d(Pattern, Dna_i),
the sum of distances between Pattern and each string
in Dna = {Dna_1, ..., Dna_t}.
This task is achieved by the following pseudocode.

    DistanceBetweenPatternAndStrings(Pattern, Dna)
        k ← length(Pattern)
        distance ← 0
        for each string Text in Dna:
            HammingDistance ← ∞
            for each k-mer Pattern’ in Text:
                if HammingDistance > HammingDistance(Pattern, Pattern’)
                    HammingDistance ← HammingDistance(Pattern, Pattern’)
            distance ← distance + HammingDistance
        return distance

Compute DistanceBetweenPatternAndStrings
Find the distance between a pattern and a set of strings.
    Given: A DNA string Pattern and a collection of DNA strings Dna.
    Return: DistanceBetweenPatternAndStrings(Pattern, Dna).
"""
######
def Distance(Pattern, Seqs):
    ##
    def Hamming(seq1, seq2):  # distance for pattern -> kmer of equal length from seq
        distance = 0
        for i in range(len(seq1)):
            if seq1[i] != seq2[i]:
                distance += 1
        return distance
    ##
    k = len(Pattern)
    distance_pattern = 0
    for dna in Seqs:
        hamming_d = float('inf')
        kmers = [dna[i:i+k] for i in range(len(dna)-k+1)]
        for kmer in kmers:
            score = Hamming(kmer, Pattern)
            if score < hamming_d:
                hamming_d = score
        distance_pattern += hamming_d
    return distance_pattern
###
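# Read the pattern (line 1) and the whitespace-separated DNA strings (line 2).
# The context manager closes the file once both lines have been read.
with open('//Users/jasonmoggridge/Desktop/rosalind_ba2h.txt', 'r') as f:
    Pattern = f.readline().strip()
    DNAs = f.readline().strip().split()
###
D = Distance(Pattern, DNAs)
print(D)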
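
The nested loops in Distance can also be written more compactly with `zip`, `min`, and `sum`. The block below is a minimal sketch of that alternative, not the author's code. The names `hamming` and `distance_between_pattern_and_strings` are hypothetical, and the input is a small illustrative example. It assumes every DNA string is at least as long as the pattern; otherwise `min` raises `ValueError` on an empty sequence, whereas Distance would add infinity.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_between_pattern_and_strings(pattern, dna):
    """Sum, over every string in dna, the minimum Hamming distance
    between pattern and any k-mer of that string."""
    k = len(pattern)
    return sum(
        # Best (smallest) distance between pattern and any k-mer in text.
        min(hamming(pattern, text[i:i + k]) for i in range(len(text) - k + 1))
        for text in dna
    )


# Illustrative input (hypothetical values, not read from the file above).
example_dna = ["TTACCTTAAC", "GATATCTGTC", "ACGGCGTTCG", "CCCTAAAGAG", "CGTCAGAGGT"]
example_result = distance_between_pattern_and_strings("AAA", example_dna)
print(example_result)  # per-string minima are 1, 1, 2, 0, 1, so this prints 5
```

The generator expressions avoid building the intermediate `kmers` list, which matters little for short strings but keeps memory use constant for long ones.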