Bioinformatics Algorithms

Hamming Distance


This example creates a list of all sequences within
Hamming distance d of an original sequence. If d is a
small number, the list is short and the sequences in
the list are very similar to the original. As d increases,
the list grows quickly.
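
How fast the list grows can be estimated with a short calculation. The sketch below is not part of the sample program: the helper name neighborhood_size is hypothetical, and it assumes the four-letter DNA alphabet and Python 3.8 or later (for math.comb). A sequence of length n has C(n, i) * 3^i neighbors at exactly distance i, because i positions are chosen and each is replaced by one of the three other bases.

```python
from math import comb

def neighborhood_size(n, d, alphabet_size=4):
    """Count the sequences within Hamming distance d of a length-n sequence."""
    # No more than n positions can differ, so cap d at n.
    d = min(d, n)
    # Choose i positions to change, each to one of the other letters.
    return sum(comb(n, i) * (alphabet_size - 1) ** i for i in range(d + 1))

# Sizes for a sequence of length 5, such as "ACGAC".
for d in range(6):
    print(d, neighborhood_size(5, d))
```

For a sequence of length 5, d = 1 gives 16 sequences, while d = 5 gives all 4^5 = 1024 sequences of that length.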

The HammingDistance function compares two sequences of equal length and counts the positions at which they differ. The neighbors function recursively generates substitutions in a pattern and returns the neighborhood of sequences within Hamming distance d of the original sequence. Note that a value of d greater than the length of the original sequence is problematic, since two sequences of equal length cannot differ in more positions than they have. What happens if you use such a value in the sample program?
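#!/usr/bin/python

seq = "ACGAC"
d = 1

def HammingDistance(p, q):
    # Count the positions at which p and q differ (equal lengths assumed).
    score = 0
    for n in range(len(p)):
        if p[n] != q[n]:
            score = score + 1
    return score

def neighbors(patt, d):
    # Return a list of every sequence within Hamming distance d of patt.
    if d == 0:
        # Return a list, not a bare string, so callers can iterate over it.
        return [patt]
    if len(patt) == 1:
        return ["A", "C", "G", "T"]
    neighborhood = []
    suffixN = neighbors(patt[1:], d)
    for s in suffixN:
        if HammingDistance(patt[1:], s) < d:
            # Mismatches to spare: any first base is allowed.
            for x in ["A", "C", "G", "T"]:
                neighborhood.append(x + s)
        else:
            # The suffix already uses all d mismatches: keep the first base.
            neighborhood.append(patt[0] + s)
    return neighborhood

output = neighbors(seq, d)
print(seq)
print(output)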
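
One way to check the output of neighbors is to compare it with a brute-force enumeration. The sketch below is an independent cross-check, not the method used in the sample program: it lists every sequence of the same length and keeps those within distance d. The names hamming_distance and brute_force_neighbors are hypothetical, and the alphabet is assumed to be A, C, G, T. Because it examines 4^n candidates, it is only practical for short patterns.

```python
from itertools import product

def hamming_distance(p, q):
    """Count mismatched positions; raise an error if the lengths differ."""
    if len(p) != len(q):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(p, q))

def brute_force_neighbors(patt, d, alphabet="ACGT"):
    """Enumerate all strings of len(patt) and keep those within distance d."""
    candidates = ("".join(t) for t in product(alphabet, repeat=len(patt)))
    return [c for c in candidates if hamming_distance(patt, c) <= d]

result = brute_force_neighbors("ACGAC", 1)
print(len(result))
print(result)
```

Sorting both lists and comparing them, for example sorted(neighbors(seq, d)) against sorted(brute_force_neighbors(seq, d)), should show whether the recursive version produces the same neighborhood.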

SOURCE CODE:
NEIGHBORS.py
