阅读背景:

在Python中聚类相似字符串的算法?

来源:互联网 

I'm working on a script that currently contains multiple lists of DNA sequences (each list has a varying number of DNA sequences) and I need to cluster the sequences in each list based on Hamming Distance similarity. My current implementation of this (very crude at the moment) extracts the first sequence in the list and calculates the Hamming Distance of each subsequent sequence. If it's within a certain Hamming Distance, it appends it to a new list which later is used to remove sequences from the original list as well as store the similar sequences in a defaultdict. See my current implementation of my code below:I'm working on a script that currently contains




你的当前访问异常,请进行认证后继续阅读剩余内容。

分享到: