DNA matching is a crucial step in sequence alignment. Because sequence alignment is an approximate matching process, good approximate matching algorithms are needed. In sequence alignment, matching generally amounts to finding longest common subsequences. However, a longest common subsequence may not be the best solution for either a database match or an assembly. An optimal alignment of subsequences depends on several factors, such as base quality and overlap length. Factors such as quality indicate whether the data reflects an actual read or an experimental error. Fuzzy logic tolerates inexactness and errors in subsequence matching, so we propose using it for approximate matching of subsequences. We derive fuzzy characteristic functions for the parameters that influence a match and develop a prototype fuzzy assembler. The assembler is designed to work with low-quality data, which most existing techniques generally reject. We test the assembler on sequences from two genome projects, Drosophila melanogaster and Arabidopsis thaliana, and compare the results with those of other assemblers. The fuzzy assembler successfully assembled the sequences and performed similarly to, and in some cases better than, existing techniques.
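To illustrate what "fuzzy characteristic functions for parameters that influence a match" can look like, here is a minimal sketch that scores a candidate ungapped overlap between two reads. It is not the authors' method: the abstract does not give the membership shapes, the thresholds, or how memberships are combined. The function names (`ramp`, `overlap_match_degree`), the linear ramp memberships, the default ranges for overlap length, Phred quality, and identity, and the use of `min` (the standard Zadeh fuzzy AND) are all assumptions chosen for illustration.

```python
from statistics import mean


def ramp(x: float, lo: float, hi: float) -> float:
    """Hypothetical linear membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def overlap_match_degree(suffix, prefix, q_suffix, q_prefix,
                         len_range=(20, 60),        # assumed overlap-length ramp (bases)
                         qual_range=(10, 30),       # assumed Phred-quality ramp
                         ident_range=(0.80, 0.95)):  # assumed identity ramp
    """Degree (0..1) to which the suffix of one read matches the prefix of another.

    Illustrative sketch only: inputs are the already-aligned, equal-length
    overlap strings and their per-base Phred qualities.
    """
    n = len(suffix)
    if not (n == len(prefix) == len(q_suffix) == len(q_prefix)):
        raise ValueError("overlap strings and quality lists must have equal length")
    if n == 0:
        return 0.0

    # Fraction of overlap columns where the two reads agree.
    identity = sum(1 for a, b in zip(suffix, prefix) if a == b) / n
    # Each column is only as trustworthy as its weaker base.
    column_quality = mean(min(qa, qb) for qa, qb in zip(q_suffix, q_prefix))

    mu_length = ramp(n, *len_range)
    mu_quality = ramp(column_quality, *qual_range)
    mu_identity = ramp(identity, *ident_range)

    # Fuzzy AND (min): the overlap is a good match only as far as every criterion holds.
    return min(mu_length, mu_quality, mu_identity)


if __name__ == "__main__":
    read = "ACGT" * 15          # 60-base overlap
    quals = [40] * 60
    print(overlap_match_degree(read, read, quals, quals))  # perfect, long, high quality
```

Because every membership is graded rather than a hard cutoff, a shorter overlap or one with lower-quality bases still receives a partial score instead of being rejected outright, which mirrors the abstract's goal of using low-quality data that crisp thresholds would discard.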