Making changes to software is a difficult task. Up to 70% of the effort in the software process goes towards maintenance. This is mainly because programs have poor structure (due to poor initial design, or due to repeated ad hoc modifications) which makes them difficult to understand and modify. The focus of this thesis is duplication in source code, which is a major cause of poor structure in real programs. We make two contributions: (a) a novel program-slicing-based approach for detecting duplicated fragments in source code, and (b) a pair of algorithms, one that works on a single selected fragment of code, and the other that works on a group of matching fragments, for making the fragment(s) easily extractable into a separate procedure.
The key, novel aspect of our duplication-detection approach is its ability to detect “difficult” groups of matching fragments, i.e., groups in which matching statements are not in the same order in all fragments, and groups in which non-matching statements intervene between matching statements. Our procedure-extraction algorithms are an advance over previous work in this area in two ways: they employ a range of transformations, including code motion and duplication of predicates, to handle a wide variety of difficult clone groups that arise in practice; and they are the first, to our knowledge, to address the extraction of fragments that contain exiting jumps (jumps from within the region containing the fragment to locations outside that are not the “fall through” exit of the region). We present experimental results, obtained with implementations of our duplication-detection algorithm and one of our extractability algorithms, which indicate that our approaches are effective and useful in practice.
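To make the terminology concrete, the following is a small illustrative sketch in C, not taken from the thesis or its benchmark programs. All function names and the arithmetic are hypothetical. The first two functions contain a "difficult" pair of matching fragments: the statements marked (1) and (2) appear in opposite orders, and a non-matching statement intervenes in the second fragment. The third function shows what an exiting jump might look like if the loop is the fragment selected for extraction.

```c
#include <assert.h>

/* Hypothetical example; names and formulas are illustrative only. */

/* Fragment 1: the matching statements (1), (2), (3) are contiguous. */
int invoice_a(int base, int rate)
{
    assert(rate >= 0);
    int tax = base * rate / 100;    /* (1) */
    int fee = base / 10 + 5;        /* (2) */
    int total = base + tax + fee;   /* (3) */
    return total;
}

/* Fragment 2: (1) and (2) appear in the opposite order, and a
 * non-matching statement sits between them. */
int invoice_b(int base, int rate, int *calls)
{
    assert(rate >= 0);
    int fee = base / 10 + 5;        /* (2) */
    (*calls)++;                     /* non-matching, intervening */
    int tax = base * rate / 100;    /* (1) */
    int total = base + tax + fee;   /* (3) */
    return total;
}

/* If the for loop is the fragment to extract, `return i` is an exiting
 * jump: control leaves the loop's region without reaching the
 * fall-through exit (the statement that follows the loop). */
int first_negative(const int *v, int n)
{
    for (int i = 0; i < n; i++) {
        if (v[i] < 0)
            return i;               /* exiting jump */
    }
    return -1;                      /* reached via the fall-through exit */
}
```

In this sketch, statements (1) and (2) do not depend on each other, so they can be reordered and the intervening counter update moved aside. This is the kind of code motion that would let both fragments be replaced by calls to one shared procedure.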