Making changes to software is a difficult task. Up to 70% of the effort in the software process goes towards maintenance. This is mainly because programs have poor structure (due to poor initial design, or due to repeated ad hoc modifications) which makes them difficult to understand and modify. The focus of this thesis is duplication in source code, which is a major cause of poor structure in real programs. We make two contributions: (a) a novel program-slicing-based approach for detecting duplicated fragments in source code, and (b) a pair of algorithms, one that works on a single selected fragment of code, and the other that works on a group of matching fragments, for making the fragment(s) easily extractable into a separate procedure.
The key, novel aspect of our duplication-detection approach is its ability to detect “difficult” groups of matching fragments, i.e., groups in which matching statements are not in the same order in all fragments, and groups in which non-matching statements intervene between matching statements. Our procedure-extraction algorithms are an advance over previous work in this area in two ways: they employ a range of transformations, including code motion and duplication of predicates, to handle a wide variety of difficult clone groups that arise in practice; and they are the first, to our knowledge, to address the extraction of fragments that contain exiting jumps (jumps from within the region containing the fragment to locations outside that are not the “fall through” exit of the region). We present experimental results, using implementations of our duplication-detection algorithm and one of our extractability algorithms, that indicate that our approaches are effective and useful in practice.
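To make the notion of a "difficult" group of matching fragments concrete, the following is a small hand-written illustration; it is not output from the algorithms described here. All function names are hypothetical. The two original functions share the statements that compute a mean, but those statements appear in a different order in each, and the first function interleaves a non-matching statement (tracking the largest value). The refactored versions sketch one plausible result of extraction, assuming the non-matching statement is moved out of the way and the loop is duplicated so that the matching statements become a contiguous, extractable unit.

```python
def summarize_sales(amounts):
    total = 0.0
    largest = 0.0                      # non-matching statement
    count = 0
    for a in amounts:
        total += a                     # matching
        largest = max(largest, a)      # non-matching, intervenes between matches
        count += 1                     # matching
    mean = total / count if count else 0.0
    return mean, largest


def summarize_refunds(amounts):
    count = 0                          # same statements as above, different order
    total = 0.0
    for a in amounts:
        count += 1
        total += a
    mean = total / count if count else 0.0
    return mean


def mean_of(amounts):
    """Hypothetical extracted procedure holding the matching statements."""
    total = 0.0
    count = 0
    for a in amounts:
        total += a
        count += 1
    return total / count if count else 0.0


def summarize_sales_refactored(amounts):
    # The non-matching statement now lives in its own copy of the loop,
    # so the matching computation can be replaced by a single call.
    largest = 0.0
    for a in amounts:
        largest = max(largest, a)
    return mean_of(amounts), largest


def summarize_refunds_refactored(amounts):
    return mean_of(amounts)
```

The refactored functions return the same values as the originals for the same inputs. The cost is a second pass over the input in `summarize_sales_refactored`, which is the kind of trade-off that duplicating a loop predicate introduces.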