Information Retrieval (IR) plays a pivotal role in diverse Software Engineering (SE) tasks, e.g., bug localization and triaging, code retrieval, and requirements analysis. The similarity measure is the core component of an IR technique, and the performance of any IR method depends critically on selecting a measure appropriate to the given application domain. Different SE tasks operate on different document types, such as bug reports, software descriptions, and source code, which often contain non-standard, domain-specific vocabulary. It is therefore essential to understand which similarity measures work best for different SE documents.
This paper presents two case studies on the effect of different similarity measures on various SE documents with respect to two tasks: (i) project recommendation: finding similar GitHub projects, and (ii) bug localization: retrieving the buggy source file(s) corresponding to a bug report. These tasks involve a diverse combination of textual (i.e., description, readme) and code (i.e., source code, API, import package) artifacts. We observe that the performance of IR models varies when they are applied to different artifact types. We find that, in general, context-aware models achieve better performance on textual artifacts. In contrast, simple keyword-based bag-of-words models perform better on code artifacts. The probabilistic ranking model BM25, in turn, performs better on a mixture of text and code artifacts.
We further investigate how such an informed choice of similarity measure impacts the performance of SE tools. In particular, we analyze two previously proposed tools for project recommendation and bug localization tasks, which leverage diverse software artifacts, and observe that an informed choice of similarity measure indeed leads to improved performance of the existing SE tools.
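To make the contrast between the keyword-based measures mentioned above more concrete, the following is a minimal sketch of two of them: cosine similarity over raw bag-of-words counts and Okapi BM25. It is an illustration under stated assumptions, not the implementation used in the study. The function names, the naive tokenizer, and the parameter defaults `k1=1.2` and `b=0.75` (commonly used BM25 values) are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters.

    A deliberately naive tokenizer (an assumption of this sketch); real SE
    pipelines often also split camelCase identifiers and remove stop words.
    """
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine_bow(query, doc):
    """Cosine similarity between raw term-count (bag-of-words) vectors."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical bug report and source-file texts, for illustration only.
    bug_report = "parser crashes when the config file is empty"
    files = [
        "file parser reads config file and validates entries",
        "network socket timeout handler",
        "render button widget on screen",
    ]
    print(bm25_scores(bug_report, files))
    print([cosine_bow(bug_report, f) for f in files])
```

Both functions rank documents purely by term overlap. BM25 additionally dampens repeated terms and normalizes for document length, which is one plausible reason it might behave differently on mixed text and code artifacts. Context-aware models would require a learned embedding and are outside the scope of this sketch.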