
An Empirical Study on the Usage of Transformer Models for Code Completion

Basic Information

DOI: 10.1109/tse.2021.3128234
Publication date: 2021-08
Impact factor: 7.4
CAS journal tier: Computer Science, Tier 1
Authors: Matteo Ciniselli; Nathan Cooper; L. Pascarella; A. Mastropaolo; Emad Aghajani; D. Poshyvanyk; Massimiliano Di Penta; G. Bavota

Abstract

Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenario of predicting the next token to type, with few exceptions pushing the boundaries to the prediction of an entire code statement. Thus, little is known about the performance of state-of-the-art code completion approaches in more challenging scenarios in which, for example, an entire code block must be generated. We present a large-scale study exploring the capabilities of state-of-the-art Transformer-based models in supporting code completion at different granularity levels, including single tokens, one or multiple entire statements, up to entire code blocks (e.g., the iterated block of a for loop). We experimented with several variants of two recently proposed Transformer-based models, namely RoBERTa and the Text-To-Text Transfer Transformer (T5), for the task of code completion. The achieved results show that Transformer-based models, and in particular the T5, represent a viable solution for code completion, with perfect predictions ranging from ∼29%, obtained when asking the model to guess entire blocks, up to ∼69%, reached in the simpler scenario of few tokens masked from the same code statement.
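The task evaluated in the study is masked span prediction over source code: a contiguous span, from a few tokens up to an entire statement or block, is hidden and the model must regenerate it. The sketch below is a minimal illustration only, showing how a generic T5 seq2seq checkpoint from Hugging Face Transformers can be asked to fill a masked statement. The `t5-small` checkpoint and the example Java method are placeholders and do not reproduce the authors' code-specific pre-training and fine-tuning.

```python
# Minimal sketch of masked-span code completion with a T5-style model.
# Assumption: the generic "t5-small" checkpoint is used as a placeholder;
# the paper's models are RoBERTa/T5 variants trained on code, which this
# example does not reproduce.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# A Java method with one entire statement masked, mirroring the
# statement-level granularity studied in the paper. T5 marks masked
# spans with sentinel tokens such as <extra_id_0>.
masked_code = (
    "public int sumPositive(int[] values) { "
    "int total = 0; "
    "for (int v : values) { <extra_id_0> } "
    "return total; }"
)

inputs = tokenizer(masked_code, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_length=32,      # masked spans are short: a few tokens up to a statement
    num_beams=5,        # beam search to take the model's best-scoring guess
    early_stopping=True,
)

# The decoded output is the model's guess for the masked span; in the study,
# a "perfect prediction" means it matches the original code exactly.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same call covers the token-, statement-, and block-level scenarios simply by widening the masked span; with a checkpoint actually fine-tuned on code, the decoded output is compared against the hidden original to count perfect predictions.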
References: 91
Cited by: 52


Related Funding

Collaborative Research: CPS: Medium: Enabling Data-Driven Security and Safety Analyses for Cyber-Physical Systems
Award number: 2132285
Award year: 2022
Funding amount: 38.24
Award type: Standard Grant