Code review is a critical step in modern software quality assurance, yet it is vulnerable to human biases. Previous studies have clarified the extent of the problem, particularly regarding biases against the authors of code, but no consensus understanding has emerged. Advances in medical imaging are increasingly applied to software engineering, supporting grounded neurobiological explorations of computing activities, including the review, reading, and writing of source code. In this paper, we present the results of a controlled experiment using both medical imaging and eye tracking to investigate the neurological correlates of biases and differences in code review, both between genders and between humans and machines (e.g., automated program repair tools). We find that men and women conduct code reviews differently, in ways that are measurable and supported by behavioral, eye-tracking, and medical imaging data. We also find biases in how humans review code as a function of its apparent author, when controlling for code quality. In addition to advancing our fundamental understanding of how cognitive biases relate to the code review process, the results may inform subsequent training and tool design to reduce bias.