Doctoral Dissertation Research: Evaluating the Promise and Pitfalls of Benchmarking in Machine Learning Research
Basic Information
- Award number: 2124685
- Amount: $20,000
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-08-01 to 2023-07-31
- Status: Completed
Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).
The scientific and commercial success of machine learning (ML) has spurred government and corporate sponsors to invest billions of dollars in ML research. Despite this massive investment, there is limited quantitative research on how the ML field measures progress, a process called “benchmarking.” Benchmarking is the practice of comparing algorithms on a quantitative metric after training them on the same benchmark dataset. Benchmarks organize ML researchers around common tasks. Achieving “state-of-the-art” performance on an important benchmark can spark new research trajectories and advance careers: consider the 2012 success of “AlexNet” on a prominent computer vision task, which helped launch the current interest in deep learning. However, benchmarking has already drawn criticism that this near-ubiquitous research culture does not push the field toward socially beneficial outcomes, and that it leads to overinvestment in methods that maximize performance on academic datasets but are environmentally unsustainable or harm the public when used in the real world. This dissertation research will provide a comprehensive analysis of the strengths and weaknesses of benchmarking practices with respect to several public aims: accelerating scientific innovation, increasing equity within the field, and promoting ethical research (i.e., an orientation toward research that benefits society and avoids harms). By blending sociological analysis, computational methods for extracting and analyzing benchmarking data from thousands of papers, and in-depth qualitative interviews, this research will produce an understanding of benchmarking culture in ML research that combines breadth and quantitative rigor with depth and interpretive nuance. The project has significant implications for government and corporate funders, for researchers, and for society more broadly.
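The comparison at the heart of this definition can be illustrated with a minimal sketch. It assumes each reported result is a hypothetical record of paper, year, and score on a single benchmark, with higher scores being better; under that assumption, the results that count as “state of the art” are those that beat every earlier score. The `Result` record and `sota_frontier` function are illustrative names only, not part of the project.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One reported result on a benchmark (hypothetical record format)."""
    paper: str
    year: int
    score: float  # assumed: higher is better, e.g. accuracy


def sota_frontier(results: list[Result]) -> list[Result]:
    """Return the results that set a new state of the art, in chronological order."""
    frontier: list[Result] = []
    best = float("-inf")
    # Sort by year so each result is compared only with earlier ones;
    # sorted() is stable, so same-year results keep their input order.
    for res in sorted(results, key=lambda r: r.year):
        if res.score > best:
            frontier.append(res)
            best = res.score
    return frontier
```

With hypothetical results A (2010, 0.70), B (2011, 0.68), C (2012, 0.80), and D (2013, 0.79), only A and C set a new state of the art; B and D are published but do not move the frontier.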
The dissertation consists of three subprojects. The first subproject examines evidence that benchmarking culture has stymied innovation by favoring reuse of the same datasets across multiple tasks and by incentivizing researchers to underinvest in nascent benchmarks and overinvest in mature ones. The second subproject explores how patterns in the adoption of benchmarks, and the rewards for state-of-the-art performance, interact with status and resources to create inequities in the field. It tests the hypothesis that high-status researchers and institutions have disproportionate power to set the field’s research agenda by introducing benchmarks, while also garnering disproportionate citations for state-of-the-art achievements. Both phenomena could create a “Matthew Effect” that disadvantages under-represented and under-resourced researchers and institutions. These first two subprojects use network science, natural language processing, and manual coding to build a large dataset of benchmarks and of progress on those benchmarks across multiple ML task communities. The third subproject consists of qualitative interviews with ML researchers at different career stages and with different areas of expertise, to gain first-hand perspectives on benchmarking culture and to assess reforms that could improve research ethics and societal outcomes.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
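The concentration of dataset use examined in the first subproject could be summarized, as one assumed illustration, with a Gini coefficient over how many papers use each dataset within a task community: 0 means usage is spread evenly across datasets, and values approaching 1 mean nearly all usage falls on one dataset. The function name `gini` and the input format (a list of per-dataset usage counts) are hypothetical, and the dissertation may operationalize concentration differently. The sketch uses the standard form for values sorted in ascending order, G = 2·Σ i·x_i / (n·Σ x_i) − (n + 1)/n.

```python
def gini(counts: list[int]) -> float:
    """Gini coefficient of per-dataset usage counts (hypothetical input format).

    Returns 0.0 when every dataset is used equally often and (n - 1) / n
    when a single dataset accounts for all usage among n datasets.
    """
    if not counts or sum(counts) <= 0:
        raise ValueError("need at least one dataset with positive usage")
    if any(c < 0 for c in counts):
        raise ValueError("usage counts must be non-negative")
    xs = sorted(counts)  # ascending order is required by the formula
    n = len(xs)
    total = sum(xs)
    # Rank-weighted sum with ranks starting at 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For hypothetical counts [5, 5, 5, 5] the function returns 0.0, and for [0, 0, 0, 10] it returns 0.75, the maximum (n − 1)/n for four datasets.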
Project Outputs
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
- Published: 2021-12
- Impact factor: 0
- Authors: Bernard Koch; Emily L. Denton; A. Hanna; J. Foster
- Corresponding authors: Bernard Koch; Emily L. Denton; A. Hanna; J. Foster
Other Publications by Jacob Foster
A Decade of Police Use of Deadly Force Research (2011–2020)
- Published: 2022
- Impact factor: 1.5
- Authors: Daniela Oramas Mora; William Terrill; Jacob Foster
- Corresponding author: Jacob Foster
Businesses, Places, and Homicide: A Preliminary Empirical Examination
- Published: 2020
- Impact factor: 1.6
- Authors: David C. Lane; K. Williams; Jacob Foster
- Corresponding author: Jacob Foster
Histiocytoid cardiomyopathy presenting as sudden death in an 18-month-old infant.
- DOI: 10.1007/s12024-023-00730-2
- Published: 2023
- Impact factor: 0
- Authors: Jacob Foster; Sarah Parsons
- Corresponding author: Sarah Parsons
Going above and beyond: assessing the characteristics of officers who complete additional in-service training
- DOI: 10.1080/15614263.2022.2152028
- Published: 2022
- Impact factor: 1.8
- Authors: Logan J. Somers; Jacob Foster
- Corresponding author: Jacob Foster
The "autopsy" enigma: etymology, related terms and unambiguous alternatives.
- DOI: 10.1007/s12024-023-00729-9
- Published: 2023-10
- Impact factor: 0
- Authors: Jacob Foster
- Corresponding author: Jacob Foster
Similar NSFC (National Natural Science Foundation of China) Grants
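Research on Fine-Grained and Personalized Evaluation Methods for Student Argumentative Essays
- Grant number: 62306145
- Year approved: 2023
- Amount: ¥300,000
- Program: Young Scientists Fund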
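Research on the Dissemination Patterns and Nature of Influence of Scientific Papers Based on Social Media User Profiles
- Grant number: 72304274
- Year approved: 2023
- Amount: ¥300,000
- Program: Young Scientists Fund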
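Research on Constructing Evidence-Based Domain Knowledge Systems from the Argumentation Structure of Scientific Papers
- Grant number: 72304137
- Year approved: 2023
- Amount: ¥300,000
- Program: Young Scientists Fund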
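Research on Country-Specific Characteristics in "Science of Science" Regularities of Paper Citation and Research Collaboration
- Grant number: 72374173
- Year approved: 2023
- Amount: ¥410,000
- Program: General Program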
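Research on Clinical Translation Analysis of Biomedical Papers Based on Deep Semantic Understanding
- Grant number: 72204090
- Year approved: 2022
- Amount: ¥300,000
- Program: Young Scientists Fund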
Similar Overseas Grants
Doctoral Dissertation Research: How New Legal Doctrine Shapes Human-Environment Relations
- Award number: 2315219
- Fiscal year: 2024
- Amount: $20,000
- Award type: Standard Grant
Doctoral Dissertation Research: Determinants of social meaning
- Award number: 2336572
- Fiscal year: 2024
- Amount: $20,000
- Award type: Standard Grant
Doctoral Dissertation Research: Assessing the chewing function of the hyoid bone and the suprahyoid muscles in primates
- Award number: 2337428
- Fiscal year: 2024
- Amount: $20,000
- Award type: Standard Grant
Doctoral Dissertation Research: Aspect and Event Cognition in the Acquisition and Processing of a Second Language
- Award number: 2337763
- Fiscal year: 2024
- Amount: $20,000
- Award type: Standard Grant
Doctoral Dissertation Research: Renewable Energy Transition and Economic Growth
- Award number: 2342813
- Fiscal year: 2024
- Amount: $20,000
- Award type: Standard Grant