Creating a Data Quality Control Framework for Producing New Personnel-Based S&E Indicators

Project Abstract

Science and engineering (S&E) research generates substantial returns in human knowledge and in social and economic benefits. Nations around the globe compete for scientific and technological leadership through substantial research funding and focused efforts to develop highly trained workforces. To date, efforts to measure and understand national and international trends in S&E, and to assess global strengths and weaknesses, have largely relied on analyzing documents such as patents and publications in large, growing datasets. But this approach too often misses or misidentifies the people and teams who do productive science and engineering work. Robust indicators of the size, composition, collaboration, and mobility of the S&E workforce within and across nations are largely missing from analysis and reporting. These key aspects of the national and international scientific enterprise are poorly captured by data analysis focused on documents and citations. To address this problem, this project develops person-level workforce and collaboration measures that could add granularity to comparisons of international S&E competitiveness and lead to new policy insights for S&E workforce training, hiring, and retention for a nation's future. The prerequisite for such person-level indicators is that individual researchers who appear in multiple bibliographic datasets are correctly identified and linked. Identifying and linking authors by name is daunting because names are often ambiguous. This is particularly the case for Asian names, which poses a significant problem because Asian researchers play an increasingly important role in many fields of research. This project addresses the challenge of systematically and routinely disambiguating names in big bibliographic datasets using a new Automated and Stratified Entity Disambiguation framework.
Core datasets for this effort are derived using a new method that relies on multiple data fields and an iterative process to automatically create disambiguated datasets, which can then be used to train artificial intelligence tools for robust person-level analysis. To improve disambiguation accuracy, name instances are stratified into two groups according to name-ethnicity and disambiguated separately, so that an optimal model is learned for each group from the automatically generated truth data. Based on the disambiguated data, this project develops new person-level S&E indicators that characterize the landscape and trends of the international S&E research workforce across all science and engineering fields. The new big data tools for automatic disambiguation at scale will be documented and released publicly to enable expansion, validation, and reuse by the science community as well as by science of science policy researchers. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Model Reuse in Machine Learning for Author Name Disambiguation: An Exploration of Transfer Learning
  • DOI:
    10.1109/access.2020.3031112
  • Published:
    2020-10
  • Journal:
  • Impact factor:
    3.9
  • Authors:
    Jinseok Kim; Jason Owen-Smith
  • Corresponding authors:
    Jinseok Kim; Jason Owen-Smith
ORCID-linked labeled data for evaluating author name disambiguation at scale
  • DOI:
    10.1007/s11192-020-03826-6
  • Published:
    2021-02-11
  • Journal:
  • Impact factor:
    3.9
  • Authors:
    Jinseok Kim; Jason Owen-Smith
  • Corresponding author:
    Jason Owen-Smith
Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation
  • DOI:
    10.1177/01655515211018171
  • Published:
    2021-05
  • Journal:
  • Impact factor:
    2.4
  • Authors:
    Jinseok Kim; Jenna Kim; Jinmo Kim
  • Corresponding authors:
    Jinseok Kim; Jenna Kim; Jinmo Kim

Other grants by Jason Owen-Smith

Collaborative Research: RUI: HNDS-R: Stepping out of flatland: Complex networks, topological data analysis, and the progress of science
  • Award number:
    2318170
  • Fiscal year:
    2023
  • Award amount:
    $462,000
  • Award type:
    Standard Grant
Collaborative Research: Industries of Ideas: A prototype system for measuring the effects of research investments on regional firms and jobs
  • Award number:
    2332571
  • Fiscal year:
    2023
  • Award amount:
    $462,000
  • Award type:
    Cooperative Agreement
ECR: BCSER: IRM: Building Big Data Capacity for Education and Social Science Research Communities Using Restricted Administrative Data
  • Award number:
    1937251
  • Fiscal year:
    2020
  • Award amount:
    $462,000
  • Award type:
    Standard Grant
Collaborative Research: Impacts of Hard/Soft Skills on STEM Workforce Trajectories
  • Award number:
    1954981
  • Fiscal year:
    2020
  • Award amount:
    $462,000
  • Award type:
    Continuing Grant
Collaborative Research: New Insights into STEM Pathways: The Role of Peers, Networks, and Demand.
  • Award number:
    1760609
  • Fiscal year:
    2018
  • Award amount:
    $462,000
  • Award type:
    Standard Grant
Medical Decision-Making and Network Assembly Mechanisms in Inpatient Surgical Care
  • Award number:
    1560987
  • Fiscal year:
    2016
  • Award amount:
    $462,000
  • Award type:
    Continuing Grant
Collaborative Research: STEM Training, Employment in Industry, and Entrepreneurship
  • Award number:
    1535370
  • Fiscal year:
    2015
  • Award amount:
    $462,000
  • Award type:
    Standard Grant
Building Community and a New Data Infrastructure for Science Policy
  • Award number:
    1262447
  • Fiscal year:
    2013
  • Award amount:
    $462,000
  • Award type:
    Standard Grant
Estimating the Economic and Scientific Impact of Federal R&D Spending by Universities
  • Award number:
    1158711
  • Fiscal year:
    2012
  • Award amount:
    $462,000
  • Award type:
    Standard Grant
From Bank to Bench to Breakthrough: Selection, Access, and Use of Human Stem Cell Research Methods
  • Award number:
    0949708
  • Fiscal year:
    2009
  • Award amount:
    $462,000
  • Award type:
    Standard Grant

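Similar grants from the National Natural Science Foundation of China (NSFC)

Construction and optimization of constraint models for inverting high-precision surface mass change models from satellite gravity data
  • Award number:
    42304097
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Mechanisms by which digital transformation supports the high-quality development of banks: perspectives from data governance and shareholder governance
  • Award number:
    72373038
  • Approval year:
    2023
  • Award amount:
    CNY 410,000
  • Award type:
    General Program
Spectral data fusion analysis methods for quality control of highland barley liquor
  • Award number:
    22363010
  • Approval year:
    2023
  • Award amount:
    CNY 320,000
  • Award type:
    Fund for Less Developed Regions
Legislative protection of data as a production factor and high-quality enterprise development in the context of digital and intelligent transformation
  • Award number:
    72374201
  • Approval year:
    2023
  • Award amount:
    CNY 410,000
  • Award type:
    General Program
Quality assessment of OpenStreetMap building data based on volunteer profiles
  • Award number:
    42361069
  • Approval year:
    2023
  • Award amount:
    CNY 320,000
  • Award type:
    Fund for Less Developed Regions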

Similar overseas grants

HDR DSC: Collaborative Research: Creating and Integrating Data Science Corps to Improve the Quality of Life in Urban Areas
  • Award number:
    2321574
  • Fiscal year:
    2023
  • Award amount:
    $462,000
  • Award type:
    Standard Grant
Pilot Project 1: Creating Bridges to Reproductive Health Care for Rural Adolescent and Young Adult Cancer Survivors
  • Award number:
    10762146
  • Fiscal year:
    2023
  • Award amount:
    $462,000
  • Award type:
Pilot Project 1: Creating Bridges to Reproductive Health Care for Rural Adolescent and Young Adult Cancer Survivors
  • Award number:
    10762275
  • Fiscal year:
    2023
  • Award amount:
    $462,000
  • Award type:
CATCH: Creating Access to Transplant for Candidates who are High Risk
  • Award number:
    10430882
  • Fiscal year:
    2022
  • Award amount:
    $462,000
  • Award type:
A theory-based practice guide for creating patient education
  • Award number:
    10553227
  • Fiscal year:
    2022
  • Award amount:
    $462,000
  • Award type: