Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts



Project abstract

National efforts to digitize natural history collections have transformed previously siloed, unstandardized resources into a networked, openly available information nexus that can be used to meet grand scientific and societal challenges. Despite these enormous strides, major bottlenecks in the digitization process remain, especially where automation has been most challenging. In particular, capturing analog specimen data in digital format and converting text descriptions of collecting locations into mappable geocoordinates have remained boutique efforts. Because of these bottlenecks, as many as 91% of digitized specimens are missing key elements, which hampers the ability to use these specimen records effectively. This project will develop key workflows to dramatically increase the speed at which specimen data can be captured and made broadly available to data providers and consumers. These workflows include novel approaches that use both computer and human intelligence to advance our ability to capture specimen information. One key workflow focuses on the automated conversion of imaged specimen labels into properly formatted and usable digital text. Critical to the success of this workflow are human validation checkpoints that will be implemented using a popular citizen science platform, Notes from Nature. A second workflow focuses on new tools that draw on previous efforts to assign mappable coordinates based on specimen collection location, automatically adding such mapping information for specimens missing those data. Finally, this effort will create tools for moving these new data easily in and out of commonly used databases, making the data immediately available to museum providers and researchers alike. This effort will connect public participation in science to these novel tools and technologies. Further, it will train diverse graduate and undergraduate students in bioinformatics and museum science.

This effort has three design goals that together will dramatically reduce the digitization gap in museum specimen data. The first design goal will combine machine learning methods with public participation in scientific research (PPSR) via the successful Notes from Nature (NfN) project to speed up label digitization and facilitate obtaining locality data. A key part of the first design goal uses supervised machine learning approaches and optical character recognition (OCR) when possible but also includes “humans in the loop”, using the NfN platform to gather fast, quality feedback from human volunteers at key points. This approach also provides a means to create the high-quality training datasets needed for improving automation steps, ultimately further reducing human effort. The second design goal will integrate locality data interpretation through GEOLocate with a Biodiversity Enhanced Locality Service (BELS), which will make it possible to look up pre-existing localities that have been georeferenced using best practices. The third design goal is to connect these workflows and services to Symbiota, a community digitization hub, to allow easy inflow and outflow of content back to digitization networks. Providers will be able to easily access new data along with associated metadata about processing steps, all returned using established standards and best practices. The key to this effort will be engagement with the community, including researchers, collections staff, and Zooniverse volunteers. Engagement will focus on virtual training and working with an advisory committee in order to grow capacity and community involvement.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
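The second design goal in the abstract describes looking up localities that have already been georeferenced. A minimal sketch of that idea follows, assuming a simple exact match on a normalized locality string. The function names, the normalization scheme, and the example table are all hypothetical illustrations, not the actual BELS or GEOLocate implementation, and the coordinates are invented placeholders.

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def normalize_locality(text: str) -> str:
    """Reduce a free-text locality to a comparison key (hypothetical scheme)."""
    # Strip accents so "Río" and "Rio" compare equal.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_like = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then collapse every run of punctuation or whitespace to one space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_like.lower())
    return " ".join(cleaned.split())


def lookup_coordinates(
    locality: str, reference: Dict[str, Coordinates]
) -> Optional[Coordinates]:
    """Return coordinates of a previously georeferenced locality, or None."""
    return reference.get(normalize_locality(locality))


# Invented placeholder record standing in for previously georeferenced data.
REFERENCE: Dict[str, Coordinates] = {
    normalize_locality("Example Creek, Sample Co."): (10.0, 20.0),
}

if __name__ == "__main__":
    print(lookup_coordinates("EXAMPLE CREEK; Sample Co", REFERENCE))
    print(lookup_coordinates("Unknown place", REFERENCE))
```

Exact matching on a normalized key is the simplest possible design. A record that finds no match returns `None` and would still need georeferencing by another route.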

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other publications by Elizabeth Leger

Using Digitized Museum Collections to Investigate Population Variation in Plants
  • DOI:
    10.1525/abt.2021.83.4.235
  • Published:
    2021-05-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Candice Guy; C. Scholl; Elizabeth Leger
  • Corresponding author:
    Elizabeth Leger


Other grants by Elizabeth Leger

CSBR: Natural History: Preserving and ensuring access to critical biological collections in an emerging museum at the University of Nevada, Reno
  • Award number:
    1458033
  • Fiscal year:
    2015
  • Award amount:
    $466,000
  • Award type:
    Continuing Grant

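Similar NSFC (National Natural Science Foundation of China) grants

Identification of targeted-drug sensitivity biomarkers from tumor pathology images and associated statistical algorithms
  • Award number:
    82304250
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Mechanism by which butyrate, a metabolite of gut Faecalibacterium prausnitzii, inhibits ventricular myocardial ferroptosis and improves age-related cardiac dysfunction
  • Award number:
    82300430
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
The influence of social network ties on corporate cash-holding decisions: a study of the shared risk-resistance mechanism
  • Award number:
    72302067
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Novel weakly supervised learning methods for object detection in images
  • Award number:
    62371157
  • Approval year:
    2023
  • Award amount:
    CNY 500,000
  • Award type:
    General Program
Accuracy of information acquisition in open-domain dialogue systems
  • Award number:
    62376067
  • Approval year:
    2023
  • Award amount:
    CNY 510,000
  • Award type:
    General Program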

Similar international grants

Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
  • Award number:
    2027234
  • Fiscal year:
    2021
  • Award amount:
    $466,000
  • Award type:
    Standard Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM tools into Foldit
  • Award number:
    2051282
  • Fiscal year:
    2021
  • Award amount:
    $466,000
  • Award type:
    Standard Grant
Collaborative Research: CIBR: The OpenBehavior Project
  • Award number:
    1948181
  • Fiscal year:
    2021
  • Award amount:
    $466,000
  • Award type:
    Continuing Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM Tools in Foldit
  • Award number:
    2051305
  • Fiscal year:
    2021
  • Award amount:
    $466,000
  • Award type:
    Standard Grant
Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
  • Award number:
    2027228
  • Fiscal year:
    2021
  • Award amount:
    $466,000
  • Award type:
    Standard Grant