Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts

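Basic Information

  • Award number:
    2027234
  • Principal investigator:
  • Amount:
    $292,400
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-01-15 to 2024-12-31
  • Status:
    Completed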

Project Abstract

National efforts to digitize natural history collections have transformed previously siloed, unstandardized resources into a networked, openly available information nexus that can be used to meet grand scientific and societal challenges. Despite these enormous strides, major bottlenecks in the digitization process remain, especially in areas where automation has been most challenging. In particular, capturing analog specimen data in digital format and converting text descriptions of collecting locations into mappable geocoordinates have remained boutique efforts. Because of these bottlenecks, as many as 91% of digitized specimens are missing key data elements, which hampers the ability to use these specimen records effectively. This project will develop key workflows to dramatically increase the speed at which specimen data can be captured and made broadly available to data providers and consumers. These workflows include novel approaches that use both computer and human intelligence to advance our ability to capture specimen information. One key workflow focuses on the challenge of automatically converting imaged specimen labels into properly formatted, usable digital text. Critical to the success of this workflow are human validation checkpoints that will be implemented using a popular citizen science platform, Notes from Nature. A second workflow focuses on new tools that take advantage of previous efforts to assign mappable coordinates based on specimen collection location, automatically adding such mapping information for specimens missing those data. Finally, this effort will create tools for easy access to these new data, in and out of commonly used databases, making the data immediately available to museum providers and researchers alike. This effort will connect public participation in science to these novel tools and technologies. Further, it will train diverse graduate and undergraduate students in bioinformatics and museum science.

This effort has three design goals that together will dramatically reduce the digitization gap in museum specimen data. The first design goal will combine machine learning methods with public participation in scientific research (PPSR) via the successful Notes from Nature (NfN) project to speed up label digitization and facilitate obtaining locality data. A key part of the first design goal uses supervised machine learning approaches and optical character recognition (OCR) when possible, but also includes "humans in the loop" using the NfN platform to gather fast quality feedback from human volunteers at key points. This approach also provides a means to create the high-quality training datasets needed for improving automation steps, ultimately further reducing human effort. The second design goal will integrate locality data interpretation through GEOLocate with a Biodiversity Enhanced Locality Service (BELS), which will make it possible to look up pre-existing localities that have been georeferenced using best practices. The third design goal is to connect these workflows and services to Symbiota, a community digitization hub, to allow easy inflow and outflow of content back to digitization networks. Providers will be able to easily access new data along with associated metadata about processing steps, all returned using established standards and best practices. The key to this effort will be engagement with the community, including researchers, collections staff, and Zooniverse volunteers. Engagement will focus on virtual training and working with an advisory committee in order to grow capacity and community involvement.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Robert Guralnick

Collaborative Research: Ranges: Building Capacity to Extend Mammal Specimens from Western North America
  • Award number:
    2228392
  • Fiscal year:
    2023
  • Funding amount:
    $292,400
  • Award type:
    Continuing Grant
IntBIO Collaborative Research: Assessing drivers of the nitrogen-fixing symbiosis at continental scales
  • Award number:
    2316267
  • Fiscal year:
    2023
  • Funding amount:
    $292,400
  • Award type:
    Standard Grant
Collaborative Research: Phenobase: Community, infrastructure, and data for global-scale analyses of plant phenology
  • Award number:
    2223512
  • Fiscal year:
    2022
  • Funding amount:
    $292,400
  • Award type:
    Continuing Grant
Collaborative Research: Origins and drivers of extinction of Caribbean Avifauna
  • Award number:
    2033905
  • Fiscal year:
    2021
  • Funding amount:
    $292,400
  • Award type:
    Continuing Grant
Collaborative Research: LightningBug, An Integrated Pipeline to Overcome The Biodiversity Digitization Gap
  • Award number:
    2104152
  • Fiscal year:
    2021
  • Funding amount:
    $292,400
  • Award type:
    Continuing Grant
Collaborative Research: Genealogy of Odonata (GEODE): Dispersal and color as drivers of 300 million years of global dragonfly evolution
  • Award number:
    2002457
  • Fiscal year:
    2020
  • Funding amount:
    $292,400
  • Award type:
    Continuing Grant
Cohomology and Representations of Finite and Algebraic Groups with Applications
  • Award number:
    1901595
  • Fiscal year:
    2019
  • Funding amount:
    $292,400
  • Award type:
    Continuing Grant
IIBR RoL: Collaborative Research: A Rules Of Life Engine (RoLE) Model to Uncover Fundamental Processes Governing Biodiversity
  • Award number:
    1927286
  • Fiscal year:
    2019
  • Funding amount:
    $292,400
  • Award type:
    Standard Grant

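Similar NSFC Grants

Mechanism by which IGF-1R regulates HIF-1α to promote Th17 cell differentiation in the pathogenesis of thyroid eye disease
  • Award number:
    82301258
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Mechanism by which CTCFL regulates IL-10 to suppress bystander activation of CD4+ CTLs and promote resistance to neoadjuvant immunotherapy in oral squamous cell carcinoma
  • Award number:
    82373325
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Award type:
    General Program
Mechanism by which mutations in the RNA splicing factor PRPF31 cause human retinitis pigmentosa
  • Award number:
    82301216
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Role and mechanism of vascular endothelial cells regulating macrophage activation via the E2F1/NF-kB/IL-6 axis in orbital venous malformation
  • Award number:
    82301257
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Matrix cluster regulation and strengthening mechanisms in aluminum alloys based on multi-element interatomic interactions
  • Award number:
    52371115
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Award type:
    General Program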

Similar Overseas Grants

Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
  • Award number:
    2027241
  • Fiscal year:
    2021
  • Funding amount:
    $292,400
  • Award type:
    Standard Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM tools into Foldit
  • Award number:
    2051282
  • Fiscal year:
    2021
  • Funding amount:
    $292,400
  • Award type:
    Standard Grant
Collaborative Research: CIBR: The OpenBehavior Project
  • Award number:
    1948181
  • Fiscal year:
    2021
  • Funding amount:
    $292,400
  • Award type:
    Continuing Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM Tools in Foldit
  • Award number:
    2051305
  • Fiscal year:
    2021
  • Funding amount:
    $292,400
  • Award type:
    Standard Grant
Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
  • Award number:
    2027228
  • Fiscal year:
    2021
  • Funding amount:
    $292,400
  • Award type:
    Standard Grant