Towards the Building of a Comprehensive Searchable Biological Experiment Database

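Basic Information

  • Grant number:
    7314689
  • Principal investigator:
  • Amount:
    $230,100
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2007
  • Funding country:
    United States
  • Project period:
    2007-12-01 to 2009-11-30
  • Project status:
    Completed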

Project Abstract

DESCRIPTION (provided by applicant): The rapid growth of the biomedical literature and the expansion of disciplinary biomedical research, heralded by high-throughput genome sciences and technologies, have overwhelmed scientists who attempt to assimilate the information necessary for their research. The widespread adoption of title/abstract word searches, such as the National Library of Medicine's highly desirable PubMed system, has provided the first major advance in the way bioscientists find relevant publications since the origin of Index Medicus in 1879 (Hunter and Cohen 2006). The importance of developing valid information retrieval systems for bioscientists has led to the development of information systems worldwide (e.g., Arrowsmith (Smalheiser and Swanson 1998), BioText (Hearst 2003), GeneWays (Friedman et al. 2001; Rzhetsky et al. 2004), iHOP (Hoffmann and Valencia 2005), and BioMedQA (Lee et al. 2006a)) and annotated databases (e.g., SWISSPROT, OMIM (Hamosh et al. 2005), and BIND (Alfarano et al. 2005)). However, most information systems target only text and fail to provide access to other important data such as images (e.g., figures). More than any other documentation, figures usually represent the "evidence" of discovery in the biomedical literature. Full-text biological articles nearly always incorporate figures/images, which are crucial content of the biomedical literature. Our examination of biological articles in the Proceedings of the National Academy of Sciences (PNAS) found an average of 5.2 images per article (Yu and Lee 2006a). Biologists need to access image data to validate research facts and to formulate or test novel research hypotheses. Textual statements reported in the literature have been found to be frequently noisy (i.e., to contain "false facts") (Krauthammer et al. 2002).
Capturing images that are experimental "evidence" supporting the textual "fact" will benefit bioscience information systems, databases, and bioscientists. Unfortunately, this wealth of information remains virtually inaccessible without automatic systems to organize these images. We propose the development of advanced natural language processing (NLP) tools to semantically organize images. We hypothesize that the text associated with images semantically entails the image content, and that NLP techniques can be developed to accurately associate the text with its images. Furthermore, we hypothesize that images can be semantically organized by categories specified by standard biological ontologies, and that NLP approaches can accurately assign the ontological categories to images. Our specific aims are:
Aim 1: To develop and evaluate NLP techniques for identifying textual statements that correspond to images in full-text articles. We will develop different approaches for two types of association. We will first propose rule-based and statistical approaches to identify the associated text that appears in the full-text articles. We will then develop hybrid approaches to link sentences in abstracts to images in the body of the articles.
Aim 2: To develop and evaluate NLP techniques for automatic classification of experimental results into categories (e.g., Western-Blot, PCR verification, etc.) specified in the experimental protocol resource Protocol-Online. We will explore the use of dictionary-based, rule-based, image classification, and machine-learning approaches for accomplishing this aim.
Aim 3: To develop and evaluate NLP techniques for automatic assignment of Gene Ontology categories to experiments, which will provide a knowledge-based organization of experiments according to biological properties (e.g., catalytic activity). We will develop statistical and machine-learning approaches for accomplishing this aim.
We found that most of the images that appear in full-text biological articles are figure images (Yu and Lee 2006a), and we therefore focus only on figure images in this proposal. The deliverable of Specific Aim 1 will be an effective user interface, BioEx, through which bioscientists can access images directly from sentences in the abstracts. BioEx promises to improve on the traditional single-document-per-article format that has dominated bioscience publications since the first scientific article appeared in 1665 (Gross 2002). The deliverables of Specific Aims 2 and 3 will be open-source algorithms and tools that accurately map images to categories specified by the Gene Ontology and Protocol-Online. These algorithms and tools will enhance bioscience information retrieval, information extraction, summarization, and question answering.
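As an illustration only, the rule-based part of Aim 1 could start from explicit figure mentions in the body text. The sketch below is not the BioEx implementation; the pattern `FIGURE_MENTION` and the function `link_sentences_to_figures` are hypothetical names, and it assumes sentences have already been split and that figures are cited in forms such as "Fig. 1A" or "Figure 2".

```python
import re
from collections import defaultdict

# Hypothetical rule: match "Fig. 2", "Fig 2", "Figs. 2", "Figure 2A", "Figures 2"
# and capture the figure number. Real articles use more citation styles than this.
FIGURE_MENTION = re.compile(r"\bfig(?:ure|\.)?s?\.?\s*(\d+)", re.IGNORECASE)


def link_sentences_to_figures(sentences):
    """Map each figure number to the sentences that explicitly mention it."""
    links = defaultdict(list)
    for sentence in sentences:
        # A set keeps a sentence from being listed twice for the same figure.
        numbers = {int(n) for n in FIGURE_MENTION.findall(sentence)}
        for number in sorted(numbers):
            links[number].append(sentence)
    return dict(links)


if __name__ == "__main__":
    body = [
        "Western blot analysis confirmed expression of the protein (Fig. 1A).",
        "The mutant showed reduced activity.",
        "As shown in Figure 2 and Fig. 1B, binding was lost.",
    ]
    for figure, linked in link_sentences_to_figures(body).items():
        print(figure, linked)
```

Sentences without an explicit mention, such as abstract sentences, are left unlinked by this rule, which is the gap the statistical and hybrid approaches in Aim 1 are meant to address.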

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by HONG YU

Social and behavioral determinants of health and Alzheimer’s Disease: Cohort study of the US military veteran population
  • Grant number:
    10591049
  • Fiscal year:
    2023
  • Amount:
    $230,100
  • Project type:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10656321
  • Fiscal year:
    2020
  • Amount:
    $230,100
  • Project type:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10428629
  • Fiscal year:
    2020
  • Amount:
    $230,100
  • Project type:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10251336
  • Fiscal year:
    2020
  • Amount:
    $230,100
  • Project type:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10100989
  • Fiscal year:
    2020
  • Amount:
    $230,100
  • Project type:
Resource Curation and Evaluation for EHR Note Comprehension
  • Grant number:
    9925807
  • Fiscal year:
    2018
  • Amount:
    $230,100
  • Project type:
Resource Curation and Evaluation for EHR Note Comprehension
  • Grant number:
    9794757
  • Fiscal year:
    2018
  • Amount:
    $230,100
  • Project type:
Systems for Helping Veterans Comprehend Electronic Health Record Notes
  • Grant number:
    9768225
  • Fiscal year:
    2015
  • Amount:
    $230,100
  • Project type:
Systems for Helping Veterans Comprehend Electronic Health Record Notes
  • Grant number:
    9894743
  • Fiscal year:
    2015
  • Amount:
    $230,100
  • Project type:
EHR Anticoagulants Pharmacovigilance
  • Grant number:
    9190384
  • Fiscal year:
    2014
  • Amount:
    $230,100
  • Project type:

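Similar NSFC (National Natural Science Foundation of China) Grants

Reducing programming errors: development of PiSigma, a new fast dependently typed high-level programming language based on a certified kernel
  • Grant number:
    61070023
  • Approval year:
    2010
  • Amount:
    CNY 300,000
  • Project type:
    General Program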

Similar Overseas Grants

Accelerating genomic analysis for time critical clinical applications
  • Grant number:
    10593480
  • Fiscal year:
    2023
  • Amount:
    $230,100
  • Project type:
Exploratory Research Project - ADAPT
  • Grant number:
    10577122
  • Fiscal year:
    2023
  • Amount:
    $230,100
  • Project type:
Achieving Equity through SocioCulturally-informed, Digitally-Enabled Cancer Pain managemeNT (ASCENT) Clinical Trial
  • Grant number:
    10539159
  • Fiscal year:
    2022
  • Amount:
    $230,100
  • Project type:
Simultaneous MRI/US for real-time liver ablation guidance and confirmation
  • Grant number:
    10677721
  • Fiscal year:
    2022
  • Amount:
    $230,100
  • Project type:
Fast, large area, multiphoton exoscope (FLAME) for improving early detection of melanoma
  • Grant number:
    10365803
  • Fiscal year:
    2022
  • Amount:
    $230,100
  • Project type: