Towards the Building of a Comprehensive Searchable Biological Experiment Database

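Basic information

  • Grant number:
    7314689
  • Principal investigator:
  • Amount:
    $230,100
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2007
  • Funding country:
    United States
  • Project period:
    2007-12-01 to 2009-11-30
  • Project status:
    Completed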

Project abstract

DESCRIPTION (provided by applicant): The rapid growth of the biomedical literature and the expansion in disciplinary biomedical research, heralded by high-throughput genome sciences and technologies, have overwhelmed scientists who attempt to assimilate the information necessary for their research. The widespread adoption of title/abstract word searches, such as the National Library of Medicine's highly desirable PubMed system, has provided the first major advance in the way bioscientists find relevant publications since the origin of Index Medicus in 1879 (Hunter and Cohen 2006). The importance of developing valid information retrieval systems for bioscientists has led to the development of information systems worldwide (e.g., Arrowsmith (Smalheiser and Swanson 1998), BioText (Hearst 2003), GeneWays (Friedman et al. 2001; Rzhetsky et al. 2004), iHOP (Hoffmann and Valencia 2005), and BioMedQA (Lee et al. 2006a)) and of annotated databases (e.g., SWISSPROT, OMIM (Hamosh et al. 2005), and BIND (Alfarano et al. 2005)).
However, most information systems target only text and fail to provide access to other important data such as images (e.g., figures). More than any other documentation, figures usually represent the "evidence" of discovery in the biomedical literature. Full-text biological articles nearly always incorporate figures/images, which are crucial content of the biomedical literature. Our examination of biological articles in the Proceedings of the National Academy of Sciences (PNAS) found an average of 5.2 images per article (Yu and Lee 2006a). Biologists need to access image data to validate research facts and to formulate or test novel research hypotheses. Textual statements reported in the literature have been found to be frequently noisy (i.e., to contain "false facts") (Krauthammer et al. 2002).
Capturing images that are experimental "evidence" to support the textual "fact" will benefit bioscience information systems, databases, and bioscientists. Unfortunately, this wealth of information remains virtually inaccessible without automatic systems to organize these images. We propose the development of advanced natural language processing (NLP) tools to semantically organize images. We hypothesize that text associated with images semantically entails the image content, and that NLP techniques can be developed to accurately associate the text with its images. Furthermore, we hypothesize that images can be semantically organized by categories specified by a standard biological ontology, and that NLP approaches can accurately assign the ontological categories to images. Our specific aims are:
Aim 1: To develop and evaluate NLP techniques for identifying textual statements that correspond to images in full-text articles. We will develop different approaches for two types of association. We will first propose rule-based and statistical approaches to identify the associated text that appears in the full-text articles. We will then develop hybrid approaches to link sentences in abstracts to images in the body of the articles.
Aim 2: To develop and evaluate NLP techniques for automatic classification of experimental results into categories (e.g., Western-Blot, PCR verification, etc.) specified in the experimental protocol Protocol-Online. We will explore the use of dictionary-based, rule-based, image classification, and machine-learning approaches for accomplishing this aim.
Aim 3: To develop and evaluate NLP techniques for automatic assignment of Gene Ontology categories to experiments, which will provide a knowledge-based organization of experiments according to biological properties (e.g., catalytic activity). We will develop statistical and machine-learning approaches for accomplishing this aim.
We found that most of the images that appear in full-text biological articles are figure images (Yu and Lee 2006a), and we therefore focus only on figure images in this proposal. The deliverable of Specific Aim 1 will be an effective user interface, BioEx, from which bioscientists can access images directly from sentences in the abstracts. BioEx promises to improve on the traditional single-document-per-article format that has dominated bioscience publications since the first scientific article appeared in 1665 (Gross 2002). The deliverables of Specific Aims 2 and 3 will be open-source algorithms and tools that accurately map images to categories specified by the Gene Ontology and Protocol-Online. Those algorithms and tools will enhance bioscience information retrieval, information extraction, summarization, and question answering.
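The rule-based and dictionary-based approaches named in Aims 1 and 2 can be illustrated with a minimal sketch. This is not the BioEx implementation, which the abstract does not describe. The regular expression, the function names `link_sentences_to_figures` and `classify_experiment`, and the cue-word dictionary are all hypothetical, chosen only to show the general idea of linking sentences to the figures they cite and tagging them with an experiment category.

```python
import re
from collections import defaultdict

# Hypothetical rule: match explicit mentions such as "Fig. 1", "Figure 2A", "Figs. 3".
# Only the first number after the word is captured, so "Figs. 3 and 4" yields 3 only.
FIGURE_MENTION = re.compile(r"\bfig(?:ure)?s?\.?\s*(\d+)", re.IGNORECASE)

# Illustrative cue words per experiment category (assumed, not taken from Protocol-Online).
EXPERIMENT_KEYWORDS = {
    "Western-Blot": ("western blot", "immunoblot"),
    "PCR": ("pcr", "polymerase chain reaction"),
}


def link_sentences_to_figures(sentences):
    """Map each figure number to the sentences that explicitly mention it."""
    links = defaultdict(list)
    for sentence in sentences:
        for number in FIGURE_MENTION.findall(sentence):
            figure = int(number)
            if sentence not in links[figure]:
                links[figure].append(sentence)
    return dict(links)


def classify_experiment(text):
    """Return the sorted categories whose cue words occur in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, cues in EXPERIMENT_KEYWORDS.items()
        if any(cue in lowered for cue in cues)
    )


if __name__ == "__main__":
    demo = [
        "Protein levels were confirmed by Western blot (Fig. 1).",
        "Expression was verified by RT-PCR (Figure 2A).",
    ]
    for figure, linked in sorted(link_sentences_to_figures(demo).items()):
        for sentence in linked:
            print(figure, classify_experiment(sentence), sentence)
```

A purely rule-based matcher like this only finds explicit figure citations. Linking abstract sentences, which rarely cite figures directly, is the harder case for which the abstract proposes hybrid approaches.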

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by HONG YU

Social and behavioral determinants of health and Alzheimer’s Disease: Cohort study of the US military veteran population
  • Grant number:
    10591049
  • Fiscal year:
    2023
  • Funding amount:
    $230,100
  • Project category:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10656321
  • Fiscal year:
    2020
  • Funding amount:
    $230,100
  • Project category:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10428629
  • Fiscal year:
    2020
  • Funding amount:
    $230,100
  • Project category:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10251336
  • Fiscal year:
    2020
  • Funding amount:
    $230,100
  • Project category:
Improving Suicide Prediction using NLP-Extracted Social Determinants of Health
  • Grant number:
    10100989
  • Fiscal year:
    2020
  • Funding amount:
    $230,100
  • Project category:
Resource Curation and Evaluation for EHR Note Comprehension
  • Grant number:
    9925807
  • Fiscal year:
    2018
  • Funding amount:
    $230,100
  • Project category:
Resource Curation and Evaluation for EHR Note Comprehension
  • Grant number:
    9794757
  • Fiscal year:
    2018
  • Funding amount:
    $230,100
  • Project category:
Systems for Helping Veterans Comprehend Electronic Health Record Notes
  • Grant number:
    9768225
  • Fiscal year:
    2015
  • Funding amount:
    $230,100
  • Project category:
Systems for Helping Veterans Comprehend Electronic Health Record Notes
  • Grant number:
    9894743
  • Fiscal year:
    2015
  • Funding amount:
    $230,100
  • Project category:
EHR Anticoagulants Pharmacovigilance
  • Grant number:
    9190384
  • Fiscal year:
    2014
  • Funding amount:
    $230,100
  • Project category:

Similar overseas grants

ADVANCED DEVELOPMENT OF LQ A LIPOSOME-BASED SAPONIN-CONTAINING ADJUVANT FOR USE IN PANSARBECOVIRUS VACCINES
  • Grant number:
    10935820
  • Fiscal year:
    2023
  • Funding amount:
    $230,100
  • Project category:
ADVANCED DEVELOPMENT OF BBT-059 AS A RADIATION MEDICAL COUNTERMEASURE FOR DOSING UP TO 48H POST EXPOSURE
  • Grant number:
    10932514
  • Fiscal year:
    2023
  • Funding amount:
    $230,100
  • Project category:
Advanced Development of a Combined Shigella-ETEC Vaccine
  • Grant number:
    10704845
  • Fiscal year:
    2023
  • Funding amount:
    $230,100
  • Project category:
Advanced development of composite gene delivery and CAR engineering systems
  • Grant number:
    10709085
  • Fiscal year:
    2023
  • Funding amount:
    $230,100
  • Project category:
Advanced Development of Gemini-DHAP
  • Grant number:
    10760050
  • Fiscal year:
    2023
  • Funding amount:
    $230,100
  • Project category: