Integrating image and text information for biomedical information retrieval


Basic Information

  • Grant number:
    8943231
  • Principal investigator:
  • Amount:
    $572,500
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
  • Funding country:
    United States
  • Start and end dates:
  • Project status:
    Not concluded

Project Abstract

The search for relevant and actionable information is key to achieving clinical and research goals in biomedicine. Biomedical information exists in different forms: as text and illustrations in journal articles and other documents, in images stored in databases, and as patients' cases in electronic health records. In the context of this work, an image includes not only biomedical images but also illustrations, charts, graphs, and other visual material appearing in biomedical journals, electronic health records, and other relevant databases. The project objectives may be formulated as seeking better ways to retrieve information from these sources by moving beyond conventional text-based searching to combining both text and visual features in search queries. The approaches to meeting these objectives use a combination of techniques and tools from the fields of Information Retrieval (IR), Content-Based Image Retrieval (CBIR), and Natural Language Processing (NLP).

The first objective is to improve the retrieval of biomedical literature by targeting the visual content in articles, a rich source of information not typically exploited by conventional bibliographic or full-text databases. We index figures (including illustrations and images) using (i) the text in captions and in the passages of the article body where the figures are discussed (mentions); (ii) image features such as color, shape, and size; and, if available, (iii) annotation markers within figures, such as arrows, letters, or symbols, that are extracted from the image and correlated with concepts in the caption. These annotation markers can help isolate regions of interest (ROIs) in images, and the ROIs are useful for improving the relevance of the figures retrieved. It is hypothesized that augmenting conventional search results with relevant images offers a richer search experience. Taking the retrieval of biomedical literature a step further, within the first objective our goal is to find information relevant to a patient's case in the literature and then link it to the patient's health record. The case is first represented in structured form using both text and image features, and then literature and EHR databases are searched for similar cases.

A second objective is to find semantically similar images in image databases, an important step in differential diagnosis. We explore approaches that automatically combine image and text features, in contrast to visual decision support systems (for example, VisualDx) that use only text-driven menus. Such menu-driven systems guide a physician through describing a patient and then present a set of images from which the clinician can select those most similar to the patient's, and access relevant information manually linked to the images.

Our methods use text and image features extracted from the relevant components of a document, database, or case description to achieve these objectives. For the document retrieval task, we rely on a search engine developed at the U.S. National Library of Medicine (NLM). This is a phrase-based search engine with term and concept query expansion based on NLM's Unified Medical Language System (UMLS) and probabilistic relevance ranking that exploits document structure. Building on these features, we create structured representations of every full-text document and all its figures. These structured documents, presented to the user as search results, include the typical fields found in MEDLINE citations (e.g., titles, abstracts, and MeSH terms), the figures in the original documents, and image-specific fields extracted from the original documents (such as captions segmented into the parts pertaining to each pane of a multi-panel image, the ROIs described in each caption, and the modality of the image). In addition, patient-oriented outcomes extracted from the abstracts are provided to the user.

The automatic image annotation and retrieval objectives are achieved in the following ways: (i) using image analysis; (ii) indexing the text assigned to images; and (iii) using a combination of image and text analysis. Additional steps include describing an image with visual features, automatically detecting its modality (for example, CT, MR, X-ray, ultrasound, etc.), and generating a visual ontology, i.e., concepts assigned to image patches. Elements of the visual ontology are called visual keywords and are used to find images with similar concepts.

To evaluate and demonstrate our techniques, we have developed OpenI (pronounced "open eye", available at http://openi.nlm.nih.gov), a hybrid system combining text-based searching with an image similarity engine. OpenI is a novel system that enables users to search for and retrieve citations enriched with relevant images and bottom-line (or take-away) statements, extracted from a collection of approximately 733,000 open-access articles and nearly 2.3 million illustrations from the biomedical literature hosted in the National Library of Medicine's PubMed Central repository, as well as over 8,000 radiology images and 4,000 radiology examination reports from the Indiana University collection of chest x-rays. Each enriched citation is linked to PubMed Central, PubMed, and MedlinePlus, as well as to the article itself at the publisher's Web site. A user may search by text words as well as by query images. Using this framework, we explore alternative approaches to searching for information with a combination of visual and text features: (i) starting with a text-based search of an image database and refining the search using image features; (ii) starting with a visual search using the (clinical) image of a given patient and then linking the image to relevant information found using visual and text features; and (iii) starting with a multimodal search that combines text and image features. OpenI indexes all the text and illustrations in medical articles by both textual and image-based features. OpenI also indexes a collection of 8,000 digital chest x-rays and accompanying radiology reports, with the aim of providing easy access to publicly available, de-identified patient records. To compute text and image features efficiently, the system is built on a high-performance distributed computing platform.

As the first production-quality system of its kind in the biomedical domain, OpenI has given medical professionals and the public access to visual information from biomedical articles that is highly relevant to their queries, as well as the articles' take-away messages. The quality of the information delivered by OpenI has been evaluated in international competitions, in which the system consistently ranks among the best; for example, it placed first in the 2013 image retrieval evaluation, which attracted participants from academia, industry, and clinical settings. Over the past year, the site has grown to attract over 41,000 unique visitors (including bots), with 690,000 hits daily, and is able to support searches of vast multimedia collections.
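As a concrete illustration of the multimodal search described above, the sketch below shows one simple way to fuse a text relevance score (e.g., computed over caption and mention text) with an image similarity score computed from visual descriptors. This is a minimal, hypothetical example and not the OpenI implementation; the names (Figure, text_score, descriptor, ALPHA) and the late-fusion weighting are assumptions made for illustration only.

# A minimal late-fusion sketch (Python), assuming hypothetical field names;
# this illustrates the idea of combining text and image evidence, not OpenI's actual code.
from dataclasses import dataclass
import numpy as np

ALPHA = 0.6  # assumed weight giving text evidence slightly more influence than image evidence


@dataclass
class Figure:
    doc_id: str
    text_score: float        # e.g., relevance of caption/mention text to the text query
    descriptor: np.ndarray   # e.g., a color/shape/texture or visual-keyword histogram


def image_similarity(query_descriptor: np.ndarray, descriptor: np.ndarray) -> float:
    # Cosine similarity between global image descriptors, a common CBIR choice.
    denom = float(np.linalg.norm(query_descriptor) * np.linalg.norm(descriptor))
    return float(query_descriptor @ descriptor) / denom if denom else 0.0


def minmax(x: np.ndarray) -> np.ndarray:
    # Normalize scores to [0, 1] so the text and image score ranges are comparable.
    rng = float(x.max() - x.min())
    return (x - x.min()) / rng if rng else np.zeros_like(x)


def fuse(figures: list, query_descriptor: np.ndarray, alpha: float = ALPHA):
    # Rank figures by a weighted sum of normalized text and image scores (late fusion).
    text = minmax(np.array([f.text_score for f in figures]))
    img = minmax(np.array([image_similarity(query_descriptor, f.descriptor) for f in figures]))
    combined = alpha * text + (1 - alpha) * img
    order = np.argsort(-combined)
    return [(figures[i].doc_id, float(combined[i])) for i in order]

In a system like the one described, the fusion weight would typically be tuned on held-out relevance judgments, and the text score would come from the phrase-based, UMLS-expanded engine mentioned in the abstract rather than a toy scorer.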

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Dina Demner-Fushman

Other grants by Dina Demner-Fushman

Integrating image and text information for biomedical information retrieval
  • Grant number:
    8344956
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Integrating image and text information for biomedical information retrieval
  • Grant number:
    9160922
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Providing timely and effortless access to reliable health-related information for decision support and education
  • Grant number:
    10927040
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Integrating image and text information for biomedical information retrieval
  • Grant number:
    8158052
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Integrating image and text information for biomedical information retrieval
  • Grant number:
    9359855
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Integrating image and text information for biomedical information retrieval
  • Grant number:
    8558113
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Integrating image and text information for biomedical information retrieval
  • Grant number:
    10269684
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Semantic indexing of biomedical publications and clinical text
  • Grant number:
    10269685
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Natural language processing for precision medicine and clinical and consumer health question
  • Grant number:
    10269687
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:
Natural language processing for precision medicine and clinical and consumer health question
  • Grant number:
    9554461
  • Fiscal year:
  • Funding amount:
    $572,500
  • Project type:

Similar NSFC Grants

AI-based construction of cardiac functional sub-segments for a new strategy of individualized radiotherapy of thoracic tumors
  • Grant number:
    12305394
  • Approval year:
    2023
  • Funding amount:
    ¥300,000
  • Project type:
    Young Scientists Fund
A chest imaging diagnosis model based on multi-source knowledge fusion and multi-level cross-modal alignment
  • Grant number:
    62361027
  • Approval year:
    2023
  • Funding amount:
    ¥320,000
  • Project type:
    Regional Science Fund Program
Research on fast chest MRI algorithms based on continuous golden-angle radial sampling
  • Grant number:
    62301352
  • Approval year:
    2023
  • Funding amount:
    ¥300,000
  • Project type:
    Young Scientists Fund
Follow-up study of lung structure and function in recovered COVID-19 patients based on chest CT and hyperpolarized 129Xe-MRI
  • Grant number:
    82272109
  • Approval year:
    2022
  • Funding amount:
    ¥530,000
  • Project type:
    General Program
Constructing a risk model for brain metastasis of primary lung cancer by deep-learning analysis of chest CT
  • Grant number:
    32270688
  • Approval year:
    2022
  • Funding amount:
    ¥540,000
  • Project type:
    General Program

Similar Overseas Grants

Computational Toolkit for Normalizing the Impact of CT Acquisition and Reconstruction on Quantitative Image Features
  • Grant number:
    10530062
  • Fiscal year:
    2022
  • Funding amount:
    $572,500
  • Project type:
Eliminating Ischemic Spinal Cord Injury and Paralysis after Aortic Aneurysm Surgery
  • Grant number:
    10469194
  • Fiscal year:
    2022
  • Funding amount:
    $572,500
  • Project type:
FAIR-CT: a practical approach to enable ultra-low dose CT for longitudinal disease and treatment monitoring
  • Grant number:
    10158473
  • Fiscal year:
    2020
  • Funding amount:
    $572,500
  • Project type:
Reinforcing old warriors to treat Mycobacterium kansasii in shorter duration
  • Grant number:
    10250999
  • Fiscal year:
    2020
  • Funding amount:
    $572,500
  • Project type:
HEAL Diversity Supplement: Great Lakes Nodes Clinical Trials Network
  • Grant number:
    10354615
  • Fiscal year:
    2019
  • Funding amount:
    $572,500
  • Project type: