Integrating image and text information for biomedical information retrieval
Basic information
- Grant number: 10269684
- Principal investigator:
- Amount: $621,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Start and end dates:
- Project status: Not concluded
- Source:
- Keywords: Anatomy; Bibliography; California; Clinical Research; Collection; Color; Databases; Decision Support Systems; Devices; Differential Diagnosis; Electronic Health Record; Evaluation; Eye; Goals; Graph; History of Medicine; Home environment; Hybrids; Image; Indiana; Information Retrieval; International; Journals; Knowledge; Link; Literature; MEDLINE; MeSH Thesaurus; Medical; Medical Libraries; MedlinePlus; Methods; Multimedia; Natural Language Processing; Orthopedics; Outcome; Patients; Performance; Production; PubMed; Radiology Specialty; Records; Reporting; Research Support; Resources; Retrieval; Semantics; Site; Source; Structure; System; Techniques; Text; Texture; Thoracic Radiography; Unified Medical Language System; United States National Library of Medicine; Universities; Update; Visual; Work; base; bioimaging; clinical imaging; clinically relevant; cluster computing; computational platform; deep learning; digital; imaging modality; improved; indexing; journal article; multimodality; patient oriented; phrases; radiological imaging; radiologist; repository; search engine; tool; visual information; visual search; web site
Project abstract
The search for relevant and actionable information is key to achieving clinical and research goals in biomedicine. Biomedical information exists in different forms: as text and illustrations in journal articles and other documents, as images stored in databases, and as patient cases in electronic health records. In the context of this work, "image" refers not only to biomedical images, but also to illustrations, charts, graphs, and other visual material appearing in biomedical journals, electronic health records, and other relevant databases. We are developing better approaches to retrieving information from these entities by moving beyond conventional text-based searching to combining text and visual features in search queries. To meet these objectives, we use a combination of techniques and tools from the fields of Information Retrieval (IR), Content-Based Image Retrieval (CBIR), and Natural Language Processing (NLP).
The first objective is to improve the retrieval of biomedical literature by targeting the visual content in articles, a rich source of information not typically exploited by conventional bibliographic or full-text databases. We index these figures (including illustrations and images) using text in their captions and in the passages of the article body where they are mentioned (mentions), together with image features. In FY2020, we established that recently available deep learning features find relevant images better than traditional image features such as color and texture. We have accordingly updated the features used for image searches.
A second objective is to find semantically similar images in image databases, an important step in differential diagnosis. We explore approaches that automatically combine image and text features, in contrast to visual decision support systems that use only text-driven menus. To support this research, we maintain the MedPix database (https://medpix.nlm.nih.gov/home), which contains and continues to accept medical cases submitted by radiologists through the case upload server (https://cup.nlm.nih.gov/login).
Our methods use text and image features extracted from relevant components in a document, database, or case description to achieve our objectives. For the document retrieval task, we rely on a search engine developed by the U.S. National Library of Medicine (NLM). This is a phrase-based search engine with term and concept query expansion based on NLM's Unified Medical Language System (UMLS) and probabilistic relevancy ranking that exploits document structure. Optimizing these features, we create structured representations of every full-text document and all its figures. These structured documents, presented to the user as search results, include typical fields found in MEDLINE citations (e.g., titles, abstracts, and MeSH terms), the figures in the original documents, and image-specific fields extracted from the original documents (such as captions segmented into parts pertaining to each panel in a multi-panel image, the regions of interest (ROI) described in each caption, and the modality of the image). In addition, patient-oriented outcomes extracted from the abstracts are provided to the user.
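To make the idea of phrase-based concept query expansion concrete, the following is a minimal sketch only. It assumes a tiny hand-written synonym table (`CONCEPT_SYNONYMS`) as a stand-in for a real thesaurus such as UMLS, and the function name `expand_query` and its parameters are hypothetical. It is not the NLM search engine's implementation and does not model its probabilistic ranking.

```python
# Hypothetical toy thesaurus; a real system would look concepts up in UMLS.
CONCEPT_SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "chest x-ray": ["chest radiograph", "thoracic radiography"],
}


def expand_query(query, synonyms=CONCEPT_SYNONYMS, max_phrase_len=3):
    """Split a query into phrases and attach known synonyms to each phrase.

    Returns a list of alternative groups: each group holds the original
    phrase followed by any synonyms, so a search engine could match any
    member of a group (an OR within the group, AND across groups).
    """
    tokens = query.lower().split()
    groups = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the longest candidate phrase first.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in synonyms:
                groups.append([phrase] + synonyms[phrase])
                i += n
                break
        else:
            # No known concept starts here; keep the single token as-is.
            groups.append([tokens[i]])
            i += 1
    return groups


if __name__ == "__main__":
    for group in expand_query("chest x-ray after heart attack"):
        print(" OR ".join(group))
```

Matching the longest phrase first keeps multi-word concepts such as "heart attack" together instead of expanding "heart" and "attack" separately, which is the main reason a phrase-based engine is preferable to word-by-word expansion.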
To evaluate and demonstrate our techniques, we have developed Open-i (pronounced "open eye", available at http://openi.nlm.nih.gov), a hybrid system combining text-based searching with an image similarity engine. The Open-i system enables users to search for and retrieve citations that are enriched with relevant images and "bottom line" (or "take-away") statements extracted from the Open Access subset of the PubMed Central repository maintained by the National Library of Medicine (NLM), as well as over 8,000 radiology images and 4,000 radiology examination reports from the Indiana University collection of chest x-rays; 67,517 images from the NLM History of Medicine collection; and about 2,064 orthopedic anatomy illustrations provided by the Norris Medical Library, University of Southern California. Each enriched citation is linked to PubMed Central, PubMed, and MedlinePlus, as well as to the article itself at the publisher's Web site. A user may search by text words and by query images. Using this framework, we explore alternative approaches to searching for information using a combination of visual and text features: (i) starting a text-based search of an image database, and refining the search using image features; (ii) starting a visual search using a clinical image of a given patient, and then linking the image to relevant information found by using visual and text features; (iii) starting a multimodal search that combines text and image features. Open-i indexes all the text and illustrations in medical articles by features, both textual and image-based. Open-i also indexes a collection of 8,000 digital chest x-rays and accompanying radiology reports, with the aim of providing easy access to publicly available and de-identified patient records, as well as the orthopedic and historical images. To compute text and image features efficiently, the system is built on a high-performance distributed computing platform.
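The three search modes described above (text only, image only, and multimodal) can be illustrated with a simple weighted score fusion. This is a hedged sketch under stated assumptions, not Open-i's actual ranking method: the `Figure` record, the term-overlap text score, the cosine image score, and the `alpha` weight are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Figure:
    """Toy indexed figure: caption text plus a precomputed image descriptor."""
    figure_id: str
    caption: str
    features: list  # hypothetical image feature vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def text_score(query, caption):
    """Fraction of distinct query terms that appear in the caption."""
    query_terms = set(query.lower().split())
    caption_terms = set(caption.lower().split())
    return len(query_terms & caption_terms) / len(query_terms) if query_terms else 0.0


def hybrid_search(figures, query_text=None, query_features=None, alpha=0.5):
    """Rank figures by text score, image score, or a weighted mix of both.

    alpha is an assumed mixing weight: 1.0 means text only, 0.0 image only.
    Returns figure ids ordered from best to worst match.
    """
    scored = []
    for fig in figures:
        t = text_score(query_text, fig.caption) if query_text is not None else None
        v = cosine(query_features, fig.features) if query_features is not None else None
        if t is not None and v is not None:
            score = alpha * t + (1 - alpha) * v  # multimodal query
        elif t is not None:
            score = t                            # text-only query
        elif v is not None:
            score = v                            # image-only query
        else:
            score = 0.0
        scored.append((score, fig.figure_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [figure_id for _, figure_id in scored]


if __name__ == "__main__":
    index = [
        Figure("f1", "chest x-ray showing pneumonia", [1.0, 0.0]),
        Figure("f2", "chest x-ray normal", [0.0, 1.0]),
        Figure("f3", "knee anatomy illustration", [0.9, 0.1]),
    ]
    print(hybrid_search(index, query_text="chest pneumonia"))
    print(hybrid_search(index, query_text="chest pneumonia", query_features=[1.0, 0.0]))
```

In this toy index, adding the query image changes the ranking: a figure whose caption shares no query terms can still move ahead of a textually closer one when it is visually similar, which is the effect a combined text and image search is meant to capture.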
As the first and perhaps only production-quality system of its kind in the biomedical domain, Open-i has enabled medical professionals and the public to access visual information from biomedical articles that is highly relevant to their query, as well as the "take-away" messages of the articles. The quality of the information delivered by Open-i has been evaluated in international competitions, in which the system consistently ranks among the best. In recent years the site has attracted over 10,000 unique visitors daily (excluding bots), with 690,000 hits daily, and is able to support searches of vast multimedia collections. During the 2020 reporting period, the Open-i user interface was updated to provide equal quality of retrieval results for all types of devices used to access the site.
Using images from Open-i and MedPix, we have created several collections of clinically relevant question-answer pairs pertaining to images and used the collections in the biomedical VQA challenges, which we co-organized within the international ImageCLEF evaluations.
Project outcomes
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Evaluating the Importance of Image-related Text for Ad-hoc and Case-based Biomedical Article Retrieval.
- DOI:
- Publication year: 2010
- Journal:
- Impact factor: 0
- Authors: Simpson, Matthew S; Demner-Fushman, Dina; Thoma, George R
- Corresponding author: Thoma, George R
Other grants by Dina Demner-Fushman
Integrating image and text information for biomedical information retrieval
- Grant number: 8344956
- Fiscal year:
- Funding amount: $621,300
- Project category:
Integrating image and text information for biomedical information retrieval
- Grant number: 9160922
- Fiscal year:
- Funding amount: $621,300
- Project category:
Providing timely and effortless access to reliable health-related information for decision support and education
- Grant number: 10927040
- Fiscal year:
- Funding amount: $621,300
- Project category:
Integrating image and text information for biomedical information retrieval
- Grant number: 8158052
- Fiscal year:
- Funding amount: $621,300
- Project category:
Integrating image and text information for biomedical information retrieval
- Grant number: 9359855
- Fiscal year:
- Funding amount: $621,300
- Project category:
Integrating image and text information for biomedical information retrieval
- Grant number: 8558113
- Fiscal year:
- Funding amount: $621,300
- Project category:
Natural language processing for precision medicine and clinical and consumer health question
- Grant number: 9554461
- Fiscal year:
- Funding amount: $621,300
- Project category:
Semantic indexing of biomedical publications and clinical text
- Grant number: 10269685
- Fiscal year:
- Funding amount: $621,300
- Project category:
Natural language processing for precision medicine and clinical and consumer health question
- Grant number: 10269687
- Fiscal year:
- Funding amount: $621,300
- Project category:
Integrating image and text information for biomedical information retrieval
- Grant number: 8943231
- Fiscal year:
- Funding amount: $621,300
- Project category:
Similar overseas grants
BLRD Research Career Scientist Award Application
- Grant number: 10365153
- Fiscal year: 2021
- Funding amount: $621,300
- Project category:
BLRD Research Career Scientist Award Application
- Grant number: 10512760
- Fiscal year: 2021
- Funding amount: $621,300
- Project category: