Constructing a large-scale biomedical knowledge graph using all PubMed abstracts and PMC full-text articles and its applications
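Basic information
- Grant number: 10648553
- Principal investigator:
- Amount: $143,200
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-01 to 2025-08-31
- Project status: Ongoing
- Source: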
- Keywords: Acceleration, Address, Adverse effects, Area, Award, Biological, COVID-19, Cell Line, Creativeness, Data, Data Set, Deposition, Development, Disease, Homo sapiens, Information Retrieval, Knowledge, Knowledge Discovery, Link, Literature, Manuals, Methodology, Methods, Modeling, Modernization, Names, National Center for Advancing Translational Sciences, Natural Language Processing, Organ, Patients, Probability, Process, PubMed, Publications, Publishing, Reading, Research, Research Personnel, Resources, Rewards, Science, Speed, Structure, System, Text, Time, Tissues, United States National Aeronautics and Space Administration, United States National Institutes of Health, Visualization, Visualization software, biomedical scientist, cell type, deep learning, deep learning model, drug repurposing, graph knowledge base, improved, information framework, innovation, interest, knowledge graph, knowledge integration, learning strategy, novel, remdesivir, search engine, text searching, tool, trend, wasting
Project abstract
Project Summary
The number of biomedical publications is growing at an accelerating pace. This ever-increasing
amount of scientific literature has made it impossible to regularly read all the published articles,
even in a very specific research area. The large volume of scientific publications has also
made it very challenging for modern search engines to find relevant articles accurately for a
given query. Missing important prior studies in a literature search can have serious consequences,
such as wasting resources and time or drawing wrong scientific conclusions. Another unmet
challenge in literature search is that researchers often prefer finding articles where the queries
they use are part of the new discoveries, instead of the background knowledge in the articles.
The current search engines cannot distinguish between new discoveries and background
knowledge in an article. Related to this challenge is that it can be difficult to identify the latest
discoveries in a particular scientific area without reading all the recently published articles. To
address these challenges, one can convert unstructured text data into structured form, which can
then support highly accurate information retrieval, information integration and automated
knowledge discovery. A plausible approach for converting unstructured text into structured form
is to use named entity recognition (NER) and relation extraction (RE) methods to identify the
biological entities and extract their relations to construct knowledge graphs (KGs). KGs can link
concepts within existing research to allow researchers to find connections that may have been
difficult to discover without them. The LitCoin Natural Language Processing (NLP) Challenge
was recently organized by NCATS of NIH and NASA to spur innovation by rewarding the most
creative and high-impact uses of biomedical publication free text to create KGs. In addition to
entities and relations, the manually annotated dataset provided by LitCoin also labels each
relation as either a new discovery or background knowledge. Our team
participated in the challenge and ranked first. This application aims to apply the
methods we have developed for LitCoin to all PubMed abstracts and PMC full-text articles to
build the largest-scale KG to date and develop applications on top of it. Specifically, we will (1)
develop a knowledge visualization and navigation tool combined with a deep learning-powered
search engine we developed previously; (2) develop advanced relation search functions to allow
knowledge discovery applications such as drug repurposing and adverse effect discovery; (3)
develop functions that allow users to search specifically the new discoveries in articles; and (4)
develop functions that return the latest discoveries in a scientific area for a given time period.
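
The four aims above all reduce to queries over extracted relations that carry a novelty label and a publication date. The following is a minimal sketch of that idea, not the applicants' implementation: the `Relation` fields, the `KnowledgeGraph` class and its method names, and the toy entities (`drug_A`, `gene_X`, `disease_B`, `disease_C`, and the `PMID_n` identifiers) are all hypothetical placeholders, and the NER and RE steps are assumed to have already produced the relations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One extracted relation. All field names are illustrative assumptions."""
    head: str       # entity found by NER, e.g. a drug or gene
    relation: str   # relation type assigned by RE
    tail: str       # second entity found by NER
    pmid: str       # placeholder article identifier
    year: int       # publication year of the source article
    is_novel: bool  # True for a new discovery, False for background knowledge


class KnowledgeGraph:
    def __init__(self, relations):
        self.relations = list(relations)
        # Index every relation under both of its entities for fast lookup.
        self.by_entity = defaultdict(list)
        for r in self.relations:
            self.by_entity[r.head].append(r)
            self.by_entity[r.tail].append(r)

    def search(self, entity, novel_only=False):
        """Relations mentioning `entity`, optionally only new discoveries."""
        hits = self.by_entity.get(entity, [])
        return [r for r in hits if r.is_novel or not novel_only]

    def latest_discoveries(self, start_year, end_year):
        """New discoveries published in [start_year, end_year], newest first."""
        hits = [r for r in self.relations
                if r.is_novel and start_year <= r.year <= end_year]
        return sorted(hits, key=lambda r: r.year, reverse=True)

    def neighbors(self, entity):
        """Entities directly linked to `entity` by any relation."""
        return {r.tail if r.head == entity else r.head
                for r in self.by_entity.get(entity, [])}

    def two_hop_candidates(self, source, target):
        """Intermediate entities linking `source` and `target` indirectly."""
        shared = self.neighbors(source) & self.neighbors(target)
        return sorted(shared - {source, target})


# Toy data with made-up entities; these are not real biomedical findings.
kg = KnowledgeGraph([
    Relation("drug_A", "inhibits", "gene_X", "PMID_1", 2021, True),
    Relation("gene_X", "associated_with", "disease_B", "PMID_2", 2019, False),
    Relation("drug_A", "treats", "disease_C", "PMID_3", 2023, True),
])

print(kg.search("gene_X", novel_only=True))
print(kg.latest_discoveries(2020, 2023))
print(kg.two_hop_candidates("drug_A", "disease_B"))
```

In this sketch, `search(..., novel_only=True)` stands in for aim (3), `latest_discoveries` for aim (4), and `two_hop_candidates` for the kind of indirect drug-to-disease link that a drug repurposing search in aim (2) might surface. A KG covering all of PubMed and PMC would need a graph database or search index rather than in-memory lists.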
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Jinfeng Zhang
Collaborative Research: Mathematical Framework for Biomolecules: From Protein to RNA to Chromosomes
- Grant number: 10189648
- Fiscal year: 2017
- Funding amount: $143,200
- Project type:
Elastic Shape Analysis for Protein Structure Alignment-New Advancement in an Old
- Grant number: 8284583
- Fiscal year: 2012
- Funding amount: $143,200
- Project type:
Elastic Shape Analysis for Protein Structure Alignment-New Advancement in an Old
- Grant number: 8486453
- Fiscal year: 2012
- Funding amount: $143,200
- Project type:
Similar NSFC (National Natural Science Foundation of China) grants
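Research on neuromorphic vision object recognition algorithms driven by spatiotemporal sequences
- Grant number: 61906126
- Approval year: 2019
- Funding amount: 240,000 CNY
- Project type: Young Scientists Fund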
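Ontology-driven spatial semantic modeling of address data and address matching methods
- Grant number: 41901325
- Approval year: 2019
- Funding amount: 220,000 CNY
- Project type: Young Scientists Fund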
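Research on optimized design of address mapping tables and memory access optimization for large-capacity solid-state drives
- Grant number: 61802133
- Approval year: 2018
- Funding amount: 230,000 CNY
- Project type: Young Scientists Fund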
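Research on IP address-driven multipath routing and traffic transmission control
- Grant number: 61872252
- Approval year: 2018
- Funding amount: 640,000 CNY
- Project type: General Program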
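Research on memory safety defense techniques for objects targeted by memory attacks
- Grant number: 61802432
- Approval year: 2018
- Funding amount: 250,000 CNY
- Project type: Young Scientists Fund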
Similar overseas grants
Targeting Alcohol-Opioid Co-Use Among Young Adults Using a Novel MHealth Intervention
- Grant number: 10456380
- Fiscal year: 2023
- Funding amount: $143,200
- Project type:
Developing a novel disease-targeted anti-angiogenic therapy for CNV
- Grant number: 10726508
- Fiscal year: 2023
- Funding amount: $143,200
- Project type:
Switching Individuals in Treatment for Opioid Use Disorder Who Smoke Cigarettes to the SREC
- Grant number: 10661301
- Fiscal year: 2023
- Funding amount: $143,200
- Project type:
The contribution of air pollution to racial and ethnic disparities in Alzheimer’s disease and related dementias: An application of causal inference methods
- Grant number: 10642607
- Fiscal year: 2023
- Funding amount: $143,200
- Project type:
Augmenting Pharmacogenetics with Multi-Omics Data and Techniques to Predict Adverse Drug Reactions to NSAIDs
- Grant number: 10748642
- Fiscal year: 2023
- Funding amount: $143,200
- Project type: