Semi-structured Information Retrieval in Clinical Text for Cohort Identification
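Basic information
- Grant number: 10879792
- Principal investigator:
- Amount: $637,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-09-20 to 2026-04-30
- Project status: Ongoing
- Source: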
- Keywords: Acceleration; Adopted; Adoption; Algorithms; Architecture; COVID-19; Clinic; Clinical; Clinical Data; Clinical Research; Clinical Trials; Cohort Studies; Collection; Communities; Complex; Data; Development; Electronic Health Record; Eligibility Determination; Evaluation; Feedback; Formulation; Goals; Healthcare; Healthcare Systems; Human; Informatics; Information Retrieval; Institution; Language; Learning; Medical; Methods; Modeling; Morphologic artifacts; Natural Language Processing; Outcome; Patient Recruitments; Patients; Performance; Pharmaceutical Preparations; Process; Research; Resources; Retrieval; Semantics; Site; Structure; System; Techniques; Text; Time; Training; Translational Research; Visit; Work; clinical data warehouse; clinical research site; clinical trial recruitment; cohort; data modeling; data standards; density; design; experimental study; health care delivery; heterogenous data; indexing; innovation; learning strategy; neural; novel; open source; portability; predictive modeling; query tools; recruit; search engine; structured data; tool
Project Summary
The widespread adoption of Electronic Health Records (EHRs) has enabled the use of clinical data for clinical
research and healthcare delivery. Many institutions have established clinical data warehouses (CDWs) in
conjunction with cohort discovery tools (e.g., i2b2) to support the use of clinical data for clinical research
including retrospective clinical studies as well as feasibility assessment or patient recruitment for clinical trials.
However, a significant portion of relevant patient information is embedded in clinical narratives, and natural
language processing (NLP) techniques such as information extraction are critical when using EHR data for
clinical research. Many clinical NLP systems have been developed to extract information from text for various
downstream applications but have had unsatisfactory performance and portability issues. Information retrieval
(IR), a technique used in search engines for storing, retrieving, and ranking documents from a large collection of
text documents based on users’ queries, can provide an alternative approach to leverage clinical narratives for
cohort discovery as it is less dependent on semantics. In order to accomplish this, additional work is needed
since current IR approaches are generally document-based and the formulation of cohort discovery as an IR
task requires the development of innovative IR approaches to handle complex EHR data and cohort criteria with
contextual (e.g., spatial or temporal) constraints.
Our long-term goal is to develop informatics solutions to accelerate the use of EHR data for clinical research.
The main goal of this proposal is to develop innovative IR methods, which formulate cohort discovery from EHR
data as an IR task, aiming to accelerate the identification of patient cohorts for cohort studies or the recruitment
of eligible patients for clinical trials. In our current R01-supported study (R01LM011934), we introduced novel
language models to enable the reuse of NLP-produced artifacts for IR-based cohort retrieval and developed
parallel resources for IR evaluation at two institutions (Mayo Clinic and OHSU). We hypothesize that, given
complex cohort criteria with contextual constraints, an IR framework with tailored architecture components (e.g.,
indexing, ranking, evaluation, and query processing) for storing and querying EHR data has an advantage over
traditional cohort discovery tools for querying unstructured EHR data as well as an advantage over text-based
search engines for querying both structured and unstructured EHR data. For the proposed renewal, we plan to
i) adopt common data models (CDMs) and deploy the framework at one additional site to assess the
generalizability of methods, ii) extend the IR framework to incorporate contextual information, and iii)
incorporate deep semantic representations into the IR framework. If successful, the proposed project will
advance informatics research on cohort discovery and identification, which impacts many applications based on
EHR data such as learning healthcare systems, predictive modeling, or AI in healthcare.
Project outcomes
Journal articles (34)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MedTator: a serverless annotation tool for corpus development.
- DOI: 10.1093/bioinformatics/btab880
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: He, Huan; Fu, Sunyang; Wang, Liwei; Liu, Sijia; Wen, Andrew; Liu, Hongfang
- Corresponding author: Liu, Hongfang
The IMPACT framework and implementation for accessible in silico clinical phenotyping in the digital era.
- DOI: 10.1038/s41746-023-00878-9
- Published: 2023-07-21
- Journal:
- Impact factor: 15.2
- Authors:
- Corresponding author:
Clinical concept extraction: A methodology review.
- DOI: 10.1016/j.jbi.2020.103526
- Published: 2020-09
- Journal:
- Impact factor: 4.5
- Authors: Fu S; Chen D; He H; Liu S; Moon S; Peterson KJ; Shen F; Wang L; Wang Y; Wen A; Zhao Y; Sohn S; Liu H
- Corresponding author: Liu H
Extracting chemical-protein relations using attention-based neural networks.
- DOI: 10.1093/database/bay102
- Published: 2018-01-01
- Journal:
- Impact factor: 0
- Authors: Liu S; Shen F; Komandur Elayavilli R; Wang Y; Rastegar-Mojarad M; Chaudhary V; Liu H
- Corresponding author: Liu H
Contextual Variation of Clinical Notes induced by EHR Migration.
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Miller, Kurt; Moon, Sungrim; Fu, Sunyang; Liu, Hongfang
- Corresponding author: Liu, Hongfang
Other grants by WILLIAM R HERSH
Attracting Talented and Diverse Students to Biomedical Informatics and Data Science Careers Through Short-Term Study at OHSU
- Grant number: 10630618
- Fiscal year: 2022
- Funding amount: $637,500
- Project category:
Attracting Talented and Diverse Students to Biomedical Informatics and Data Science Careers Through Short-Term Study at OHSU
- Grant number: 10701083
- Fiscal year: 2022
- Funding amount: $637,500
- Project category:
Computational Omics and Biomedical Informatics Program (COBIP)
- Grant number: 10319196
- Fiscal year: 2021
- Funding amount: $637,500
- Project category:
Computational Omics and Biomedical Informatics Program (COBIP)
- Grant number: 10490403
- Fiscal year: 2021
- Funding amount: $637,500
- Project category:
Computational Omics and Biomedical Informatics Program (COBIP)
- Grant number: 10676322
- Fiscal year: 2021
- Funding amount: $637,500
- Project category:
Research Training in Biomedical Informatics and Data Science at Oregon Health & Science University
- Grant number: 9524502
- Fiscal year: 2017
- Funding amount: $637,500
- Project category:
Biomedical Informatics Research Training at Oregon Health & Science University
- Grant number: 9369268
- Fiscal year: 2016
- Funding amount: $637,500
- Project category:
Semi-structured Information Retrieval in Clinical Text for Cohort Identification
- Grant number: 10450805
- Fiscal year: 2014
- Funding amount: $637,500
- Project category:
Semi-structured Information Retrieval in Clinical Text for Cohort Identification
- Grant number: 10207950
- Fiscal year: 2014
- Funding amount: $637,500
- Project category:
OHSU Summer Internship in Biomedical Informatics for College Undergraduates
- Grant number: 8281433
- Fiscal year: 2011
- Funding amount: $637,500
- Project category:
Similar National Natural Science Foundation of China (NSFC) grants
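Long-term, specific modification of functional plasticity in the adult animal visual system using a novel visual-electrical stimulation pairing paradigm
- Grant number: 32371047
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program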
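Bridging the digital divide among older adults: decision processes, objective barriers, and coping strategies in older adults' adoption of digital technology
- Grant number: 72303205
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund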
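Improving ablation rate measurement by suppressing fluid motion and using a dual-energy-spectrum method
- Grant number: 12305261
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund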
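Transformer-based tunnel lining crack detection using multiple sparse self-attention mechanisms
- Grant number: 62301339
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund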
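Policy incentives, information transmission, and mechanisms for increasing farm households' adoption of rooftop photovoltaic technology
- Grant number: 72304103
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund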
Similar overseas grants
The University of Miami AIDS Research Center on Mental Health and HIV/AIDS - Center for HIV & Research in Mental Health (CHARM) Research Core - EIS
- Grant number: 10686546
- Fiscal year: 2023
- Funding amount: $637,500
- Project category:
Extensible Open Source Zero-Footprint Web Viewer for Cancer Imaging Research
- Grant number: 10644112
- Fiscal year: 2023
- Funding amount: $637,500
- Project category:
Accelerating genomic analysis for time critical clinical applications
- Grant number: 10593480
- Fiscal year: 2023
- Funding amount: $637,500
- Project category:
Augmenting Pharmacogenetics with Multi-Omics Data and Techniques to Predict Adverse Drug Reactions to NSAIDs
- Grant number: 10748642
- Fiscal year: 2023
- Funding amount: $637,500
- Project category: